ra1nty / DXcam

A high-performance Python screen capture library for Windows using the Desktop Duplication API
MIT License

Optimize the performance of partial screenshots #35

Open xyk2000 opened 2 years ago

xyk2000 commented 2 years ago

Deferring the format conversion can significantly improve the performance of partial screenshots, reducing the processing time from 6.16 seconds to 3.6 seconds in my case.

[Two screenshots attached showing the before/after timings]
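A minimal sketch of the idea being proposed, assuming the full frame arrives as a numpy BGRA array copied from the duplication surface (the function and argument names are illustrative, not DXcam's actual internals): crop the requested region first, then convert only those pixels.

```python
import numpy as np

def grab_region(raw_bgra: np.ndarray, region: tuple) -> np.ndarray:
    """Defer the format conversion: crop first, so only the requested
    region pays for the BGRA -> RGB conversion instead of the whole frame.

    raw_bgra: full desktop frame as an (H, W, 4) uint8 BGRA array.
    region:   (left, top, right, bottom) in pixels.
    """
    left, top, right, bottom = region
    cropped = raw_bgra[top:bottom, left:right]   # numpy view, no copy yet
    return cropped[:, :, [2, 1, 0]]              # BGRA -> RGB, copies only the crop
```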

AI-M-BOT commented 2 years ago

[Screenshot attached]

I did see a minor improvement in FPS, tested with a 640*640 region. However, mss can achieve similar FPS with much lower CPU usage.


Quote from readme:

"It was originally built as a part of a deep learning pipeline for FPS games to perform better than existing Python solutions (python-mss, D3DShot)."

I tested 640*640 partial screenshots with mss, DXcam, D3DShot, and a C++ version of the DXGI screenshot, all with the UFO test rendering on.

Device[0]:<Device Name:Intel(R) UHD Graphics Dedicated VRAM:128Mb VendorId:32902>
NVIDIA GeForce GTX 1660 Ti with Max-Q Design
  • The C++ version of the DXGI screenshot always wins, since it uses less than 1% CPU and can reach 500 FPS
  • mss can reach my screen refresh rate (144 Hz) with less than 5% CPU usage
  • DXcam can reach a few FPS more than my screen refresh rate (150 tops), but it uses a bit more than 20% CPU
  • D3DShot has similar CPU usage but never exceeds 70 FPS on my PC

So, as a conclusion, DXcam is currently a huge improvement compared with D3DShot, but still not good enough to use in-game. Hopefully it will get better.
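For reference, an FPS comparison of this kind can be reproduced with a small harness along the following lines (illustrative only; the region, duration, and counting method are assumptions, not the exact script behind the numbers above):

```python
import time
import dxcam   # pip install dxcam
import mss
import numpy as np

REGION = (0, 0, 640, 640)   # left, top, right, bottom
SECONDS = 5

def bench(name, grab):
    """Count frames produced by `grab` over a fixed wall-clock window."""
    frames, start = 0, time.perf_counter()
    while time.perf_counter() - start < SECONDS:
        if grab() is not None:
            frames += 1
    print(f"{name}: {frames / SECONDS:.1f} FPS")

# DXcam: grab() returns None when no new frame has been presented
cam = dxcam.create(output_color="RGB")
bench("dxcam", lambda: cam.grab(region=REGION))

# mss: BitBlt-based capture of the same 640x640 box
with mss.mss() as sct:
    box = {"left": 0, "top": 0, "width": 640, "height": 640}
    bench("mss", lambda: np.asarray(sct.grab(box)))
```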
AI-M-BOT commented 2 years ago

ctypes.string_at and the color conversion cost most of the time. Considering that this lib is written entirely in Python, I would assume this is close to the best approach for a DXGI screenshot. To achieve higher performance, parts might need to be rewritten in C++ or Rust as a Python extension module, if possible.
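As a rough illustration of where the ctypes.string_at cost comes from: that call materializes the whole mapped surface as an intermediate Python bytes object before numpy sees it. A sketch of one way to skip that copy in pure Python, assuming a mapped 32-bit BGRA staging surface (the pointer/pitch parameters are placeholders, not DXcam's actual variables):

```python
import ctypes
import numpy as np

def frame_from_mapped_ptr(p_data: int, pitch: int, height: int, width: int) -> np.ndarray:
    """Wrap the mapped surface memory directly instead of copying it with
    ctypes.string_at(p_data, pitch * height) into a bytes object first.

    Assumes 32-bit BGRA pixels, so the row pitch is a multiple of 4 bytes.
    """
    buf = (ctypes.c_ubyte * (pitch * height)).from_address(p_data)
    arr = np.frombuffer(buf, dtype=np.uint8).reshape(height, pitch // 4, 4)
    # The mapped memory is only valid until Unmap, so take a real copy of
    # the visible part (dropping any row padding) before returning it.
    return arr[:, :width, :].copy()
```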

ra1nty commented 2 years ago


Thanks for trying dxcam out!

  1. Yes, C++ will definitely be faster. However, I don't know how you got 500 FPS with the UFO test, since it is synced to your monitor's refresh rate (144 Hz), so only 144 frames per second will actually be rendered.
  2. I am aware of the high CPU usage and haven't found a way to work around it yet. PRs or suggestions are welcome.
  3. For partial screenshots I haven't tested what @xyk2000 pointed out. That might be a possible improvement.
  4. I have seen numerous uses of dxcam for video games, but as you said, hardware may play a factor. Make sure you select your 1660 Ti and not your integrated GPU. I have only tested on RTX 2000 and 3000 series cards with a relatively high-end CPU, and I was able to bound my pipeline latency to 4-5 ms (200-250 FPS) using TensorRT and properly implemented surrounding stages, roughly the capture-and-infer loop sketched below. The use case is a neural-network aimbot as well.
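For context, the pipeline described in point 4 is roughly a loop like the following sketch (the inference call is a placeholder for a TensorRT engine or any other model; the device index, color format, and target FPS are arbitrary example values):

```python
import time
import dxcam

def run_inference(frame):
    """Placeholder for the model forward pass (e.g. a TensorRT engine)."""
    ...

cam = dxcam.create(device_idx=0, output_color="BGR")  # make sure this is the dedicated GPU
cam.start(target_fps=240, video_mode=False)
try:
    while True:
        t0 = time.perf_counter()
        frame = cam.get_latest_frame()   # blocks until a new frame is available
        run_inference(frame)
        print(f"pipeline latency: {(time.perf_counter() - t0) * 1e3:.1f} ms")
finally:
    cam.stop()
```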
AI-M-BOT commented 2 years ago


  1. I tested while moving my mouse very fast. Yeah, if you drag your cursor and move it around wildly on the desktop, the FPS value reaches a high average, and I don't know why. I guess the monitor's 144 Hz refresh rate is a hardware limit, but at the graphics-memory level there is no such limit and DXGI just copies from memory?
  2. Avoid unnecessary memory copies/moves? Try my compiled module?
  3. I don't know.
  4. Did you run it with a game and get 200+ FPS? That is impressive.

ra1nty commented 2 years ago


  1. I don't know if you are able to understand this: mss uses BitBlt, and the problem with your test methodology is that the cursor is drawn by the OS and is not in the HBITMAP in the HDC.
  2. Those aren't real suggestions.
  3. .
  4. Yes. Screenshot speed is usually not the bottleneck in such applications; the model latency is.

Plus, you need to use your dedicated GPU, not the integrated one.