tfaehse / DashcamCleaner

Censor identifiable information in videos, in particular dashcam recordings in Germany.
GNU Affero General Public License v3.0

GPU not fully utilized #52

Closed. Dave04O4 closed this issue 1 year ago

Dave04O4 commented 1 year ago

Hello, I'm using my GPU, and it is fully utilized for roughly the first 10% of the run. After that, utilization drops to a steady 10-20%.

I once had a version from June/July where the GPU was fully utilized, but that is no longer the case with the current version.

I have already experimented with the batch size, but without much success.

[Screenshot: 2022-10-03 (1)]

tfaehse commented 1 year ago

Hi!

This took me quite a while to even look at, apologies for that. The answer isn't very straightforward, but I can try:

Generally, you want to choose the biggest batch size your GPU memory allows in order to optimise inference time, but you seem to have done that already. Do note that the 1080p_medium weights result in a pretty massive network that needs a lot of memory and compute. If your GPU can't fit more than one image per batch, you might want to reduce the inference size and/or choose a smaller model, e.g. 720p_small_mosaic at 720p.
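To illustrate what the batch size controls, here is a minimal Python sketch that groups decoded frames into batches before they would be handed to the detector. The helper name `batched` and the integers used as stand-in frames are assumptions for illustration, not DashcamCleaner's actual code.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final batch may be smaller than batch_size
        yield batch


# Stand-in for 10 decoded frames; a real pipeline would hold image arrays.
frames = list(range(10))
print([len(b) for b in batched(frames, batch_size=4)])  # [4, 4, 2]
```

A larger batch size means fewer detector calls per video, but every frame in a batch has to fit into GPU memory at the same time, which is why the limit depends on both the model and the inference size.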

There are a few things in the pipeline to improve this though!

Dave04O4 commented 1 year ago

Hi, thanks for the feedback. I have tried everything I could think of, including the other versions.

I had a version in June/July/August where a 1-minute clip took about 3 minutes and now it's about 10 minutes with the latest version.

With so much different hardware, it's sometimes really difficult to keep things running well.

Thanks for the great tool.

CPU: Intel Core i7-9700K
GPU: ASUS ROG Strix GeForce RTX 2080 OC

tfaehse commented 1 year ago

I'd assume that's due to the blurring. The new version (currently being developed here: https://github.com/tfaehse/DashcamCleaner/tree/feature/yolov8) addresses this somewhat.

With this version, the workflow for users is a bit simpler:

  1. choose the weights file (maybe try 720p_small_v8 to see what that would look like)
  2. choose the batch size (for best performance: as big as possible without getting CUDA out of memory errors)
  3. choose the number of blurring workers (also as large as possible, but on Windows you run into memory errors very quickly)
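As a rough picture of what "blurring workers" in step 3 could mean, the following Python sketch hands frames and their detected boxes to a pool of workers. Everything here is hypothetical: `blur_boxes`, `blur_all`, the list-of-lists frame format, and the mean-value "blur" are stand-ins, and the real tool may well use processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


def blur_boxes(frame: Frame, boxes: List[Box]) -> Frame:
    """Return a copy of `frame` with each box replaced by its mean value."""
    out = [row[:] for row in frame]
    for x1, y1, x2, y2 in boxes:
        pixels = [frame[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        if not pixels:
            continue  # empty box, nothing to blur
        mean = sum(pixels) // len(pixels)
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = mean
    return out


def blur_all(frames: List[Frame], detections: List[List[Box]], workers: int) -> List[Frame]:
    """Blur every frame with a pool of `workers` threads, keeping frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(blur_boxes, frames, detections))


frames = [[[0, 10], [20, 30]], [[5, 5], [5, 5]]]
detections = [[(0, 0, 2, 2)], []]  # one box in frame 0, none in frame 1
print(blur_all(frames, detections, workers=2))
# [[[15, 15], [15, 15]], [[5, 5], [5, 5]]]
```

With a pure-Python blur like this one, threads mainly show the structure rather than deliver a speed-up. The point is that each worker holds its own frame while it works, so memory use grows with the number of workers.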

I want to look into automating the second step, but that's for another day.
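One possible way to automate the batch size choice, sketched in Python under stated assumptions: probe increasing batch sizes until one fails, then binary-search between the last size that worked and the first that did not. This is not the approach the project uses or plans to use. `find_max_batch_size`, `try_batch`, and `fake_probe` are hypothetical names, and the sketch assumes that if a batch size fits, every smaller one fits too.

```python
from typing import Callable, Optional, Tuple, Type


def find_max_batch_size(
    try_batch: Callable[[int], None],
    upper_limit: int = 1024,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> int:
    """Return the largest batch size in [1, upper_limit] for which
    `try_batch` does not raise one of `oom_errors` (0 if even 1 fails)."""

    def fits(size: int) -> bool:
        try:
            try_batch(size)
            return True
        except oom_errors:
            return False

    if not fits(1):
        return 0

    # Phase 1: double the batch size until it fails or the limit is reached.
    good = 1
    bad: Optional[int] = None
    while good < upper_limit:
        candidate = min(good * 2, upper_limit)
        if fits(candidate):
            good = candidate
        else:
            bad = candidate
            break
    if bad is None:
        return good

    # Phase 2: binary search between the last good and the first bad size.
    while bad - good > 1:
        mid = (good + bad) // 2
        if fits(mid):
            good = mid
        else:
            bad = mid
    return good


def fake_probe(size: int) -> None:
    """Simulated inference call that runs out of memory above 13 frames."""
    if size > 13:
        raise MemoryError("simulated out-of-memory")


print(find_max_batch_size(fake_probe))  # 13
```

In a real setup, `try_batch` would run one inference pass on a dummy batch of the chosen inference size, and `oom_errors` would be set to whatever exception the framework raises when GPU memory runs out. Recent PyTorch releases provide `torch.cuda.OutOfMemoryError` for this, but check that against the installed version.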