thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0
6.13k stars 2.08k forks

Why don't we always set --gpu 1.0? #1164

Open mfaramarzi opened 4 years ago

mfaramarzi commented 4 years ago

What exactly does --gpu do? The default help text just says "how much gpu". Does it mean what portion of the computation is assigned to the GPU rather than the CPU? If so, why don't we assign all computation to the GPU? Thanks.

arunasank commented 4 years ago

It's the GPU device ID. If you have 8 GPUs, the device IDs are generally between 0 and 7 (inclusive). So the --gpu flag specifies which GPU to use if you have more than one.

EDIT: I am wrong. Please look at https://github.com/thtrieu/darkflow/issues/98.

KausthubK commented 4 years ago

Disclaimer: this is my best guess based on what I've experienced... I'm not 100% sure, so if someone DOES know, please pipe up :)

My understanding is that the value controls the GPU VRAM allocation (i.e., what fraction of your GPU's memory the process is allowed to consume before it has to shuffle data in and out from elsewhere, like swap space). The more of the network you can keep resident in GPU memory, the faster training will be, since reading weights from VRAM is much faster than fetching them from elsewhere.

Naturally everyone's VRAM is limited, so if you set it to 1.0 (i.e., 100%) while, say, 5% is already being used to drive your monitors and displays, the allocations will overlap and you'll hit weird memory errors.
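If the memory-fraction reading is right, the flag would typically be clamped into [0.0, 1.0] and handed to TensorFlow 1.x's GPU options. Here's a minimal, hypothetical sketch (this is not darkflow's actual source; the helper name `clamp_gpu_fraction` is invented for illustration) of how such a flag usually maps onto `per_process_gpu_memory_fraction`:

```python
def clamp_gpu_fraction(gpu_flag):
    """Clamp a --gpu style value into the valid fraction range [0.0, 1.0].

    0.0 would mean "run on CPU only"; 0.8 would mean "let this process
    grab up to 80% of the GPU's VRAM"; values outside the range are capped.
    """
    return max(0.0, min(float(gpu_flag), 1.0))


# With TensorFlow 1.x installed, the fraction would then be wired in
# roughly like this (commented out so the sketch stays self-contained):
#
#   import tensorflow as tf
#   frac = clamp_gpu_fraction(0.8)
#   gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=frac)
#   sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

print(clamp_gpu_fraction(0.8))   # 0.8
print(clamp_gpu_fraction(1.5))   # 1.0
print(clamp_gpu_fraction(-0.2))  # 0.0
```

This is also why leaving the value a bit below 1.0 (e.g. 0.8) is safer in practice: it leaves headroom for whatever else, such as your display server, is already holding VRAM.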