Closed smerchkz closed 11 months ago
My problem is that I can't achieve 90%-100% utilization of the video card; with one node process I utilize about 20%, not more.
when you're running with a single process, the gpu never gets saturated - the processing on the gpu is small compared to the cost of getting data in and out of the gpu, so your gpu utilization looks low at 10%.
but when you run with multiprocessing, then it's a question of tensorflow-gpu global locks - it only allows one process to be active on the gpu at a time. even if gpu work is just 10% of the overall time, another process will wait 90% of the time for the lock to be released, and that wait shows up as load because it happens deep inside the library.
with tensorflow-cpu, there is no such issue since each process runs on a different core without much locking.
i don't see a clean way to do this since tfjs-node-gpu does not expose the deep internals of tensorflow-gpu that an app would need to perform its own busy checks.
i really don't see a clean solution here.
Today I found a solution that allows multiple processes to use the GPU: in pm2, in ecosystem.config, I add the constant
TF_FORCE_GPU_ALLOW_GROWTH: true
My 1080 Ti has 11GB. I start 4 processes in cluster mode, each process takes about ~2GB of video card RAM, and together the 4 processes give ~90% load on the GPU. I see this in `watch -n 1 nvidia-smi`.
By default node-gpu uses TF_FORCE_GPU_ALLOW_GROWTH=false, so the first process takes all of the video card RAM and there is no longer enough memory for the second process; the GPU goes crazy and does not give quick calculations, because both processes are fighting for the same memory.
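As a sanity check (a sketch, not from the original post): the flag can also be set from Node itself, as long as that happens before `@tensorflow/tfjs-node-gpu` is required, since the native TensorFlow runtime reads it once at initialization:

```javascript
// Set the flag before loading tfjs-node-gpu; the underlying TensorFlow
// runtime reads TF_FORCE_GPU_ALLOW_GROWTH when it initializes the GPU,
// so setting it after require() would be too late.
process.env.TF_FORCE_GPU_ALLOW_GROWTH = 'true';
console.log(process.env.TF_FORCE_GPU_ALLOW_GROWTH); // prints "true"
// const tf = require('@tensorflow/tfjs-node-gpu'); // load only after the flag is set
```

Setting it in the pm2 `env` block, as below, achieves the same thing for every cluster instance.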
```js
module.exports = {
  apps: [
    {
      name: ".backend.node",
      script: "app.js",
      watch: true,
      ignore_watch: ["node_modules"],
      watch_options: {
        followSymlinks: false,
      },
      autorestart: true,
      //max_memory_restart: "2G", //for CPU
      max_memory_restart: "4G", //for GPU
      wait_ready: true,
      force: false,
      env: {
        PORT: 8081,
        HOSTNAME: "127.0.0.1",
        TF_FORCE_GPU_ALLOW_GROWTH: true,
      },
      //exec_mode: "fork", //or
      exec_mode: "cluster",
      instances: 4,
      //or
      //instances: "max",
      NODE_ENV: "production",
    },
  ],
};
```
One more tip I want to share: for CPUs the best benchmark results are obtained with the setting `instances: "max"`.
ah, brilliant! i thought that tensorflow-gpu locking was actual locking, but it was just memory contention. thanks for sharing!
Hi, Vlad
thanks for the library
Is it possible for multiple node processes to use the gpu? I took your code https://github.com/vladmandic/face-api/blob/master/demo/node-multiprocess.js and pasted
instead of your lines
then i use
it works perfectly, but my 1080 Ti GPU load is only 5-10%
but if i use
the video card load goes to 100%, and it seems like it's going crazy.
I get about the same result when I don't use workers as in your code, but instead run a standalone nodejs server and daemonize it with pm2 using this config.
Maybe I need to set some env vars for tf so that the GPU works normally with many node processes? Can you help me get high load on the GPU?