mikeyEcology / MLWIC2

Classify camera trap images using machine learning with R Shiny Apps

How to limit the number of cores used for training #6

Open hannaboe opened 4 years ago

hannaboe commented 4 years ago

I'm training a model on a supercomputer, and because train() uses all cores, I would like to limit the number it uses. I tried setting num_cores = 20, but that changes nothing and all cores are still in use. Is there another way to limit the number of cores used when running train()?

mikeyEcology commented 4 years ago

If you set num_cores = 1, it will limit the number of cores used to 1.

hannaboe commented 4 years ago

I set num_cores = 1 but it still uses all 72 cores.

mikeyEcology commented 4 years ago

It could be that your HPC is treating these commands differently, because HPCs run a little differently than standard computers. Are you using a GPU? If so, how many? Also, if you want to run non-interactively on your HPC, you could set print_cmd = TRUE, and the function will then print a command that you can submit as a job on your HPC; this is usually a good idea for longer runs. That way you could set up a job with a specific number of threads (cores), which would limit how many the function uses.
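As a rough illustration of running a command with a fixed number of cores outside a job scheduler, here is a minimal Python sketch. It is not part of MLWIC2: the helper names thread_cap_env and run_with_core_limit are hypothetical, the environment variables only matter for libraries that honour them, and the CPU pinning relies on os.sched_setaffinity, which exists on Linux only. The command to run would be whatever print_cmd = TRUE prints; a placeholder child process is used here instead.

```python
import os
import subprocess
import sys


def thread_cap_env(n_threads):
    """Return a copy of the environment with common thread-pool caps set.

    Assumption: these variables are only respected by libraries built
    against OpenMP, MKL, or OpenBLAS; other libraries ignore them.
    """
    env = dict(os.environ)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(n_threads)
    return env


def run_with_core_limit(cmd, n_cores, **kwargs):
    """Run cmd (a list of arguments) restricted to at most n_cores cores.

    Hypothetical helper: on Linux the child is pinned with
    os.sched_setaffinity; elsewhere only the environment caps apply.
    """
    env = thread_cap_env(n_cores)
    if hasattr(os, "sched_setaffinity"):
        # Pick the first n_cores of the cores this process may already use.
        allowed = sorted(os.sched_getaffinity(0))[:n_cores]

        def pin():
            # Runs in the child just before the command starts.
            os.sched_setaffinity(0, allowed)

        return subprocess.run(cmd, env=env, preexec_fn=pin, **kwargs)
    return subprocess.run(cmd, env=env, **kwargs)


if __name__ == "__main__":
    # Placeholder child: reports how many cores it is allowed to run on.
    child = (
        "import os; "
        "print(len(os.sched_getaffinity(0)) "
        "if hasattr(os, 'sched_getaffinity') else os.cpu_count())"
    )
    run_with_core_limit([sys.executable, "-c", child], 2)
```

On a cluster, requesting the cores through the scheduler is usually the cleaner route; pinning like this is mainly useful on a workstation or for checking whether a process respects a core limit at all.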

JoejynWan commented 4 years ago

Hi @mikeyEcology, I am facing a similar issue: I cannot limit the number of cores or switch to the GPU. I am running train() on a computer with an AMD Ryzen 7 3700X (8 cores, 16 threads) and an NVIDIA GeForce RTX 2080 Super. I have tried combinations of num_cores = 1, num_cores = 10, num_gpus = 1 and num_gpus = 2. I have also tried both ways of running it: directly through R, and by setting print_cmd = T and submitting the job via the terminal. In all cases, train() still uses all 16 threads and runs on the CPU instead of the GPU. Am I missing some inputs? Thank you!

mikeyEcology commented 4 years ago

The issue here is which release of tensorflow is installed; @hannaboe, this will probably help with your problem as well. The installation of tensorflow is different if you are using a GPU, so when you install it, you should use pip install tensorflow-gpu==1.14. Running this will overwrite the installation that does not use the GPU. More details can be found here. Note, from this link, that the version of tensorflow you install (in this example, 1.14) will depend on the type of driver you have for your GPU. I'll add an explanation about this to the readme file.
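After reinstalling, one way to check which tensorflow is active and whether it can see a GPU is a short Python check like the sketch below. The helper name describe_tensorflow is hypothetical and not part of MLWIC2. It assumes the Python interpreter you run it with is the same one MLWIC2 is configured to use; tf.test.is_gpu_available() is the TensorFlow 1.x call and is deprecated in later releases, so it is looked up defensively.

```python
def describe_tensorflow():
    """Report whether tensorflow is importable, its version, and GPU visibility.

    Hypothetical helper for illustration; returns a plain dict.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False}

    info = {"installed": True, "version": tf.__version__}

    # tf.test.is_gpu_available() is available in TensorFlow 1.x;
    # fall back to None if this build does not provide it.
    check = getattr(tf.test, "is_gpu_available", None)
    info["gpu_available"] = bool(check()) if callable(check) else None
    return info


if __name__ == "__main__":
    print(describe_tensorflow())
```

If gpu_available comes back False with tensorflow-gpu installed, the driver and tensorflow versions may not match, which is the dependency mentioned above.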