ppdebreuck / modnet

MODNet: a framework for machine learning materials properties
MIT License

FitGenetic stuck around the end of iterations #156

Closed gbrunin closed 1 year ago

gbrunin commented 1 year ago

When using FitGenetic, I sometimes hit the following issue: the process hangs near the end of an iteration, without throwing an error and without ever terminating. It's not reliably reproducible, but it happens quite often in my case. I run the code on my laptop, in a main() method. When it happens, I simply re-run the code and it works (or not, depending on the phase of the moon). Pierre-Paul suspects a problem with parallelization, more precisely a child process that never returns its result. Any clue, @ml-evs?

ml-evs commented 1 year ago

Hopefully closed by setting several TF_* environment variables so that TensorFlow does not hog the whole CPU at once.

gbrunin commented 1 year ago

Yep, seems to work. Thanks! For anyone having a similar issue, try setting the environment variables:

export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
export TF_NUM_INTRAOP_THREADS=1
export TF_NUM_INTEROP_THREADS=1
export CUDA_VISIBLE_DEVICES=1

ml-evs commented 1 year ago

> export CUDA_VISIBLE_DEVICES=1

I think this should be -1? It is probably equivalent unless you have more than one GPU on your system, since with a single GPU (index 0) a value of 1 points at a device that does not exist.
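To illustrate why 1 and -1 behave the same on a single-GPU machine, here is a small sketch. It is a simplified model, not CUDA's actual parser: it assumes integer indices only (no UUIDs) and relies on the documented rule that enumeration stops at the first invalid index. The function name `visible_gpus` is hypothetical.

```python
def visible_gpus(value, n_gpus):
    """Simplified model of CUDA_VISIBLE_DEVICES (integer indices only).

    value  -- the variable's string value, or None if it is unset
    n_gpus -- number of physical GPUs on the machine
    """
    if value is None:
        # Variable unset: every GPU is visible.
        return list(range(n_gpus))
    visible = []
    for token in value.split(","):
        try:
            index = int(token)
        except ValueError:
            break  # non-integer token: treated as invalid in this model
        if index < 0 or index >= n_gpus:
            break  # first invalid index ends the list of visible devices
        visible.append(index)
    return visible


# One GPU: both settings hide it.
print(visible_gpus("1", 1), visible_gpus("-1", 1))
# Two GPUs: "1" exposes the second GPU, "-1" still hides everything.
print(visible_gpus("1", 2), visible_gpus("-1", 2))
```

So on a multi-GPU system only -1 reliably keeps TensorFlow on the CPU.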

ppdebreuck commented 1 year ago

Yes! No need for a CUDA device.
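For anyone who would rather set these from Python than from the shell, here is a minimal sketch. It assumes the variables have to be in place before TensorFlow (and therefore modnet) is first imported, and it uses -1 for CUDA_VISIBLE_DEVICES as suggested above. The helper name `limit_threads` is made up for illustration and is not part of modnet.

```python
import os

# Variables listed in this thread, with CUDA_VISIBLE_DEVICES set to -1
# so that no GPU is visible to TensorFlow.
THREAD_ENV = {
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1",
    "TF_NUM_INTRAOP_THREADS": "1",
    "TF_NUM_INTEROP_THREADS": "1",
    "CUDA_VISIBLE_DEVICES": "-1",
}


def limit_threads(env=os.environ):
    """Write the settings into the given environment mapping (hypothetical helper)."""
    env.update(THREAD_ENV)
    return env


# Call this at the very top of the script, before any import that pulls in
# TensorFlow or NumPy. Child processes started later inherit os.environ.
limit_threads()


def main():
    # Import modnet and run FitGenetic here, after the variables are set.
    # The actual call is omitted from this sketch.
    pass


if __name__ == "__main__":
    main()
```

Setting the variables in `os.environ` rather than only in the parent shell keeps the script self-contained, which helps when it is launched from an IDE.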