GMFranceschini closed this issue 7 months ago
Hmm. I'm guessing it might be due to a NumPy auto-broadcasting issue. Could you try replacing this line of code in Higashi_wrapper.py:
`to_neighs = np.array(to_neighs)[:-1]`
with
`to_neighs = np.array(to_neighs, dtype='object')[:-1]`
and see if that fixes the error? If so, I'll update the repo accordingly. Thanks!
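For context, a minimal sketch (not Higashi code) of why the original line can break on recent NumPy: `to_neighs` is a ragged list of neighbour lists, and NumPy 1.24+ refuses to build a regular array from ragged input unless `dtype=object` is given (older versions only emit a `VisibleDeprecationWarning`).

```python
import numpy as np

# Ragged input: rows have different lengths, like per-node neighbour lists.
to_neighs = [[1, 2, 3], [4, 5], [6]]

try:
    arr = np.array(to_neighs)[:-1]              # NumPy >= 1.24 raises ValueError here
except ValueError as err:
    print("regular array failed:", err)

arr = np.array(to_neighs, dtype="object")[:-1]  # object array keeps each row as-is
print(arr)                                      # [list([1, 2, 3]) list([4, 5])]
```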
And regarding the GPU error: thanks for letting me know. It's likely because I use nvidia-smi on the command line to pick the GPU with the largest available memory. On machines managed by Slurm, even if you are assigned gpu:0, nvidia-smi still reports the other GPUs, and they may happen to have slightly more free memory. I'll think of a way to fix it.
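One possible direction, just a sketch and not the actual `get_free_gpu()` implementation (the helper name `pick_free_gpu` and the exact parsing are assumptions): restrict the nvidia-smi choice to the GPUs that Slurm actually exposed via `CUDA_VISIBLE_DEVICES`, and return the corresponding logical PyTorch index.

```python
import os
import subprocess
import torch

def pick_free_gpu():
    """Return the logical index of the visible GPU with the most free memory.

    Assumes CUDA_VISIBLE_DEVICES (set by Slurm for --gres=gpu:...) contains
    plain device indices; if it is unset, fall back to all GPUs.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        encoding="utf-8",
    )
    free_mib = [int(line) for line in out.strip().splitlines()]

    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if visible:
        physical = [int(i) for i in visible.split(",")]
        # PyTorch renumbers visible devices 0..n-1, so return the logical index.
        return max(range(len(physical)), key=lambda i: free_mib[physical[i]])
    return max(range(len(free_mib)), key=lambda i: free_mib[i])

if torch.cuda.is_available():
    torch.cuda.set_device(pick_free_gpu())
```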
Thank you! I confirm that the fix worked. Now training is running, and I am no longer observing the error.
Also, you are right about the GPU; that is exactly what is happening in our case, as our nodes have two GPUs. Unfortunately, I am encountering some problems when training on the Slurm cluster via GPU: the training step for imputation gets stuck and never progresses. I will investigate this further, as everything works locally on my GPU.
Thanks for the confirmation!
Dear developers, I encounter this error when running `higashi_model.train_for_imputation_nbr_0()`. Do you have any clues about what might be the problem? Maybe my version of NumPy is too recent? Any advice is appreciated.
My script looks like this.
Here is my conda env:
higashi.txt
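(The attached script isn't reproduced above. For reference, a typical Higashi wrapper run, roughly following the steps in the Higashi README, looks like the sketch below; the config path is a placeholder and exact method names may differ between versions.)

```python
from higashi.Higashi_wrapper import Higashi

config_path = "/path/to/config.JSON"        # placeholder: path to the JSON config

higashi_model = Higashi(config_path)
higashi_model.process_data()                # build the hypergraph inputs
higashi_model.prep_model()
higashi_model.train_for_embeddings()        # stage 1: cell embeddings
higashi_model.train_for_imputation_nbr_0()  # stage 2: imputation without neighbours
higashi_model.impute_no_nbr()
```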
Unrelated, but maybe useful for you: I had to force the CUDA device to zero in `get_free_gpu()` to run on a Slurm cluster. It oddly kept switching to GPU 1 when only one GPU was requested (GPU 0), blocking execution.
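In case it helps others hitting the same thing, the workaround amounts to something like the sketch below (illustrative only, not the exact patch): pin PyTorch to the single GPU Slurm allocated instead of letting the nvidia-smi query pick one.

```python
import torch

# With --gres=gpu:1, Slurm exposes exactly one device, which PyTorch sees as cuda:0.
if torch.cuda.is_available():
    torch.cuda.set_device(0)            # force the first (and only) visible GPU
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")
```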