Closed: VeritasJoker closed this issue 1 month ago.
Please also paste the submit.sh script along with the log file. Thanks.
I think I noticed something similar. I didn't find the root cause; I moved away from multiprocessing, run things sequentially within each job, and use job arrays for parallelization (see the sketch below).
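For reference, a minimal sketch of that job-array pattern; everything here is illustrative, not from the repo (the `process` function, the item list, and the `--array` range are placeholders):

```python
import os

def process(item):
    """Placeholder for the per-item work done sequentially inside one job."""
    print(f"processing {item}")

# Launched with e.g. `sbatch --array=0-2 submit.sh`; Slurm sets
# SLURM_ARRAY_TASK_ID for each array task, so parallelism comes from
# separate jobs rather than from multiprocessing inside one job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
items = ["item-0", "item-1", "item-2"]  # placeholder work items
process(items[task_id])
```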
I tried but couldn't replicate the same error message. For now, I just replaced this with `p = Pool(4)` and it works fine.
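Roughly, that stopgap looks like the sketch below (the worker function is a placeholder; hardcoding 4 only makes sense if the job actually requested at least that many CPUs, e.g. `--cpus-per-task=4`):

```python
from multiprocessing import Pool

def work(x):
    """Placeholder for the per-item computation."""
    return x * x

if __name__ == "__main__":
    # Hardcoding the pool size avoids cpu_count() altogether, but it has to
    # stay within the CPUs the Slurm job actually allocated.
    with Pool(4) as pool:
        print(pool.map(work, range(8)))
```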
I think I now know what this issue is. I found out by doing my own work :). `cpu_count()` returns the total number of CPUs on the node, not the number of CPUs requested. Replacing `cpu_count()` with `len(os.sched_getaffinity(0))` should do it.
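A minimal sketch of that change, assuming the usual `multiprocessing.Pool` pattern (the worker function is a placeholder, and note that `os.sched_getaffinity` is Linux-only):

```python
import os
from multiprocessing import Pool

def work(x):
    """Placeholder for the per-item computation."""
    return x * x

if __name__ == "__main__":
    # cpu_count() reports every CPU on the node; Slurm pins the job to only
    # the CPUs it requested, and sched_getaffinity(0) reflects that pinning.
    n_workers = len(os.sched_getaffinity(0))
    with Pool(n_workers) as pool:
        print(pool.map(work, range(8)))
```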
> I think I now know what this issue is. I found out by doing my own work :). `cpu_count()` returns the total number of CPUs on the node, not the number of CPUs requested.
lol, that's what I suspected but never confirmed. No matter how much memory I request for a job, it always runs out :)
@VeritasJoker Did you test this change?
> @VeritasJoker Did you test this change?

I trust you :)
> @VeritasJoker Did you test this change?
>
> I trust you :)

The number and type of issues we have had so far should tell you that you shouldn't 😛
It seems like this line is causing some problems when submitting jobs to Slurm. I forgot to include the error before deleting all the logs; I'll try to reproduce it again.
https://github.com/hassonlab/247-encoding/blob/82042c0467e78dac671e070c0bc069b44894215c/scripts/tfsenc_main.py#L199