Closed: VeritasJoker closed this issue 1 month ago.
Please also paste the submit.sh script along with the log file. Thanks.
I think I noticed something similar. I didn't find the root cause; I moved away from multiprocessing, run things sequentially within each job, and use job arrays for parallelization (see the sketch below).
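For reference, a minimal sketch of that job-array pattern; everything here is illustrative, not from the repo (the `process` function, the item list, and the `--array` range are placeholders):

```python
import os

def process(item):
    """Placeholder for the per-item work done sequentially inside one job."""
    print(f"processing {item}")

# Launched with e.g. `sbatch --array=0-2 submit.sh`; Slurm sets
# SLURM_ARRAY_TASK_ID for each array task, so parallelism comes from
# separate jobs rather than from multiprocessing inside one job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
items = ["item-0", "item-1", "item-2"]  # placeholder work items
process(items[task_id])
```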
I tried but couldn't replicate the same error message. For now, I just replaced this with `p = Pool(4)` and it works fine.
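Roughly, that stopgap looks like the sketch below (the worker function is a placeholder; hardcoding 4 only makes sense if the job actually requested at least that many CPUs, e.g. `--cpus-per-task=4`):

```python
from multiprocessing import Pool

def work(x):
    """Placeholder for the per-item computation."""
    return x * x

if __name__ == "__main__":
    # Hardcoding the pool size avoids cpu_count() altogether, but it has to
    # stay within the CPUs the Slurm job actually allocated.
    with Pool(4) as pool:
        print(pool.map(work, range(8)))
```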
I think I now know what this issue is. I found out by doing my own work :). `cpu_count()` returns the total number of CPUs on the node, not the number of CPUs requested. Replacing `cpu_count()` with `len(os.sched_getaffinity(0))` should do it.
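A minimal sketch of that change, assuming the usual `multiprocessing.Pool` pattern (the worker function is a placeholder, and note that `os.sched_getaffinity` is Linux-only):

```python
import os
from multiprocessing import Pool

def work(x):
    """Placeholder for the per-item computation."""
    return x * x

if __name__ == "__main__":
    # cpu_count() reports every CPU on the node; Slurm pins the job to only
    # the CPUs it requested, and sched_getaffinity(0) reflects that pinning.
    n_workers = len(os.sched_getaffinity(0))
    with Pool(n_workers) as pool:
        print(pool.map(work, range(8)))
```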
> I think I now know what this issue is. I found out by doing my own work :). `cpu_count()` returns the total number of CPUs on the node, not the number of CPUs requested.
lol, that's what I suspected but never confirmed. No matter how much memory I request for a job, it always runs out :)
@VeritasJoker Did you test this change?
> @VeritasJoker Did you test this change?

I trust you :)
> @VeritasJoker Did you test this change?
>
> I trust you :)

The number and type of issues we have had so far should tell you that you shouldn't 😛
It seems like this line is causing some problems when submitting jobs to Slurm. I forgot to include the error before deleting all the logs; I'll try to reproduce it again.
https://github.com/hassonlab/247-encoding/blob/82042c0467e78dac671e070c0bc069b44894215c/scripts/tfsenc_main.py#L199