zanglab / SICER2

MIT License

Writing to /tmp on HPC Clusters #11

Closed (rpep closed this issue 6 months ago)

rpep commented 3 years ago

Hi there,

I just wanted to make you aware of an issue that can occur because of the way you write working output using Python's tempfile module. We recently had a user of the HPC cluster here at the University of Birmingham raise an issue because their compute jobs which utilised SICER2 were not producing any output. After some investigation, we found that the jobs were consuming no CPU resources because they had exceeded the allowed capacity of the /tmp storage area. It appears that rather than failing, crashing, or raising an exception, the code just hangs in this situation.

We found a workaround by advising the user to set the environment variable TMPDIR to a location in which they had enough capacity, which overrides the default location of /tmp. It may be worth warning about this in the documentation, as this sort of setup is fairly common on clusters which have limited on-node storage.
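
For anyone hitting the same problem, here is a minimal sketch of a pre-flight check that could be run before launching a job. It is not part of SICER2; the function name `check_temp_capacity` and the threshold argument are hypothetical. It relies only on the standard library: `tempfile.gettempdir()` honours TMPDIR, and `shutil.disk_usage()` reports free space on the filesystem that holds that directory.

```python
import shutil
import tempfile


def check_temp_capacity(min_free_bytes):
    """Return (path, free_bytes) for the directory tempfile will use.

    Raises RuntimeError when free space is below min_free_bytes, so the
    job fails fast instead of stalling once the temp area fills up.
    (Hypothetical helper for illustration; not a SICER2 function.)
    """
    # tempfile caches its choice of directory; clearing the cache makes it
    # re-read TMPDIR (and its other fallbacks) on the next call.
    tempfile.tempdir = None
    path = tempfile.gettempdir()

    # Free space on the filesystem that contains the temp directory.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"Only {free} bytes free in {path}; "
            f"set TMPDIR to a location with at least {min_free_bytes} bytes."
        )
    return path, free
```

Calling `check_temp_capacity(needed_bytes)` at the start of a job script, with TMPDIR already exported to a scratch area, reports which directory will be used and stops early if it is too small. Note the assumption here: `shutil.disk_usage()` reflects filesystem free space, so a per-user quota on /tmp may not be visible to this check.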

Best wishes, Dr Ryan Pepper

JasonJiangs commented 7 months ago

Hi Dr Pepper, thanks for reporting this issue! We will include your suggestion in our latest version.

rpep commented 7 months ago

@JasonJiangs Great, thanks!