ryanhausen / fitsmap

FitsMap: A Simple, Lightweight Tool For Displaying Interactive Astronomical Image and Catalog Data
MIT License

Don't raise error if ray already initialized. #77

Open bd-j opened 1 year ago

bd-j commented 1 year ago

I was having problems running fitsmap via convert.dir_to_map on a certain cluster using slurm. A little digging suggests that on clusters where slurm does not provide exclusive node access, ray still attempts to use all cores on the node, which leads to errors.

The errors can be avoided by initializing ray with only a single cpu (I haven't checked if it works using the number of cpus requested via slurm) before calling fitsmap, but only if ray is then re-initialized within fitsmap with ignore_reinit_error=True.

I'm not sure if this is the best way to address the issue, but thought I'd provide the fix in case it's helpful. Happy to close this and just raise an issue or rework this PR if you have suggestions.

ryanhausen commented 10 months ago

Hi @bd-j, thanks for the PR, I will take a look. I haven't done a lot of testing on slurm, so there could be an issue with how ray gets initialized. How do you make the call to fitsmap? Is it via sbatch/srun or manually in an interactive session?

Sorry for the delayed reply. I seem to have somehow unsubscribed from notifications on this repo.

bd-j commented 10 months ago

Hi @ryanhausen, it was sbatch, but not requesting an entire node (actually only requesting a single cpu). I added the following to the top of the script that called fitsmap.convert:

import ray
ray.init(ignore_reinit_error=True, num_cpus=1)

I didn't test replacing num_cpus=1 with something like $SLURM_NTASKS for the case 1 < $SLURM_NTASKS < $SLURM_CPUS_ON_NODE.
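A minimal sketch of that untested variant, for illustration only: it reads SLURM_NTASKS from the environment, caps it at SLURM_CPUS_ON_NODE, and passes the result to ray.init. The helper names cpus_from_slurm and init_ray_for_slurm are hypothetical and not part of fitsmap or ray. The sketch also assumes that one task corresponds to one cpu, which may not hold for every job configuration, and it has not been run on a cluster.

```python
import os


def cpus_from_slurm(env=None, default=1):
    """Guess how many cpus ray may use from Slurm environment variables.

    Hypothetical helper: assumes one cpu per task (SLURM_NTASKS) and
    falls back to `default` when not running under Slurm.
    """
    env = os.environ if env is None else env
    try:
        ntasks = int(env.get("SLURM_NTASKS", default))
    except ValueError:
        # Variable is set but not an integer; fall back to the safe default.
        return default
    try:
        on_node = int(env["SLURM_CPUS_ON_NODE"])
    except (KeyError, ValueError):
        # No usable node-level count; do not cap beyond the task count.
        on_node = ntasks
    # Never request fewer than 1 cpu or more than the node reports.
    return max(1, min(ntasks, on_node))


def init_ray_for_slurm():
    """Initialize ray before fitsmap does, limited to the Slurm allocation."""
    import ray  # imported lazily so cpus_from_slurm works without ray installed

    # ignore_reinit_error=True means a later ray.init call does not raise.
    ray.init(ignore_reinit_error=True, num_cpus=cpus_from_slurm())
```

Calling init_ray_for_slurm() at the top of the job script would take the place of the two lines above. As noted earlier in the thread, this only helps if fitsmap's own ray.init call also passes ignore_reinit_error=True.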

ryanhausen commented 10 months ago

@bd-j thanks. I need to do some reading on how best to use ray with slurm. I want to make sure I don't implement this in a way that breaks other things. Thanks!