Open Saladino93 opened 4 months ago
Hi - I'm assuming you're running the 'dev' branch? If so, this would probably be due to me trying to save memory, which didn't work out (caused more problems than it solved), and so I fixed this at the weekend. So I think if you just pull from 'dev', this should go away. Please let me know if not.
I installed through pip. Let me see if using the 'dev' branch improves the situation. Thanks.
Ok - it's unlikely to be what I said then, but I'm not sure what the issue would be without more info. Maybe you could post the whole traceback?
Indeed, I ran without the memory-saving hack (the one that converts the model to np.float16). I am now running with it and waiting for the results.
This is what I get from my previous pip installation:
19: Traceback (most recent call last):
19: File "/global/homes/o/omard/.conda/envs/act/bin/nemoModel", line 240, in <module>
19: comm.send(modelImage, dest = 0)
19: File "mpi4py/MPI/Comm.pyx", line 1406, in mpi4py.MPI.Comm.send
19: File "mpi4py/MPI/msgpickle.pxi", line 211, in mpi4py.MPI.PyMPI_send
19: File "mpi4py/MPI/msgpickle.pxi", line 147, in mpi4py.MPI.pickle_dump
19: File "mpi4py/MPI/msgbuffer.pxi", line 50, in mpi4py.MPI.downcast
19: OverflowError: integer 3566595060 does not fit in 'int'
(note that I clone the mpi4py environment of Perlmutter)
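For reference, the number in the OverflowError can be compared with the largest value a signed 32-bit C int can hold. This is only a sketch of the arithmetic: it assumes that the 'int' in the message is a 32-bit C int and that the reported integer is the size in bytes of the pickled message. The helper name `fits_in_c_int` is made up for illustration and is not part of mpi4py or nemo.

```python
# Sketch: compare the integer from the OverflowError with the C int limit.
# Assumption: 'int' in the error message is a signed 32-bit C int.
INT_MAX = 2**31 - 1  # 2147483647


def fits_in_c_int(nbytes: int) -> bool:
    """Return True if nbytes can be stored in a signed 32-bit C int."""
    return 0 <= nbytes <= INT_MAX


reported = 3566595060  # value copied from the traceback above

print(fits_in_c_int(reported))   # False
print(reported - INT_MAX)        # how far over the limit: 1419111413

# Assumption: halving the bytes per pixel (e.g. float32 -> float16)
# roughly halves the message size, ignoring pickle overhead.
print(fits_in_c_int(reported // 2))  # True
```

If those assumptions hold, a message about half that size would fit under the limit, which would be consistent with the float16 conversion making a difference.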
OK, I actually managed to get it to run by adding:
print("Saving memory by converting to float16 before applying pixel window function...")
modelMap=np.float16(modelMap) #NOTE: this is a bit of a hack to save memory
The total file size is 3.2 GB. Does this make sense to you?
I am not sure if this is due to some limitation on Perlmutter (doubt it), mpi4py, or something else (perhaps I ran my initial PS search wrongly...).
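One possible way to narrow this down, independent of Perlmutter: if the message passed to comm.send really is larger than a 32-bit int can describe, then splitting the map into smaller pieces before sending would be one workaround. Below is a minimal sketch of just the index arithmetic, with no MPI calls. The function name `chunk_bounds`, the float32 item size and the pixel count are all assumptions for illustration; this is not how nemoModel does it.

```python
# Sketch: split a flat array of n_items into slices that each stay
# below a byte limit, so each slice could be sent as its own message.
INT_MAX = 2**31 - 1  # assumed limit on the size of one pickled message


def chunk_bounds(n_items, itemsize, max_bytes=INT_MAX // 2):
    """Yield (start, stop) index pairs; each slice is <= max_bytes."""
    if n_items < 0 or itemsize <= 0 or max_bytes < itemsize:
        raise ValueError("invalid sizes")
    per_chunk = max_bytes // itemsize  # items that fit in one chunk
    for start in range(0, n_items, per_chunk):
        yield start, min(start + per_chunk, n_items)


# Small example: 10 items of 4 bytes each, at most 16 bytes per chunk.
print(list(chunk_bounds(10, 4, max_bytes=16)))

# Hypothetical map: 3566595060 bytes interpreted as float32 pixels.
n_pixels = 3566595060 // 4
print(len(list(chunk_bounds(n_pixels, 4))))
```

Each (start, stop) pair would index a flattened copy of the map, and the receiving rank would need to concatenate the pieces in the same order. The default limit is set to half of INT_MAX to leave headroom for pickle overhead, which is a guess rather than a measured value.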
That's a mystery to me, because I've taken that out as I mentioned above. I don't think I've managed to get the OverflowError you've been getting, running on the sims I've been making or the real data.
Hi all. I am a new user running on Perlmutter.
On running
srun -u -l -n 64 nemoModel "/pscratch/sd/o/omard/FGSIMS_OUT/agora/${nemo_run}/${nemo_run}_optimalCatalog.fits" $mask $beam "/pscratch/sd/o/omard/FGSIMS_OUT/agora/${nemo_run}/nemomodel_${freq}_snr4.fits" --min-snr 4.0 --freq $freq -M -n
(note that I added the min-snr argument by hand)
I get
even if
Any ideas how to debug this? I thought it might be related to my survey mask, but I still keep getting this even after reducing the area.
Thanks in advance.