nuclear-multimessenger-astronomy / nmma

A pythonic library for probing nuclear physics and cosmology with multimessenger analysis
https://nuclear-multimessenger-astronomy.github.io/nmma/
GNU General Public License v3.0

Unpickling error Bu2022Ye.pkl (invalid load key "\x0a") #251

Closed haukekoehn closed 11 months ago

haukekoehn commented 11 months ago

Describe the bug: When loading the SVD models from Bu2022Ye.pkl, the unpickling fails and raises an error: pickle.UnpicklingError: invalid load key, '\x0a'. This happens when different protocols are used for pickling and unpickling.

To Reproduce: Start a multimessenger analysis with GW, KN and GRB, using the current NMMA commit.

mcoughlin commented 11 months ago

@haukekoehn I suspect this is a Python version issue. Are you on 3.11 perhaps?

haukekoehn commented 11 months ago

@mcoughlin No, Python 3.8.

mcoughlin commented 11 months ago

@haukekoehn does moving to 3.9 make any difference?

haukekoehn commented 11 months ago

no

mcoughlin commented 11 months ago

@sahiljhawar @bfhealy can anyone reproduce?

sahiljhawar commented 11 months ago

@mcoughlin I am unaware of GW+ analyses. Can't help with this.

However, @haukekoehn can you check the pickle file by loading it and checking its keys? \x0a is an LF character, if it helps.

mcoughlin commented 11 months ago

@sahiljhawar I was more asking you to confirm whether you can load the file.

sahiljhawar commented 11 months ago

@mcoughlin Ahh okay. Let me check.

sahiljhawar commented 11 months ago

If Bu2022Ye.pkl is the same as Bu2022Ye_tf.pkl from Zenodo (record 8164628), then yes, I am able to load it using numpy.

mcoughlin commented 11 months ago

@haukekoehn Can you download that file separately and check the same?

haukekoehn commented 11 months ago

I am currently trying to see whether it works on another machine.

But yes, downloading Bu2022Ye_tf.pkl and reading it in works.

So I want to start a GW+KN+GRB inference from scratch and only provide an empty directory for the SVD models. Then it does its thing and starts downloading Bu2022Ye.pkl (no tf!). I just looked into this file and it actually looks like an HTML file, not a binary file.
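The symptom described here (a `.pkl` that is really an HTML page, and an `invalid load key, '\x0a'` error) can be checked without NMMA at all. The following is a minimal sketch using only the standard library; the helper names `describe_download` and `unpickling_error_for` are hypothetical and not part of NMMA. It relies on the fact that pickles written with protocol 2 or later begin with the byte `0x80`, whereas an HTML page usually begins with whitespace or `<`.

```python
import pickle
from pathlib import Path


def describe_download(path):
    """Guess whether a file is a binary pickle or an HTML page saved by mistake.

    Hypothetical helper for illustration; not part of NMMA.
    """
    head = Path(path).read_bytes()[:64]
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype", b"<html")):
        return "html"
    # Pickle protocol 2 and later start with the PROTO opcode, byte 0x80.
    # Older protocol 0/1 pickles do not, so they are reported as "unknown".
    if head[:1] == b"\x80":
        return "pickle"
    return "unknown"


def unpickling_error_for(payload):
    """Return the message pickle raises for these bytes, or None on success."""
    try:
        pickle.loads(payload)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # A payload that starts with a line feed (0x0a), as an HTML error page might.
    print(unpickling_error_for(b"\n<!DOCTYPE html><html></html>"))
```

If `describe_download` reports `"html"` for a freshly downloaded model file, the download most likely returned an error or landing page rather than the model, which would point away from a pickle protocol mismatch.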

mcoughlin commented 11 months ago

Ok so that means maybe the download is failing... maybe lack of Zenodo access for that machine? What's the error?

sahiljhawar commented 11 months ago

@haukekoehn In that case you can get the model from /home/enlil/jhawar/svdmodels and start your run with local files

haukekoehn commented 11 months ago

alright, thanks, i will try!

haukekoehn commented 11 months ago

On the other machine it fails even earlier and raises a ValueError: model_name Bu2022Ye_tf not found in models list

sahiljhawar commented 11 months ago

I think you need the models.yaml file as well.

haukekoehn commented 11 months ago

I now managed to get it running (or at least initializing the live points :D), but interestingly enough, it only downloaded the models when it was run with a single process, i.e. only nmma-analysis, not srun nmma-analysis. Then the models appeared in the directory and I could restart with srun -n 1024 nmma-analysis.

mcoughlin commented 11 months ago

@tsunhopang maybe we need a line to check whether the model is there and indicate to first run the download if it isn't and then the user can rerun?
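A pre-flight check of the kind suggested here could look roughly like the sketch below. It is not NMMA code: the function names, the `svd_path` argument, and the assumed layout (one `{model_name}.pkl` next to a model-specific directory holding one file per filter) are illustrative assumptions, and the filter names in the usage example are placeholders.

```python
from pathlib import Path


def missing_model_files(svd_path, model_name, filters, suffix=".pkl"):
    """Return the expected model files that are absent.

    Assumed layout (hypothetical):
        <svd_path>/<model_name>.pkl
        <svd_path>/<model_name>/<filter><suffix>
    """
    root = Path(svd_path)
    expected = [root / f"{model_name}.pkl"]
    expected += [root / model_name / f"{filt}{suffix}" for filt in filters]
    return [path for path in expected if not path.is_file()]


def require_model_files(svd_path, model_name, filters, suffix=".pkl"):
    """Fail early with an actionable message instead of a late unpickling error."""
    missing = missing_model_files(svd_path, model_name, filters, suffix)
    if missing:
        listing = "\n  ".join(str(path) for path in missing)
        raise FileNotFoundError(
            f"Model '{model_name}' is incomplete in {svd_path}. Missing:\n  "
            f"{listing}\n"
            "Run the analysis once as a single process to download the "
            "models, then rerun the parallel job."
        )


if __name__ == "__main__":
    # Placeholder filter names; adjust to the filters used in the run.
    try:
        require_model_files("svdmodels", "Bu2022Ye_tf", ["sdssu", "2massks"], ".h5")
    except FileNotFoundError as err:
        print(err)
```

Raising before any sampling starts means a job launched with many processes stops immediately with a clear instruction, rather than each process attempting its own download.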

sahiljhawar commented 11 months ago

#252

@mcoughlin @bfhealy Part of the problem arises from here https://github.com/nuclear-multimessenger-astronomy/nmma/blob/4ba91655a509bce6286574d3c5ad4b764788686c/nmma/em/model.py#L251

Even the tests with Bu2022Ye fail (here)

mcoughlin commented 11 months ago

@sahiljhawar @bfhealy Maybe it's time to deprecate that pathway?

sahiljhawar commented 11 months ago

So now, we just need {model-name}.pkl and {model-name_filter}.pkl or {model-name_filter}.h5, right?

bfhealy commented 11 months ago

@sahiljhawar That's correct for files on Zenodo! When they get downloaded via NMMA code, the filter-specific files get renamed to {filter}.pkl or {filter}.h5 and put in a model-specific directory.
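The renaming described above can be illustrated with a small sketch. It is an assumption-laden illustration, not the NMMA implementation: the function name `local_name` is hypothetical, the Zenodo naming is taken to be `{model-name}_{filter}.pkl` / `.h5` as stated in this thread, and keeping the top-level `{model-name}.pkl` outside the model directory is a guess.

```python
from pathlib import PurePosixPath


def local_name(zenodo_filename, model_name):
    """Map a Zenodo file name to its assumed local relative path.

    {model}.pkl          -> {model}.pkl            (location is an assumption)
    {model}_{filter}.ext -> {model}/{filter}.ext   (ext is .pkl or .h5)
    """
    name = PurePosixPath(zenodo_filename)
    stem, suffix = name.stem, name.suffix
    if stem == model_name and suffix == ".pkl":
        return PurePosixPath(name.name)
    prefix = model_name + "_"
    # Strip the model-name prefix rather than splitting on "_", because
    # both model names and filter names may themselves contain underscores.
    if stem.startswith(prefix) and suffix in {".pkl", ".h5"}:
        return PurePosixPath(model_name) / (stem[len(prefix):] + suffix)
    raise ValueError(f"{zenodo_filename!r} does not belong to model {model_name!r}")


if __name__ == "__main__":
    # "sdssu" is a placeholder filter name used only for illustration.
    print(local_name("Bu2022Ye_tf_sdssu.h5", "Bu2022Ye_tf"))
```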

sahiljhawar commented 11 months ago

@bfhealy For that, in one of the previous meetings it was finalised that a mirror of the Zenodo models will be stored on the Potsdam server for faster and easier access. This way the directory structure can be preserved.

sahiljhawar commented 11 months ago

But in any case, support for *_mag.pkl and *_lbol.pkl should be removed, since we do not have those files anymore.

sahiljhawar commented 11 months ago

Also, do model.pkl and filter.h5 (stored inside the model folder) serve different purposes? Or do they contain the same data with a different file structure?

bfhealy commented 11 months ago

They serve different purposes: each model.pkl file contains basic SVD information, while the filter.h5 files contain the trained neural network weights/biases mapping merger parameters to light curve eigenvalues (in TensorFlow format).