Open clark2668 opened 2 months ago
I think this is purely an issue with radiotools. Installing the version from Github instead of PyPi actually solves this issue (ie. doing pip install git+https://github.com/nu-radio/radiotools
). I have not encountered this issue when using any NuRadio modules so far.
Maybe as a quick fix we could limit the Numpy version to 1.x, but in the long term it would be useful to either use the git version in the TOML file or ask the radiotools devs to release a new version on the PyPi.
Sounds like an relatively easy fix for @cg-laser who usually does the radiotools releases!
Indeed, it looks like it was fixed a couple weeks ago! (https://github.com/nu-radio/radiotools/commit/eaeed2dd858793107cc906bc59d435ad9326d164). I think radiotools probably just needs a new pypi release? I'll close this request then! With another tag to @cg-laser :)
I suspect this will cause problems wherever numpy arrays are pickled as well (which, seems like it happens a lot in .nur files?).
As I understand, if someone produces a .nur file with numpy 2.x, it won't be readable in numpy < 1.26 due some some internals changing in numpy arrays. Doing the opposite I believe will jwilll cause a deprecation warning, which is not as serious, though eventually numpy 2.x will probably remove the compatibility shim (though they are providing https://github.com/numpy/numpy/pull/24866 as a solution).
I suspect in the near term, different people will start using numpy 1.x and 2.x (since numpy is not pinned to either side) and interchanging .nur files could become more annoying until everybody is on 2.x
I can confirm that pickling a numpy array with numpy 2.x is unreadable on on numpy < 1.26 (it does work on 1.26). I haven't directly tested a .nur file but as numpy arrays are pickled, it should certainly cause problems across the 2.0 boundary.
I did not observe a deprecation warning going the other way though, so that's good?
For now probably the solution is to make sure anybody producing .nur files to be consumed by other people to pin numpy < 2 ? Requiring numpy >=1.26 (or >=2) likely to be more difficult, but probably something that should be done eventually.
(actually probably needs to be >= 1.26.1 if I'm reading the release notes correctly)
Hi! Thanks for taking a look at this Cosmin. I think it's worth reopening this discussion in light of that, I hadn't thought about it from that perspective at all... Pinning numpy < 2 won't work, eventually python will drop that. But if the data formats are not forward compatible, then I don't see how we won't run into this issue again down the road. Is there any way to make pkl more futureproof?
Hi! Thanks for taking a look at this Cosmin. I think it's worth reopening this discussion in light of that, I hadn't thought about it from that perspective at all... Pinning numpy < 2 won't work, eventually python will drop that. But if the data formats are not forward compatible, then I don't see how we won't run into this issue again down the road. Is there any way to make pkl more futureproof?
yes, one option would be to convert every numpy array to a list before dumping it into a pickle file. The implementation of this should be somewhat straight forward, it will go to the cost of some efficiency I assume
(actually probably needs to be >= 1.26.1 if I'm reading the release notes correctly)
For me it did work with 1.26.0
. But I can confirm that I could not read 2.0 files with 1.26 interpreter (the other direction worked!)
For me personal that is okay, as long as we can read old files I am happy that new files can only be read with newer version of the libraries. But I agree it is not ideal.
Thanks for flagging this! I agree this is a problem - we do want to at least guarantee backwards compatibility with NuRadioMC, I think, which will break if the same version of NuRadioMC can use either version of numpy (as an aside, for other reasons (#628 ) we should anyway tag the numpy version used for simulations - this is another open issue). My suggestion would be
While (1.) might not be foolproof (I'm not sure how pip will resolve pip install nuradiomc
for users that already have numpy 2.* installed), I think this is the best we can do? I would have thought converting all numpy objects in NuRadioMC to pure Python objects before serialization (and vice versa on deserialization) is a lot of work and probably adds some significant performance overhead as well.
why dont we pin it to >= 2? We can read old files without problems and we ensure if new files with 2.x are produced that everyone can read them.
I mean, not to open a broader discussion, but this seems like a drawback of the pkl paradigm. Just doing some googling, it sounds like it has two well known challenges: that the files may not be compatible across python versions, and they also have security concerns. Some quick googling actually pulls up some articles on this:
So maybe pkl just isn't great as a long term / archive solution, or even somethings that's stable user to user because they may have different python and numpy versions, as we're running into now? It would probably take a lot of work to revamp the data format, but I wonder if it's worth thinking about seriously for a future major revision?
In the meantime, I think pinning a version of numpy sounds good.
why dont we pin it to >= 2? We can read old files without problems and we ensure if new files with 2.x are produced that everyone can read them.
numpy>=2 would mean we have to deprecate all python versions before 3.9. I don't think this is worth it at this stage.
So maybe pkl just isn't great as a long term / archive solution, or even somethings that's stable user to user because they may have different python and numpy versions, as we're running into now? It would probably take a lot of work to revamp the data format, but I wonder if it's worth thinking about seriously for a future major revision?
Yep, pickle has a number of downsides, and there are probably better ways to do storage - maybe worth having a think about. But I agree that for now revamping storage completely is too big of an ask (and I certainly don't think we should do this in a minor release, so we should still pin numpy<2 in 2.2.x/2.3).
Python 3.8 has its end of life this October! My personally opinion is that we can (and should) deprecate all python versions which are "end of life". The only reason I would see to keep an old version is that some users might make use of some clusters which only support end-of-live python versions....
Python 3.8 has its end of life this October! My personally opinion is that we can (and should) deprecate all python versions which are "end of life". The only reason I would see to keep an old version is that some users might make use of some clusters which only support end-of-live python versions....
I reckon the advantage of having updated versions of NuRadioMC for users that work on clusters that do not support old Python versions outweighs whatever the benefits of numpy 2.x are for the moment. But that's a discussion that's worth having.
I think some of these concerns can be mitigated by pushing folks to use e.g. conda environments. but that's not a perfect solution either. But it's also true that older versions of python can't be supported forever, and unfortunately academic computing clusters are infamously slow to upgrade. Do we know how serious of a problem this would be for the collaboration? I wonder if a quick poll would be helpful.
I wonder if a quick poll would be helpful.
I like the idea. But I would not be surprise that if we do a poll and the outcome is we can move to numpy 2 that in half a year someone comes complaining who ignored the poll ... xD
I'm not sure that python EOL dates are hugely relevant. EL8 has python3.6 by default, for instance. It's possible to use newer python, of course, but at the cost of having to manage any non-python dependencies with python bindings built against python3.6 yourself, e.g. recompiling ROOT with the right python headers for PyROOT). Even on clusters with no module system or anything like that (do those exist?), compiling newer python from source is fairly straightforward, though perhaps a bit of a hurdle for a novice user. It is a bit unfortunate that the python ecosystem moves both so fast and slow. I think deprecating python versions requires a good reason, but this might be a good enough reason.
I'm more concerned about people getting into trouble by trying to upgrade numpy to >=2 in case they have installed something else in the same environment that requires numpy <2. Hopefully most people are following best practices and have a dedicated virtual environment for NuRadioMC, but I'm not sure that's always the case. Even then, I'm not sure pip will automatically update other packages people may be using concurrently in the same environment that require newer versions for numpy 2.0 support in an existing environment (e.g. uproot, awkwardarray, numba), and I'm not sure all users will effectively be able to solve this problem without help (probably, in most cases, starting a new venv is the easiest solution...).
Describe the issue Numpy 2.0 is going to deprecate some things, like
np.infty
. This means that lines like:from NuRadioMC.EvtGen.generator import generate_eventlist_cylinder
start throwing errors:I guess this is formally an issue with radiotools, but I have to imagine things like this are sprinkled into NuRadioMC too.