RalfG closed this issue 3 months ago
Yes, removing the `source` key from the pickled state is a reasonable choice here. It's not currently being used so far as I can tell, so it shouldn't break anything provided by `pyteomics`.
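For illustration, here is a minimal sketch of dropping the `source` key in `__getstate__`. It does not use the real pyteomics classes: `FakeResolver` and `FakeModification` are hypothetical stand-ins, a `threading.Lock` plays the role of the unpicklable sqlalchemy session, and the mass value is only illustrative.

```python
import pickle
import threading


class FakeResolver:
    """Hypothetical stand-in for a resolver that holds an unpicklable handle."""

    def __init__(self):
        # A lock cannot be pickled, much like a live database session.
        self.session = threading.Lock()


class FakeModification:
    """Hypothetical stand-in for a modification tag, not the pyteomics class."""

    def __init__(self, name):
        self.name = name
        self._definition = None

    def resolve(self, resolver):
        # After resolving, the definition keeps a reference to its resolver.
        self._definition = {"name": self.name, "mass": 15.994915, "source": resolver}

    def __getstate__(self):
        state = self.__dict__.copy()
        if state["_definition"] is not None:
            # Copy the definition and drop only the unpicklable "source" entry.
            definition = dict(state["_definition"])
            definition.pop("source", None)
            state["_definition"] = definition
        return state


mod = FakeModification("Oxidation")
mod.resolve(FakeResolver())

# The round trip succeeds because "source" never reaches the pickle stream.
restored = pickle.loads(pickle.dumps(mod))
```

The original object keeps its `source` entry; only the pickled copy loses it, so anything that relies on `source` after unpickling would need to handle its absence.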
The other option is to omit `_definition` entirely from the pickled state, but that has tradeoffs. The pros are that the pickle is smaller and holds nothing that isn't textual, while still carrying enough to resolve from on load. The con is that if you pickle something and the remote source changes some time later, what you get when you unpickle may no longer match what you saved.
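This tradeoff can be sketched with a toy example. Everything here is hypothetical: `LazyModification` is not a pyteomics class, and `FAKE_DATABASE` is an in-memory dictionary standing in for the remote source. Only the name is pickled, and the definition is looked up again on first access after loading.

```python
import pickle

# Hypothetical lookup table standing in for a remote modification database.
FAKE_DATABASE = {"Oxidation": {"name": "Oxidation", "mass": 15.994915}}


class LazyModification:
    """Hypothetical tag that pickles only its textual name."""

    def __init__(self, name):
        self.name = name
        self._definition = None

    @property
    def definition(self):
        # Resolve lazily, on first access, from the (fake) source.
        if self._definition is None:
            self._definition = dict(FAKE_DATABASE[self.name])
        return self._definition

    def __getstate__(self):
        # Omit _definition entirely; the name is enough to resolve again.
        return {"name": self.name}

    def __setstate__(self, state):
        self.name = state["name"]
        self._definition = None


mod = LazyModification("Oxidation")
saved_mass = mod.definition["mass"]
payload = pickle.dumps(mod)

# Simulate the source changing between pickling and unpickling.
FAKE_DATABASE["Oxidation"]["mass"] = 16.0

restored = pickle.loads(payload)
```

In this sketch `restored.definition["mass"]` reflects the changed source rather than `saved_mass`, which is exactly the con described above.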
Thanks for looking into this!
> the con is that if you pickle something, and then some time later the remote source changes, what you'd get when you unpickled may no longer match what you saved.
Pickling should be mostly regarded as a suboptimal saving solution anyway. In our case, we only need it for multiprocessing purposes, so this should be fine.
Hi @mobiusklein,
We have been using the ProForma module heavily in most of our tools now, so first: A big thanks for all your work and continuous support on it!
Since Pyteomics 4.7, our tools that use multiprocessing on ProForma peptides have been throwing pickling errors (e.g., compomics/ms2rescore#128). I traced the issue to a recent addition of the Resolver to the modification itself, in the `_definition["source"]` attribute. As it contains an sqlalchemy session, it cannot be pickled, and modifications cannot be passed on through multiprocessed functions.
Pickling of an unresolved modification works:
Once a resolve is forced, for instance by getting the mass, it raises a `PicklingError`:
The `source` entry in the `_definition` attribute was added in https://github.com/levitsky/pyteomics/commit/564d79f72cbe816963baec2b987e45d974d473cf. Is it required anywhere? If not, perhaps simply removing the field would fix the issue.
Let me know what you think.
Best, Ralf