rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
34 stars 24 forks source link

Enable chunk-wise processing for all peaks data functions #306

Closed jorainer closed 7 months ago

jorainer commented 7 months ago

This PR fixes issue #304 . In brief: it adds the possibility for the user to set and define chunk-wise processing of a Spectra. This will affect all functions working on peaks data (e.g. even lengths, mz, peaksData) and ensures that even large-scale data can be handled reducing out of memory errors.

What this PR adds:

This processing is used for all Spectra methods accessing peaks data (or processing the peaks data). To avoid performance loss for small data sets or in-memory backends it is not performed by default. If enabled by the user, it allows to process also large data.

I think this is a very important improvement allowing the analysis of large (on-disk) data - for which we ran into unexpected issues (see #304).

Happy to discuss @sgibb @lgatto @philouail .

jorainer commented 7 months ago

@philouail , I've fixed some more things, can you please give again a careful look - any questions, concerns, comments or change requests highly welcome!

I've added now also a vignette describing the parallel processing settings. Please have a look at that too.

jorainer commented 7 months ago

Thanks for the reviews @andreavicini and @philouail ! I will merge now after having another look myself again.

sgibb commented 7 months ago

@jorainer sorry but I didn't review the code but a small suggestion anyway: If we add a new slot to the Spectra class it would break backward compatibility. So I would suggest to increment the "version" of the class and increment the minor number of the version field in the DESCRIPTION file.

jorainer commented 7 months ago

Hi Sebastian @sgibb , thanks for the suggestion. I'll increment the class version. I ensured backward compatibility through the accessor function that checks if the object has the slot and if not returns Inf (the default for the slot value). Also, Spectra methods will (automatically) call updateObject if required. So, backward compatibility should be guaranteed.

I would maybe not bump the minor version of the package to not interfere with the Bioconductor versioning?