rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
34 stars 24 forks source link

Add new default in memory backend MsBackendMemory #247

Closed jorainer closed 1 year ago

jorainer commented 1 year ago

This PR adds the new MsBackendMemory in-memory backend discussed in #246 :

jorainer commented 1 year ago

@lgatto or @sgibb , would be nice if one of you could have a look at it so that I could merge it before the BioC release deadline.

jorainer commented 1 year ago

@lgatto , I had a look at joinSpectraData. The function works - until it comes to add NumericList and other S4 objects. Maybe best if I add a check to the $<-,MsBackendMemory method to test whether value is an S4 object and if so throw an error with the suggestion to change to backend to MsBackendDataFrame.

lgatto commented 1 year ago

As mentioned above for spectraData(), I would suggest for joinSpectraData(x, y) to return the same class as spectraData(x). It would be the user's responsibility to use whatever matches their needs.

jorainer commented 1 year ago

@sgibb @lgatto , I think we have to be careful here with the discussion on DataFrame vs data.frame. I would not want to change the definition of our MsBackend interface to allow backends to return data.frame instead of DataFrame with e.g. the spectraData method. Let's be consistent here and take the object that covers more use cases (e.g. possibility to store S4 objects as spectra variables).

For the actual MsBackend implementations, internally they are allowed to store the data in different ways. Here some things will not be possible, like storing S4 classes as spectra variables in e.g. an SQL database or in an HDF5 array or, like in this case here, in a data.frame. But that's OK since we have alternative backends that support these things - we just have to show an error or warning to the user telling that something is not possible with that backend. For the MsBackendMemory I will add internal checks to test whether users provide S4 classes as spectra variables. I will not change the spectraData,MsBackendMemory and spectraData<-,MsBackendMemory methods to support data.frame in addition to DataFrame - they have to be consistent and compliant with the MsBackend definition. The exception is the backendInitialize method which is defined in a very flexible way and accepts any type of input as long as the backend understands how to deal with it.

I will convert this now to a draft and re-invite you for review once I'm done with some more checks.

jorainer commented 1 year ago

I've updated the $<-,MsBackendMemory method. With that we can also assign list, SimpleList et al as spectra variables (before it would only allow that for peaks variables). With that change now also joinSpectraData works as expected.

PR would be ready for review. Please let me know if you don't have the time to review - I would really like to get this in before the Bioc release freeze.