drewszabo closed this issue 2 years ago
On the file size problem: in my project, I have >9000 features in fGroups. The mslists object ends up with almost 8 million elements after filtering and a file size of 1.8 GB. For features with higher m/z, the MS list is enormous, but I only require the precursor and isotopes for analysis. Saving and loading the mslists object takes a substantial amount of time; I suppose this is a Windows single-threaded file system thing. Anything to help the process would be great.
Hi Drew,
Many thanks for bringing this up. It seems you caught a recent regression, and I just pushed out a fix so that only MS data is filtered again.
For the size of peak lists: personally I always try to prioritize the features as much as possible before going to any of the annotation steps. (You have to be a bit inventive sometimes with this, and it can be quite specific to the type of data and study you are working with.) But if there are almost 10k feature groups I can imagine you end up with a large object. There is of course also the possibility to apply other filter steps. Usually I apply the topMost filter and perhaps some relative minimum intensity. Did you already apply any of these? You could also think of applying the annotatedBy filter with formula annotation data, which may help a bit with subsequent compound annotation.
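To illustrate what a top-most filter combined with a relative minimum intensity does to a single peak list, here is a minimal base R sketch. It does not call patRoon and is not how patRoon implements its filters: the helper `filterPeakList`, its arguments `topMost` and `relMinIntensity`, and the example data are all hypothetical and only meant to show the idea.

```r
# Hypothetical helper, NOT part of patRoon: filters one peak list,
# given as a data.frame with the columns 'mz' and 'intensity'.
filterPeakList <- function(peaks, topMost = NULL, relMinIntensity = NULL) {
    stopifnot(is.data.frame(peaks),
              all(c("mz", "intensity") %in% names(peaks)))
    if (nrow(peaks) == 0)
        return(peaks)

    # Drop peaks below a fraction of the base peak intensity
    if (!is.null(relMinIntensity)) {
        thr <- relMinIntensity * max(peaks$intensity)
        peaks <- peaks[peaks$intensity >= thr, , drop = FALSE]
    }

    # Keep only the N most intense remaining peaks
    if (!is.null(topMost) && nrow(peaks) > topMost) {
        keep <- order(peaks$intensity, decreasing = TRUE)[seq_len(topMost)]
        peaks <- peaks[sort(keep), , drop = FALSE] # keep the original row order
    }

    peaks
}

# Made-up example data, for illustration only
peaks <- data.frame(
    mz = c(100.05, 150.10, 200.12, 250.20, 300.25, 301.25),
    intensity = c(500, 20000, 150, 8000, 100000, 12000)
)

filtered <- filterPeakList(peaks, topMost = 3, relMinIntensity = 0.01)
print(filtered)
```

With these made-up numbers, the relative threshold first removes the two weakest peaks, and the top-most step then keeps the three most intense of the remaining four. Applied across thousands of feature groups, this kind of reduction is what shrinks the overall object.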
I am not sure how much time 'rich' MS/MS data will add to SIRIUS, but my feeling is that other steps (e.g. retrieving data from CSI) may take more time.
Thanks, Rick
Thanks for the fix.
And yes, I have been experimenting with different filters to reduce the number of features. I am having trouble with noisy MS peaks getting through my initial filters. Next, I am going to try running the extracted features through the MetaClean and NeatMS ML approaches (https://github.com/bihealth/NeatMS/). NeatMS has the advantage of being pre-trained and has three categories, compared to MetaClean's two categories.
Closing off the issue. Thanks, DS
Hey Rick,
I'm trying to reduce the complexity (and file size) of my mslists by using the isolatePrec argument in patRoon::filter(mslists, ...). However, I have found that it actually isolates the precursor in both MS and MSMS peak lists, including the averagedPeakList. I wonder if this was intended, and if there is a way to only filter the MS lists alone, leaving the complete MSMS list for further analysis. Perhaps by using getDefIsolatePrecParams()?
Perhaps you can tell me if having a large MS peak list is adding any compute time to my generateFormulasSIRIUS() or generateCompoundsSIRIUS()? It would be great to reduce my compute time too.
Cheers,