rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
34 stars 24 forks source link

.isotope_peak function for isotope peak groups identification #190

Closed andreavicini closed 2 years ago

jorainer commented 3 years ago

Thanks for the updates @andreavicini ! I will also run some benchmarks on this to see if we can speed up things.

andreavicini commented 3 years ago

@sgibb I realized that with the proposed use of idxsand removal of to_test we obtain a different behaviour than before in the case seedMz is provided by the user. Now the function would look only within the idxs (i.e. the indices of the peaks whose m/z value is closest to the seedMz) when looking for groups whose first peak matches seedMz, but the search should be extended also to all the other peaks. I think I should either add back a variable to specify which peaks to check or add an if-else to treat separately the two cases (when seedMz is provided and when it is not).

sgibb commented 3 years ago

@sgibb I realized that with the proposed use of idxsand removal of to_test we obtain a different behaviour than before in the case seedMz is provided by the user. Now the function would look only within the idxs (i.e. the indices of the peaks whose m/z value is closest to the seedMz) when looking for groups whose first peak matches seedMz, but the search should be extended also to all the other peaks. I think I should either add back a variable to specify which peaks to check or add an if-else to treat separately the two cases (when seedMz is provided and when it is not).

Sorry, I didn't thought about this. What about (code just for demonstration, doesn't work):

idxs <- which(to_test)

if (length(seedMz))
  seedMz <- na.omit(closest(seedMz, x[idxs, 1], ...)
else
  seedMz <- idxs

# ...

for (i in seedMz) {
  # same code as before in the loop
}
andreavicini commented 3 years ago

Thanks for the suggestion. I'm not sure I understood it correctly but I think if I did that and left the loop as before maybe wouldn't work when seedMz is given. In that case for example idx[1] might be a low index while the indexes in seedMz might be greater and then the condition i == idxs[1] would never be satisfied

sgibb commented 3 years ago

Thanks for the suggestion. I'm not sure I understood it correctly but I think if I did that and left the loop as before maybe wouldn't work when seedMz is given. In that case for example idx[1] might be a low index while the indexes in seedMz might be greater and then the condition i == idxs[1] would never be satisfied

Indeed you are right. Sorry for the confusion.

jorainer commented 3 years ago

@andreavicini @sgibb , what is the current status of the review? Are your points all addressed now @sgibb ?

I also have a more fundamental question which is if we should add this functionality really to Spectra or maybe better to MetaboCoreUtils (or MsCoreUtils - but this approach is currently tuned towards small compound, i.e. metabolomics data). The reason is that we will use this function also in xcms, MetaboAnnotation and also MsFeatures - without actually using Spectra objects.

sgibb commented 3 years ago

@jorainer I am not sure that we have a working version now (cause of my confusing last comments). You are right this should go to MsCoreUtils or MetaboCoreUtils. If it is really tuned to metabolomics I would vote for MetaboCoreUtils.

jorainer commented 3 years ago

OK, I would then suggest the following: @andreavicini , please make a pull request adding this code to MetaboCoreUtils and @sgibb , if possible, I would then ask you to re-review the PR again there.

jorainer commented 2 years ago

Closing this PR because (as agreed above) this fits better into MetaboCoreUtils (-> PR https://github.com/rformassspectrometry/MetaboCoreUtils/pull/38)