zeehio / MassSpecWavelet

MassSpecWavelet Bioconductor package
8 stars 2 forks source link

Suggestions for `getLocalMaximumCWT`, `getRidge`, `identifyMajorPeaks`, and `peakDetectionCWT`. #4

Closed breidan closed 2 years ago

breidan commented 2 years ago

Hi,

great that someone took over maintenance of MassSpecWavelet.

MassSpecWavelet, together with MSnbase and xcms, has become somewhat like my Swiss army knife for everything related to MS. Which is probably why I ran into some issues which might not surface otherwise.

One thing which has cost me considerable time to figure out is the use of amp.Th: if going the path of cwt, getLocalMaximumCWT, getRidge, and identifyMajorPeaks (as in the vignette) amp.Th will be an absolute minimum value. But if using peakDetectionCWT amp.Th is a relative value. So you might be running peakDetectionCWT with too large a value for amp.Th and get this error message:

Error in names(ridgeList) <- paste(1, names(ridgeList), sep = "_") : 'names' attribute [1] must be the same length as the vector [0]

This error is thrown by getRidge which is not all that obvious. The reason is that in the local maximum matrix no ridges could be found because amp.Th was too large. It adds to the confusion that when running the process step-by-step no error is thrown because then getLocalMaximumCWT treats amp.Th as absolute. Would be good if getRidge would put out a more meaningful error message.

Another annoying issue is nearbyWinSize in identifyMajorPeaks: I understand to avoid boundary effects selInd3 flags peaks which are close to the start or end of the signal profile so they can be excluded. But if there is a real peak that disappears too. Setting nearbyWinSize to a small value avoids this. But if nearbyPeak is then set to TRUE then nearbyWinSize will be set to 150 in the code and the real peak disappears again without apparent reason. Maybe a somewhat different choice of names would help. And/or pointing this out in the help documentation.

Hope this is not just the ranting of an old man :-)

Andreas

zeehio commented 2 years ago

Hi Andreas!

Thanks for all the feedback. I have faced the issue with amp.Th being relative /absolute as well. I plan to tackle it, however I will do my best to avoid breaking changes, since this is quite a core package (e.g. xcms users would not like getting different results because they upgraded MassSpecWavelet).

Let me try to summarize all your comments in sections that I can tackle independently. Please tell me if you see anything differently.

amp.Th absolute/relative behaviour

This behaviour is not consistent and therefore confusing. Besides, it is not that common to have such different behaviour based on the name of a scalar so it is surprising.

Since the semantics of amp.Th are different today, it's hard to not break backwards compatibility and get a beautiful API. Given the nature of MassSpecWavelet, I would rather not break backwards compatibility even if it means getting an uglier API.

My proposal here would be to add support to getLocalMaximumCWT so it can accept relative amp.Th values as well.

I suggest to give getLocalMaximumCWT() an additional argument is_amp_thres_relative being FALSE by default. The inconsistent behaviour is still there, but at least you can make it behave.

On a further step I have some other changes in mind to MassSpecWavelet and I would consider adding a new function with similar behaviour but more consistent API.

The current behaviour would anyway still be available.

getRidges errors when no ridges are available

I consider this a bug. I like functions that work with zero row data frames and zero length vectors and lists. If there are no local maximums, get ridges should return zero ridges instead of giving an error.

identifyMajorPeaks and nearbyWinSize

Again, this is quite a painful current behaviour. It is surprising and inconsistent. I can give a warning if nearbyWinSize is set to 150 and the user had set it to s different value, and provide some option for the user to avoid the 150.

Final comments

I plan to review the API and rebuild something possibly less convoluted and more consistent (and possibly a bit faster!), but in the meantime I hope we can smooth all the pain points you find.

I would really appreciate your feedback here. If you have alternative solutions feel free to suggest them. It's not my first package, but it's the first I have inherited, so I'm being as careful as I can with it.

Thanks!

breidan commented 2 years ago

Hi @zeehio , I find your suggestion for dealing with amp.Th good. That would keep backwards compatibility and make the user more aware of the issue. Then the user can better decide what to do. For identifyMajorPeaks and nearbyWinSize I do not think that a change in the code is needed. The help documentation currently does not properly reflect what nearbyWinSizedoes and needs to be revised. I believe that is all that is needed. And yes, changing the code of getRidges to be able to deal with zero length ridge lists would certainly solve that issue.

I am happy to help as much as I can.

Cheers, Andreas

zeehio commented 2 years ago

Hi @breidan,

Sorry for the late reply. I have implemented everything we discussed here.

The only change with respect to our discussion is that I added a new argument to identifyMajorPeaks. It was confusing that the size for including nearby peaks was also the size to control the boundaries limit.

Now both arguments can be set independently. The documentation has been updated accordingly. The default values ensure backwards compatibility is preserved.

Would you be willing to test this and check if it meets your expectations?

Otherwise I'll close and release myself to Bioconductor in some days

breidan commented 2 years ago

Hi @zeehio, I will test it as soon as possible. At work we are behind a proxy server and it is a bit complex to get and build from github for us. I will let you know.

breidan commented 2 years ago

@zeehio, I tested the changes and they definitely did not break anything. The changes to the documentation are clear enough and the user can now find the relevant info. Also the error that getRidges threw for zero length lists seems to be gone.

For me this issue is closed. Cheers

zeehio commented 2 years ago

Fantastic, thanks!