systemsomicslab / MsdialWorkbench

Universal workbench incorporating msdial, msfinder, and mrmprobs
https://systemsomicslab.github.io/compms/msdial/main.html

Unexpected Integration Bounds of Untargeted Metabolomics #356

Open DarioS opened 2 months ago

DarioS commented 2 months ago

It seems to happen more often for low-intensity peaks

image

but it can also sometimes be observed for high-intensity ones.

image

kozo2 commented 2 months ago

Hi @DarioS, could you tell me how you created this plot? Also (this is just a guess), whether the peak height is referenced to zero or to the baseline might be relevant here.

DarioS commented 2 months ago

The blue region uses expert-chosen integration bounds. The red dots are the MS-DIAL bounds. The region is coloured in with geom_polygon from ggplot2. The baseline start height and baseline end height determine the straight line which is the MS-DIAL baseline.
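To make the baseline description concrete, here is a minimal Python sketch of a straight baseline drawn between a start height and an end height, and of the trapezoidal area between the trace and that line. The function names, the index-based bounds and the toy trace are assumptions for illustration; this is not MS-DIAL's actual integration code.

```python
def baseline_at(t, t_start, h_start, t_end, h_end):
    """Height of the straight baseline at time t (linear interpolation)."""
    if t_end == t_start:
        raise ValueError("baseline needs two distinct time points")
    slope = (h_end - h_start) / (t_end - t_start)
    return h_start + slope * (t - t_start)


def area_above_baseline(times, intensities, i_start, i_end, h_start, h_end):
    """Trapezoidal area between the trace and the straight baseline.

    i_start and i_end are indices of the integration bounds (hypothetical
    representation); h_start and h_end are the baseline heights at them.
    """
    t0, t1 = times[i_start], times[i_end]
    area = 0.0
    for i in range(i_start, i_end):
        # Baseline-corrected heights at both ends of this segment
        y_a = intensities[i] - baseline_at(times[i], t0, h_start, t1, h_end)
        y_b = intensities[i + 1] - baseline_at(times[i + 1], t0, h_start, t1, h_end)
        area += 0.5 * (y_a + y_b) * (times[i + 1] - times[i])
    return area


# Toy chromatogram (made-up numbers, not from the data set in this issue)
times = [0, 1, 2, 3, 4]
intensities = [10, 20, 50, 20, 10]

wide = area_above_baseline(times, intensities, 0, 4, 10, 10)
narrow = area_above_baseline(times, intensities, 1, 3, 20, 20)
print(wide, narrow)
```

In this toy trace, narrowing the bounds by one scan on each side changes the integrated area considerably, which is why a comparison of expert and software bounds is sensitive to where the bound points land.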

kozo2 commented 2 months ago

Thank you for the information. How did you obtain the red-dot MS-DIAL bounds?

By the way, we are sorry if we forgot to respond to your earlier request for exporting peak bounds. Was it created as an issue in this repo?

DarioS commented 2 months ago

This is for the peak list output. Indeed, the alignment bounds request is issue 165 and appears not to be implemented yet. It looks like a common pattern is that there is a second peak to the right which might be influencing the peak of interest.

kozo2 commented 2 months ago

Thank you for reminding me about https://github.com/systemsomicslab/MsdialWorkbench/issues/165 . Yes, we export the bounds for the peak list, but we don't export bounds for the alignment...

Anyway, we'd like to start checking the MS-DIAL bounds in the peak list. We would ideally like to have the source raw data (and the MS-DIAL parameters) of the peaks you visualized. Is it private data?

If the data cannot be shared, we'd like to know of some public data instead (for which MS-DIAL gives poor bounds).

DarioS commented 1 month ago

I have confirmed that this data set cannot be shared. We are not aware of any other data set that compares expert peak integration, which is a very time-consuming process, to computational integration. So, I don't know of any other suitable data set. The issue could be more prevalent than people realise and may simply go undetected if biological analysis is started immediately.

kozo2 commented 1 month ago

@DarioS Thank you for confirming. I understand what you are saying, and I also think the issue might be widespread.

However, I don't have any ideas on how to pinpoint the cause of the problem.

So, although it is not a direct solution, I would like to suggest two things first instead:

Let me know if you have any comments.

DarioS commented 1 month ago

O.K., see if you can make sense of it. SlopeOfBaseline values (about 300) seem huge compared to the average value.

> summary(allPeaks$SlopeOfBaseline)
1st Qu.  Median    Mean 3rd Qu. 
  -3.99    0.00    7.59    0.79 

Can you look at your own dataset for peaks with a huge SlopeOfBaseline value and find examples for code testing?
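One possible way to search a peak list for such cases is sketched below in Python. It flags rows whose SlopeOfBaseline lies outside Tukey-style fences around the quartiles. The `PeakID` column, the list-of-dicts layout (as `csv.DictReader` would produce from an exported table) and the fence multiplier are assumptions, and the slope values are made up for illustration.

```python
import statistics


def flag_extreme_slopes(peaks, column="SlopeOfBaseline", k=1.5):
    """Return rows whose slope lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    slopes = [float(p[column]) for p in peaks]
    q1, _, q3 = statistics.quantiles(slopes, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in peaks if not low <= float(p[column]) <= high]


# Made-up rows; values are strings because that is how a CSV reader yields them.
# "PeakID" is a hypothetical column name.
example_slopes = [-4.0, -2.0, 0.0, 0.0, 0.5, 1.0, 300.0]
peaks = [
    {"PeakID": str(i), "SlopeOfBaseline": str(s)}
    for i, s in enumerate(example_slopes)
]

flagged = flag_extreme_slopes(peaks)
print([p["PeakID"] for p in flagged])
```

Quartile-based fences are used here rather than the mean because a few very large slopes would inflate the mean and standard deviation and could hide the very peaks being searched for.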