Open DarioS opened 6 months ago
Hi @DarioS. Could you tell me how you created this plot? Also (this is just a guess), whether the peak height is measured from zero or from the baseline might be relevant.
The blue region uses expert-chosen integration bounds. The red dots are the MS-DIAL bounds. The region is coloured in by geom_polygon of ggplot2. The baseline start height and baseline end height determine the straight line that is the MS-DIAL baseline.
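As a minimal sketch of the construction described above, the R code below draws a straight baseline between the heights at two integration bounds, builds the polygon vertices that geom_polygon would fill, and compares peak height measured from zero with height measured from the baseline. The chromatogram values, the bound indices, and the slope definition (height difference divided by retention-time difference) are all assumptions for illustration, not MS-DIAL output or MS-DIAL's documented formula.

```r
# Hypothetical chromatogram for one peak: retention time (min) and intensity.
rt <- seq(5.0, 6.0, by = 0.1)
intensity <- c(100, 110, 150, 400, 900, 1200, 950, 500, 260, 220, 210)

# Hypothetical integration bounds, given as indices into rt.
left <- 2
right <- 10
idx <- left:right

# Straight-line baseline through the intensities at the two bounds.
baseline <- approx(x = rt[c(left, right)],
                   y = intensity[c(left, right)],
                   xout = rt[idx])$y

# Slope of that line in intensity units per minute (assumed definition).
baseline_slope <- (intensity[right] - intensity[left]) / (rt[right] - rt[left])

# Peak height referenced to zero versus referenced to the baseline.
apex <- which.max(intensity)
height_from_zero <- intensity[apex]
height_from_baseline <- intensity[apex] - baseline[apex - left + 1]

# Polygon vertices: along the signal left to right, then back along the baseline.
peak_polygon <- data.frame(
  rt = c(rt[idx], rev(rt[idx])),
  intensity = c(intensity[idx], rev(baseline))
)

# Optional: build the plot object only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot() +
    ggplot2::geom_polygon(data = peak_polygon,
                          ggplot2::aes(x = rt, y = intensity),
                          fill = "blue", alpha = 0.4) +
    ggplot2::geom_line(data = data.frame(rt, intensity),
                       ggplot2::aes(x = rt, y = intensity))
}
```

With these made-up numbers the two height conventions differ by the baseline value under the apex, so a steep baseline changes the baseline-referenced height noticeably.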
Thank you for the information. How did you obtain the MS-DIAL bounds shown as red dots?
By the way, we are sorry if we forgot to respond to your earlier request for exporting peak bounds. Was it created as an issue in this repo?
These are from the peak list output. Indeed, the alignment bounds are issue 165, which appears not to be implemented yet. A common pattern seems to be that there is a second peak to the right which might be influencing the peak of interest.
Thank you for reminding me about https://github.com/systemsomicslab/MsdialWorkbench/issues/165 . Yes, we export the bounds for the peak list, but we don't export bounds for the alignment...
Anyway, we'd like to start checking the MS-DIAL bounds in the peak list. We would ideally like to have the source raw data (and the MS-DIAL parameters) of the peaks you visualized. Is it private data?
If the data can't be shared, we'd like to know of some public data instead (for which MS-DIAL gives poor bounds).
I have confirmed that this data set cannot be shared. We are not aware of any other data set comparing expert peak integration, which is a very time-consuming process, to computational integration. So, I don't know of any other suitable data set. The issue could be more prevalent than people realise and simply go undetected if biological analysis is started immediately.
@DarioS Thank you for confirming. I understand what you are saying, and I also think the issue might be widespread.
However, I don't have any ideas on how to pinpoint the cause of the problem.
So, although it is not a direct solution, I would like to suggest two things first instead:
Let me know if you have any comments.
O.K., see if you can make sense of it. The SlopeOfBaseline values (about 300) seem huge compared to the average value.
> summary(allPeaks$SlopeOfBaseline)
1st Qu. Median Mean 3rd Qu.
-3.99 0.00 7.59 0.79
Can you look at your own dataset for peaks with a huge SlopeOfBaseline value and find examples for code testing?
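One possible way to pull out such peaks, sketched here in R under stated assumptions: flag rows whose SlopeOfBaseline lies far from the median, using the median absolute deviation as a robust scale. Only the SlopeOfBaseline column name comes from this thread. The PeakID and Height columns, the example values, and the cutoff k = 5 are hypothetical and would need adjusting to a real export.

```r
# Hypothetical stand-in for an exported peak list.
# Only SlopeOfBaseline is a column name taken from this thread.
allPeaks <- data.frame(
  PeakID = 1:8,
  Height = c(1200, 300, 45000, 800, 150, 9800, 220, 5100),
  SlopeOfBaseline = c(-3.9, 0.0, 0.8, 310.2, -295.0, 0.3, -4.1, 1.2)
)

# Robust centre and spread, so a few extreme slopes do not inflate the cutoff.
k <- 5  # assumed multiplier; tune for the data at hand
centre <- median(allPeaks$SlopeOfBaseline)
spread <- mad(allPeaks$SlopeOfBaseline)

# Flag peaks whose slope is more than k robust deviations from the median.
extreme <- abs(allPeaks$SlopeOfBaseline - centre) > k * spread
suspects <- allPeaks[extreme, ]

# Largest absolute slopes first, as candidates for manual inspection.
suspects <- suspects[order(-abs(suspects$SlopeOfBaseline)), ]
print(suspects)
```

A median-based cutoff is used instead of the mean and standard deviation because the extreme slopes themselves would otherwise pull the threshold upward.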
It seems to happen more often for low-intensity peaks but can sometimes also be observed for high-intensity ones.