systemsomicslab / MsdialWorkbench

Universal workbench incorporating msdial, msfinder, and mrmprobs
https://systemsomicslab.github.io/compms/msdial/main.html

GC-MS data processing using MS-DIAL version 5+ #354

Status: Open. QizhiSu opened this issue 5 months ago.

QizhiSu commented 5 months ago

Hi developers,

Thanks a lot for implementing GC-MS data processing in MS-DIAL version 5+. I find many of the new functions useful, such as representative spectrum selection and candidate selection. However, I also found a few small issues where version 5+ behaves differently from MS-DIAL version 4+, and I found the version 4+ behavior more useful.

  1. For identification, retention information is still used for scoring even when "Use retention information for scoring" is unchecked. For example, when I uncheck this option and keep the default RI tolerance (20) and RT tolerance (0.5 min), many compounds cannot be identified. When I open the compound search, I only find candidates within a very small RI window (less than 5), so I think the RT tolerance is being applied. In addition, compounds in my library that have no RI (-1) are never considered, even though "Use retention information for scoring" is unchecked. This is inconsistent with version 4.9.
  2. When I use RT for alignment, only RT is available in the "Basic peak property" tab. When I use RI for alignment, only RI is available in the "Basic peak property" tab, and RI is also shown in the "Peak spot table", but RT is not available anywhere. I personally prefer the way MS-DIAL version 4.9 handles this: regardless of whether RT or RI is used for alignment, the "Basic peak property" tab shows both the RT and RI comparisons with the reference, and the "Peak spot table" shows both RT and RI. Keeping both would be very helpful.
  3. In the "File property setting", it is not possible to paste into multiple cells at once, whereas version 4+ allows this. It would be great to have this feature as well.
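To make the behavior expected in point 1 concrete, here is a minimal Python sketch of a retention pre-filter, assuming library entries store a missing RI as -1 (as in the library described above). The names `LibraryEntry` and `retention_filter` are hypothetical and are not MS-DIAL's API; the sketch only shows that disabling retention scoring should skip the RI window entirely, and that entries without an RI should not be silently dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LibraryEntry:
    name: str
    ri: float  # a negative value (e.g. -1) means "no RI recorded"


def retention_filter(entries: List[LibraryEntry], query_ri: float,
                     ri_tolerance: float, use_retention: bool) -> List[LibraryEntry]:
    """Hypothetical candidate pre-filter illustrating the expected behavior."""
    if not use_retention:
        # Retention scoring disabled: no RI window is applied at all.
        return list(entries)
    kept = []
    for entry in entries:
        if entry.ri < 0:
            # No RI to compare against: keep the entry rather than drop it
            # (one possible design choice, assumed here).
            kept.append(entry)
        elif abs(entry.ri - query_ri) <= ri_tolerance:
            kept.append(entry)
    return kept


library = [LibraryEntry("A", 1000.0), LibraryEntry("B", 1100.0), LibraryEntry("C", -1.0)]
print([e.name for e in retention_filter(library, 1005.0, 20.0, use_retention=False)])  # ['A', 'B', 'C']
print([e.name for e in retention_filter(library, 1005.0, 20.0, use_retention=True)])   # ['A', 'C']
```

With retention scoring off, all three entries remain candidates, including "C", which has no RI.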

Thanks again and I hope this will be helpful for further improvement of this excellent software.

All the best, Sukis

QizhiSu commented 5 months ago

I also found some odd integration behavior. I am processing low-resolution GC-MS data with MS-DIAL Ver5.2.240424.3, and some peaks are not aligned well. As shown below, the same compound produces 4 entries in the alignment table, and in each entry the compound is identified in different samples. Strangely, the peak area for the same sample differs across entries: once the compound is identified in a sample, its peak area appears much smaller than in the entries where it is not identified. The quant mass differs slightly between entries, but since I use 0.5 Da as the mass tolerance, they should be treated as the same.

[Screenshots: peak area, peak area2]
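The grouping the reporter expects can be sketched as merging spots whose RT and quant mass both fall within tolerance. This is a simple greedy grouping written for illustration, not MS-DIAL's alignment algorithm; `group_spots`, the 0.05 min RT tolerance, and the example entries are hypothetical, while the 0.5 Da mass tolerance comes from the post above.

```python
from typing import List, Tuple

# (entry label, retention time in min, quant mass in Da)
Spot = Tuple[str, float, float]


def group_spots(spots: List[Spot], rt_tol: float = 0.05,
                mz_tol: float = 0.5) -> List[List[str]]:
    """Greedy grouping: a spot joins the first group whose seed spot is within
    both the RT tolerance and the quant-mass tolerance; otherwise it seeds a new group."""
    groups: List[Tuple[Spot, List[str]]] = []
    for spot in sorted(spots, key=lambda s: s[1]):
        label, rt, mz = spot
        for seed, members in groups:
            if abs(seed[1] - rt) <= rt_tol and abs(seed[2] - mz) <= mz_tol:
                members.append(label)
                break
        else:
            groups.append((spot, [label]))
    return [members for _, members in groups]


# Hypothetical entries: three with slightly different quant masses at nearly the same RT.
spots = [("e1", 10.00, 73.0), ("e2", 10.02, 73.3), ("e3", 10.04, 72.8), ("e4", 12.50, 73.0)]
print(group_spots(spots))  # [['e1', 'e2', 'e3'], ['e4']]
```

Comparing each spot against a group's seed, rather than its latest member, keeps a chain of small shifts from drifting a group beyond the tolerance.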
YukiMatsuzawa commented 4 months ago

Hello Sukis, I apologize for the late response.

Firstly, the issue with the annotation parameters appears to have been fixed already. We expect the fix to be included in the next release, so please look forward to that. Next, regarding the peak property tab, RT is indeed not displayed, so we will add it. Thank you for pointing that out. As for pasting values in the File property setting, it should already be possible. The usability is poor at the moment, but you can paste multiple rows by selecting the table cells themselves rather than the text boxes inside them. We intend to improve this in the future, but as you may have noticed, many GC-MS features are still incomplete, and we plan to prioritize those.

Best regards, Yuki

QizhiSu commented 4 months ago

Thanks a lot for the reply. Please also see my second question regarding peak integration.

All the best, Sukis

YukiMatsuzawa commented 4 months ago

I suspect there are two separate causes. Regarding the peak area, there may be a calculation error, so I am currently checking the code. As for the same compound being annotated on multiple peaks, I cannot give a clear answer until I have verified it myself.

Could you confirm your settings so that I can reproduce the issue?

Are you using RI for annotation or alignment? If you are using RI, which type of index are you using: FAMEs or Alkanes?

QizhiSu commented 4 months ago

I used RT for alignment because, when RI is used for alignment, no RT is available in the Peak Spot Table. In the identification step, I included the alkane RI calculation but did not use it for scoring.
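For reference, an alkane-based RI is commonly computed with the linear retention index formula of van den Dool and Kratz, sketched below in Python. Whether MS-DIAL uses exactly this interpolation internally is an assumption; `linear_ri` and the alkane retention times are hypothetical.

```python
from bisect import bisect_right
from typing import Dict


def linear_ri(rt: float, alkanes: Dict[int, float]) -> float:
    """Linear retention index (van den Dool and Kratz) from an alkane ladder.

    alkanes maps carbon number -> retention time (min); times must increase
    with carbon number. RI = 100 * (n + (N - n) * (rt - t_n) / (t_N - t_n)),
    where n and N are the alkanes eluting just before and after rt.
    """
    carbons = sorted(alkanes)
    times = [alkanes[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("rt lies outside the alkane ladder; extrapolation is not handled")
    i = bisect_right(times, rt) - 1
    if i == len(times) - 1:
        # rt coincides with the last alkane in the ladder.
        return 100.0 * carbons[i]
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


# Hypothetical alkane retention times (min).
ladder = {10: 5.0, 11: 6.0, 12: 7.5}
print(linear_ri(5.5, ladder))   # 1050.0
print(linear_ri(6.75, ladder))  # 1150.0
```

The `(N - n)` factor lets the same formula handle ladders that skip carbon numbers, such as even-only alkane mixes.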

As for peak integration, I also reported similar odd behavior in MS-DIAL version 4+; please see http://www.metabolomics-forum.com/index.php?topic=1776.msg5333#msg5333. I suspect the two issues are related. Could it be something in the integrator?

YukiMatsuzawa commented 4 months ago

I have checked the behavior of the issues.

Firstly, regarding the splitting into multiple entries, I found that the criteria for calculating EI similarity were too strict, so I have corrected that.
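To illustrate how a strict EI similarity cutoff can split one compound into several entries, here is a plain cosine similarity between two nominal-mass spectra, with a threshold deciding whether two spots match. MS-DIAL's own EI similarity score may weight m/z and intensity differently; the spectra, the thresholds, and the helper names are all made up for this sketch.

```python
import math
from typing import Dict

Spectrum = Dict[int, float]  # nominal m/z -> intensity


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Plain cosine (normalized dot product) similarity between two EI spectra."""
    dot = sum(a[mz] * b[mz] for mz in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_compound(a: Spectrum, b: Spectrum, threshold: float) -> bool:
    """Treat two spots as one compound when similarity reaches the (hypothetical) cutoff."""
    return cosine_similarity(a, b) >= threshold


# Two made-up spectra of the same compound with slightly different ion ratios.
spot_1 = {73: 100.0, 147: 50.0, 221: 20.0}
spot_2 = {73: 100.0, 147: 40.0, 221: 25.0}

print(same_compound(spot_1, spot_2, threshold=0.999))  # False: entries stay split
print(same_compound(spot_1, spot_2, threshold=0.95))   # True: entries merge
```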

Secondly, regarding the peak area, there was a bug where the peak area calculation would refer to the RI as the time axis if the peak was not detected during alignment while using RI. This has also been fixed.
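Integrating on the wrong axis can be sketched with trapezoidal integration: the same chromatographic peak gives a different area when its x values are RI units instead of minutes. The scan values below are invented, and `trapezoid_area` is a hypothetical helper, not MS-DIAL code.

```python
from typing import Sequence


def trapezoid_area(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal integral of intensity y over axis x (x must be increasing)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


# Hypothetical peak: five scans, 0.01 min apart.
intensity = [0.0, 50.0, 100.0, 50.0, 0.0]
rt_min = [10.00, 10.01, 10.02, 10.03, 10.04]
# The same scans expressed as RI, assuming about 100 RI units per minute here.
ri = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0]

area_rt = trapezoid_area(rt_min, intensity)  # about 2.0 (intensity x min)
area_ri = trapezoid_area(ri, intensity)      # 200.0 (intensity x RI units)
```

Under this assumption, areas computed on the RI axis come out about 100 times larger, so mixing the two axes within one alignment table would make areas for the same sample inconsistent across entries.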

I plan to release a version including these fixes once I have further stabilized and corrected the GC-MS processing. Please wait a little longer.

QizhiSu commented 4 months ago

Great! Thanks a lot for this improvement.

All the best, Sukis