xieguigang / mzkit

Data toolkits for processing NMR, MALDI MSI, MALDI single cell, Raman Spectroscopy, LC-MS and GC-MS raw data, chemoinformatics data analysis and data visualization.
https://mzkit.org
MIT License

Is this page the result of peak picking? xcms_id? Not AccumulateROI? #30

Open YUANMENG-1 opened 3 weeks ago

YUANMENG-1 commented 3 weeks ago

I am using the mzkit-20240209 version. It seems that mzML files can be imported in batch, but peak extraction cannot be run in batch? It can only be done by right-clicking and running deconvolution on one file at a time, but this step appears to include peak finding. Although it is only called deconvolution (peak finding on XIC, according to the runtime), the result is xcms_id. My understanding is that mzkit uses the accumulative curve algorithm, right? Is that different from xcms? I get xcms_id and all snRatio values are negative, so am I doing something wrong?

[screenshots]

xieguigang commented 3 weeks ago

Hi,

Because the hardware of a typical user's computer may not be very powerful, the mzkit desktop software currently supports deconvolution of only a single non-targeted raw data file at a time. However, the underlying code already fully supports deconvolution of a large batch of raw data files. Batch deconvolution in the mzkit desktop software will be added in a later version. For now, the mzkit R# language package already provides a function for deconvolving a large number of raw data files in a server environment.

Here is the mzkit API function for deconvolving a large batch of raw data files in a server environment:

https://github.com/xieguigang/mzkit/blob/7ba0ddb8d6f5d49e8f03dbfd0dd3be1b5d12fbf9/Rscript/Library/mzkit_app/R/LCMS/deco.R#L18

Here is the corresponding IPC parallel version from the mzkit_hpc package, for deconvolving a very large batch of raw data files in a server environment:

https://github.com/xieguigang/mzkit_hpc/blob/8361f8d4692d0ee7943b0897d9fa75dd2f9adc89/R/lcms.R#L12

😅 Hope this is useful.

YUANMENG-1 commented 3 weeks ago

Thank you very much for your reply! I will do my best to explore this and, if I succeed, I will formally cite your work. Thank you very much for your help.

YUANMENG-1 commented 3 weeks ago

I am so sorry, but I still need to ask: ① Does your deconvolution here include the peak picking step? ② Is the accumulative curve principle used for peak picking, and can I directly cite the mzkit results as the accumulative curve peak picking method?

xieguigang commented 3 weeks ago

Yes, this method is an accumulative-curve-based algorithm. Here is the call stack trace showing how this algorithm is used in the mzkit desktop:

  1. The peak finding task is started from a background R# script call: https://github.com/xieguigang/mzkit_win32/blob/39bb075e122dd232ed6928688b7d81b70c91620a/rstudio/pipeline/MS1deconv.R#L8

  2. The MS1deconv function refers to the .NET CLR function: https://github.com/xieguigang/mzkit_win32/blob/39bb075e122dd232ed6928688b7d81b70c91620a/services/PipelineHost/BackgroundTask.vb#L330

<ExportAPI("MS1deconv")>
Public Function Deconv(raw As String, massdiff As Double) As PeakFeature()
End Function

This function accepts a single file path of an mzXML/mzML raw data file.

  3. The XIC data is then generated at this line of code with the given m/z tolerance. This operation splits the raw data into multiple chromatograms for peak finding:

Dim massGroups = scanPoints.GetMzGroups(mzdiff:=DAmethod.DeltaMass(massdiff)).ToArray

  4. The code shown above then calls into the mzkit math algorithm library for the peak finding algorithm, GetPeakGroups:
''' <summary>
''' 2. Run peak finding on the resulting XICs
''' </summary>
''' <param name="mzgroups"></param>
''' <param name="quantile"></param>
''' <param name="source">set the source tag value to <see cref="PeakFeature.rawfile"/></param>
''' <returns></returns>
<Extension>
Public Function DecoMzGroups(mzgroups As IEnumerable(Of MzGroup), peakwidth As DoubleRange,
                             Optional quantile# = 0.65,
                             Optional sn As Double = 3,
                             Optional nticks As Integer = 6,
                             Optional joint As Boolean = True,
                             Optional parallel As Boolean = False,
                             Optional source As String = Nothing) As IEnumerable(Of PeakFeature)

''' <summary>
''' All of the mz value in <paramref name="mzpoints"/> should be equals
''' </summary>
''' <param name="mzpoints"></param>
''' <returns></returns>
''' <remarks>The actual deconvolution step: used for processing complex sample data</remarks>
<Extension>
Public Iterator Function GetPeakGroups(mzpoints As MzGroup, peakwidth As DoubleRange,
                                       Optional quantile# = 0.65,
                                       Optional sn_threshold As Double = 3,
                                       Optional joint As Boolean = True) As IEnumerable(Of PeakFeature)
  5. As mentioned in this issue, the PopulateROI function is invoked to apply the accumulative curve method for peak finding:
For Each ROI As ROI In valids.Shadows.PopulateROI(
                                          peakwidth:=peakwidth,
                                          baselineQuantile:=quantile,
                                          joint:=joint,
                                          snThreshold:=sn_threshold
                                      )
    Yield New PeakFeature With {
        .mz = std.Round(mzpoints.mz, 4),
        .baseline = ROI.baseline,
        .integration = ROI.integration,
        .maxInto = ROI.maxInto,
        .noise = ROI.noise,
        .rt = ROI.rt,
        .rtmax = ROI.time.Max,
        .rtmin = ROI.time.Min,
        .nticks = ROI.ticks.Length,
        .area = ROI.ticks.Select(Function(t) t.Intensity).Sum
    }
Next

The ROI data is extracted from the XIC chromatogram signal and then converted to an LC-MS peak feature object for export as MS1 ion peaks.
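To make the XIC generation step above more concrete, here is a minimal Python sketch of splitting raw scan points into per-m/z chromatograms. It is not the mzkit GetMzGroups implementation: the function name group_by_mz, the (mz, rt, intensity) tuple layout, and the greedy grouping rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (mz, rt, intensity), hypothetical layout


def group_by_mz(points: List[Point], mzdiff: float) -> List[dict]:
    """Greedy grouping: sort by m/z and start a new group whenever the next
    m/z is more than `mzdiff` (in Da) away from the first m/z of the group."""
    groups: List[List[Point]] = []
    current: List[Point] = []
    for p in sorted(points, key=lambda p: p[0]):
        if current and p[0] - current[0][0] > mzdiff:
            groups.append(current)
            current = []
        current.append(p)
    if current:
        groups.append(current)
    # Each group becomes one XIC: (rt, intensity) pairs ordered by retention time.
    return [
        {
            "mz": sum(p[0] for p in g) / len(g),
            "xic": sorted((p[1], p[2]) for p in g),
        }
        for g in groups
    ]


# Toy data: two ions, each seen in several scans.
points = [
    (100.001, 1.0, 10.0), (100.002, 2.0, 50.0), (100.000, 3.0, 12.0),
    (200.500, 1.0, 5.0), (200.501, 2.0, 7.0),
]
xics = group_by_mz(points, mzdiff=0.01)
for x in xics:
    print(round(x["mz"], 4), x["xic"])
```

Each resulting chromatogram can then be handed to the peak finder independently, which is also what makes a parallel per-group run possible.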

These two blog articles give the details of how this algorithm is implemented:

  1. https://stack.xieguigang.me/2021/peak-finding-algorithm/
  2. https://stack.xieguigang.me/2022/lcms-peak-finding-and-deconvolution/
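For a feel of the idea before reading the articles, below is a simplified Python sketch of accumulative-curve peak finding on a single XIC. It is an illustration under stated assumptions, not the PopulateROI code: the quantile baseline mirrors the baselineQuantile parameter shown above, but the slope threshold min_slope, the nearest-rank quantile, and the helper names find_rois and to_feature are hypothetical choices. The intuition is that the normalized cumulative intensity curve stays flat over baseline and rises steeply inside a peak.

```python
from typing import List, Tuple


def quantile(values: List[float], q: float) -> float:
    """Nearest-rank style quantile on a sorted copy (no interpolation)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]


def find_rois(rt: List[float], into: List[float],
              baseline_q: float = 0.65,
              min_slope: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the normalized cumulative
    intensity curve rises faster than `min_slope` (an assumed threshold)."""
    base = quantile(into, baseline_q)
    signal = [max(0.0, v - base) for v in into]  # baseline-subtracted signal
    total = sum(signal)
    if total == 0 or len(rt) < 2:
        return []
    acc, run = [], 0.0
    for v in signal:  # cumulative curve scaled to [0, 1]
        run += v
        acc.append(run / total)
    span = rt[-1] - rt[0]
    rois, start = [], None
    for i in range(1, len(rt)):
        dt = (rt[i] - rt[i - 1]) / span  # time axis scaled to [0, 1]
        rising = dt > 0 and (acc[i] - acc[i - 1]) / dt > min_slope
        if rising and start is None:
            start = i - 1
        elif not rising and start is not None:
            rois.append((start, i - 1))
            start = None
    if start is not None:
        rois.append((start, len(rt) - 1))
    return rois


def to_feature(mz: float, rt: List[float], into: List[float],
               roi: Tuple[int, int]) -> dict:
    """Summarize one ROI with a few of the PeakFeature fields listed above."""
    a, b = roi
    seg = into[a:b + 1]
    apex = a + seg.index(max(seg))
    return {"mz": round(mz, 4), "rt": rt[apex], "rtmin": rt[a], "rtmax": rt[b],
            "maxInto": max(seg), "nticks": len(seg), "area": sum(seg)}


# Toy XIC: flat baseline of 10 with one peak centered at rt = 10.
rt = [float(i) for i in range(20)]
into = [10.0] * 8 + [30.0, 80.0, 150.0, 80.0, 30.0] + [10.0] * 7
rois = find_rois(rt, into)
feature = to_feature(100.0012345, rt, into, rois[0])
print(rois, feature)
```

In this sketch the area is the plain sum of intensities inside the ROI, matching the .area assignment in the VB.NET snippet. The joint, peakwidth, and snThreshold options of the real function are deliberately left out.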
YUANMENG-1 commented 3 weeks ago

Thank you for your incredibly detailed explanation. After carefully reviewing all the links you provided, I now understand:

① For the current mzkit desktop version, I can complete the process of Mzgroup + XIC construction + peak finding based on the accumulative curve by right-clicking on a single file and selecting deconvolution. The only remaining question is whether it's correct that the snRatio in the far right column of each output table is negative?

② Regarding the R# scripts that can run on HPC and servers, I am trying to understand your R# tutorials. I am also contacting our lab's HPC and server engineers to see how we can set up the R# .NET environment. If that doesn't work out, we plan to use mzkit win32 desktop to analyze the ground truth dataset results as a direct reference for your work on the accumulative curve!

I am extremely grateful for your repeated help! I wish you all the best in your work! I have always found this peak detection method to be uniquely ingenious, and I am eager to see how it performs in our 17 peak detection workflows. I also hope it will gain more recognition and usage! Best wishes for your work and everything related to mzkit. It's truly impressive!

xieguigang commented 3 weeks ago

The mzkit package is developed on the .NET 6 runtime and can run natively on Linux. I will push an Ubuntu-based Docker image with the mzkit package installed to Docker Hub later this month. You can try the mzkit Docker image.

YUANMENG-1 commented 3 weeks ago

That's great! Thank you again for your response!

xieguigang commented 2 weeks ago

Hi,

The Docker image with the mzkit package installed has been released on Docker Hub. You can download it from the command line: docker pull xieguigang/mzkit:v20240831. You can download the demo script, which uses the mzkit library to run the LC-MS peak table deconvolution, from here.

Here are some detailed notes on how to run this demo script:

1. To get the command-line help, you can try:

docker run -it -v "$PWD:$PWD" -w "$PWD" xieguigang/mzkit:v20240831 R# ./make_peaktable.R --help

You will then get the script's command-line usage help, which looks like this:

[screenshot]

2. A simple example

For instance, suppose you have placed the mzXML/mzML raw data files in the current working directory:

[screenshot]

Then you can run a single command to build the peak table by deconvolution:

docker run -it -v "$PWD:$PWD" -w "$PWD" xieguigang/mzkit:v20240831 R# ./make_peaktable.R --raw_dir ./

This processes the mzXML/mzML files in the current working directory and exports the peak table result file to the same directory.

When the demo script finishes, you will find the peaktable.csv file in the current directory. This peak table file keeps the same format as the xcms package output:

[screenshot]

3. Result evaluation

The LC-MS raw data from the article "Serum organic acid metabolites can be used as potential biomarkers to identify prostatitis, benign prostatic hyperplasia, and prostate cancer" was used to evaluate the mzkit deconvolution result.

The raw output peak table was pre-processed as follows:

  1. filter out rows that have more than 80% missing values in every sample group
  2. impute zeros with half of the minimum positive peak area
  3. normalize the peak area data by the total peak sum
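The three pre-processing steps above can be sketched in plain Python as follows. This is only an illustration, not the script used for the evaluation: the preprocess function, the peak ids, and the group names are hypothetical, and two details are assumptions on my part, namely that the minimum positive area is taken over the whole table and that total-sum normalization is applied per sample column.

```python
from typing import Dict, List


def preprocess(table: Dict[str, List[float]],
               groups: Dict[str, List[int]],
               max_missing: float = 0.8) -> Dict[str, List[float]]:
    """table: peak id -> peak areas per sample (0 means missing);
    groups: group name -> column indices of the samples in that group."""
    # 1. Drop a peak only if it is more than 80% missing in every group.
    kept: Dict[str, List[float]] = {}
    for pid, row in table.items():
        fractions = [
            sum(1 for i in cols if row[i] <= 0) / len(cols)
            for cols in groups.values()
        ]
        if not all(f > max_missing for f in fractions):
            kept[pid] = list(row)

    # 2. Replace zeros with half of the smallest positive area (table-wide).
    positives = [v for row in kept.values() for v in row if v > 0]
    if not positives:
        return kept
    fill = min(positives) / 2
    for row in kept.values():
        for i, v in enumerate(row):
            if v <= 0:
                row[i] = fill

    # 3. Divide each sample column by its total peak area.
    n = len(next(iter(kept.values())))
    totals = [sum(row[i] for row in kept.values()) for i in range(n)]
    return {pid: [v / totals[i] for i, v in enumerate(row)]
            for pid, row in kept.items()}


# Toy peak table with two samples per group.
table = {
    "M100T60": [100.0, 0.0, 300.0, 200.0],
    "M200T90": [0.0, 0.0, 0.0, 0.0],
    "M300T30": [50.0, 150.0, 0.0, 100.0],
}
groups = {"control": [0, 1], "case": [2, 3]}
result = preprocess(table, groups)
for pid, row in result.items():
    print(pid, [round(v, 4) for v in row])
```

After this step every sample column sums to 1, so the values are relative abundances that can be compared across samples.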

[figure: PLS-DA score plot (plsda_scoreMN)]

We then obtain metabolite expression results that agree with the conclusions of the article:

[figures]

[figures]

You can download the raw data files from MetaboLights (MTBLS6039) to reproduce this result with mzkit.

YUANMENG-1 commented 1 week ago

Thank you so much for the surprisingly quick Docker version and the detailed tutorial! However, I would like to ask if this can be used with Singularity pull and run? (Our cluster has Docker, but it's been down for the past two weeks due to a virus issue, and our server can only use Singularity 😭)

xieguigang commented 1 week ago

Yes, absolutely no problem. You can run the same commands in Singularity in a completely equivalent way:

# first pull the docker image and
# save as the singularity image file: mzkit.sif
singularity pull -o mzkit.sif docker://xieguigang/mzkit:v20240831

# then run the saved image file
singularity exec --bind "$PWD:$PWD" --pwd "$PWD" mzkit.sif R# ./make_peaktable.R --raw_dir ./

YUANMENG-1 commented 1 week ago

Thank you for your prompt response! It seems that the -o option may not be supported. It might be necessary to first pull the image and then copy it to save it as mzkit.sif.

[screenshot]

However, using the standard pull command seems to have issues as well.

[screenshot]

xieguigang commented 1 week ago

This is a Docker Hub network issue. You can try downloading the Docker archive from https://cdn-biodeep-cn-obs.obs.cn-east-2.myhuaweicloud.com/tools/mzkit_v20240831.tar and then converting this Docker image file to a Singularity SIF image file.

YUANMENG-1 commented 1 week ago

[screenshot]

Thank you very much, it runs successfully! I look forward to the results of the comparison with other workflows!