mathiaskalxdorf / IceR

Quantitative proteomics workflow
https://mathiaskalxdorf.github.io/IceR/

Questions about FAIMS data and Proteome Discoverer output #7

Closed mofhu closed 2 years ago

mofhu commented 3 years ago

Dear,

We are trying to use IceR in our limited-sample workflow to improve data completeness. We use a human and E. coli mixture with known mix ratios as ground truth for evaluation. The raw files were generated on an Orbitrap Eclipse with FAIMS (OT-IT DDA method, 2 FAIMS CVs). As a first test, we separated the CVs and used only one CV for MQ and IceR; this runs on our workstation.

To make our workflow easier and more precise, I would like to ask for your help with the following questions:

  1. To use FAIMS CVs, what do you suggest for MQ/IceR, and how should the MQ parameters be set? The early versions of MQ do not accept raw files with FAIMS CVs, so we use FreeStyle or the mzXML converter from the Coon group to generate separate files per CV. In more detail:
    • Method 1: set the CVs as fractions of the same experiment in MQ. In this case, should we set them as neighboring fractions (e.g. 1 & 2) or unrelated fractions (e.g. 1 & 3)?
    • Method 2: search each CV in a separate MQ run and try to combine the results.
    • For your reference, we notice that with our CV settings some peptides have an LC peak in both CVs (same RT) with different peak areas. We also wonder whether using different CVs puts some burden on FDR control, especially with more CVs (e.g. MS1 only with 4-6 CVs).
    • We are currently trying the CVs as neighboring fractions in the MQ search and using that as IceR input; results will follow when this is done. Is this good practice for IceR?
  2. Could we use the output of Proteome Discoverer as input? Is it possible to convert its output tables into MQ's format (DART-ID from Nikolai Slavov's group accepts this approach) so that IceR can use them as input?

Many thanks, Frank

mathiaskalxdorf commented 3 years ago

Hey Frank, many thanks for your question. Using IceR for FAIMS data is on our to-do list, but we have not had time to work on it yet. I'm also not an expert on FAIMS + MaxQ yet. Treating different CVs as neighboring fractions sounds like a good way for the moment; however, we have not evaluated IceR with fractionated samples either (another point on our list). So from a compatibility perspective, method 2 (separate MQ searches and subsequent combination) might currently be the best-working approach for IceR. I would be really curious to hear whether it works and what the results look like.

Regarding input from other preprocessing workflows: that is also a point on our to-do list. With the current IceR version you would have to format your results like the MaxQ outputs. However, we are already working on a feature that lets IceR handle a general input format. This feature is almost ready and will then be available via the development branch of the IceR package here on GitHub. So you may want to wait a bit until we release the next version. In the meantime, combining MaxQ searches might be the easiest way.

Best, Mathias
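For readers wondering what the "subsequent combination" in method 2 could look like: the thread does not specify a combination rule, so the following is only a sketch under stated assumptions. It assumes each per-CV search has already been reduced to a mapping from (peptide sequence, sample name) to intensity, and that sample names have been harmonized across CVs. Summing intensities across CVs is an illustrative choice, not something IceR or MaxQuant prescribes; the function name `combine_cv_searches` is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Key of one quantified value: (peptide sequence, sample name).
Key = tuple[str, str]


def combine_cv_searches(
    searches: Iterable[dict[Key, float]],
    reducer: Callable[[list[float]], float] = sum,
) -> dict[Key, float]:
    """Merge per-CV intensity tables into one table.

    Assumption: a peptide seen at several CVs in the same sample is
    collapsed with `reducer` (sum by default; `max` is another option).
    """
    collected: dict[Key, list[float]] = defaultdict(list)
    for table in searches:
        for key, intensity in table.items():
            collected[key].append(intensity)
    return {key: reducer(values) for key, values in collected.items()}


# Toy values, purely illustrative (not taken from the thread).
cv55 = {("PEPTIDEK", "h80e10_1"): 100.0, ("ANOTHERK", "h80e10_1"): 50.0}
cv70 = {("PEPTIDEK", "h80e10_1"): 40.0}

combined = combine_cv_searches([cv55, cv70])
print(combined[("PEPTIDEK", "h80e10_1")])
```

Passing `reducer=max` instead keeps only the strongest CV per peptide, which may be preferable if the same ions are partly sampled at both CVs; which rule is appropriate would need to be evaluated against the known mix ratios.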

mofhu commented 3 years ago

Dear Mathias,

It's great to hear from you. Many thanks for your kind response. I'll analyze the data this week and keep you posted.

Best, Mo

Mo Hu, PhD
Postdoctoral research fellow
Biomedical Pioneering Innovation Center (BIOPIC)
Beijing Advanced Innovation Center of Genomics (ICG)
Peking University


mofhu commented 3 years ago

Dear Mathias,

I ran into an issue with my raw files in the separate-CV search when using 6 files (but not with 9 files). With the 9-file search, which additionally included 3 h80e0 files, the IceR run finished correctly. However, I want to compare e10 vs. e20 and adapt your demo code for the visualization against MQ, so I repeated the MQ search with only the 6 files, using the same search parameters, and ran IceR on those 6 files. Since the files themselves are unlikely to be the problem, I guess it might be some problem in setting limits when "FDR of peak selection" is 0 % (the same error also appears in the log for CV = -70 V below). I'm not familiar with R; could you please help us with this issue?

Best, Mo

R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

> library(IceR)
> runIceR()
Loading required package: shiny

Listening on http://127.0.0.1:7324
[1] "2021-09-08 15:31:00 Job started"
[1] "2021-09-08 15:31:00 Preparation finished"
[1] "The following raw files were found in the specified folder: "
20210316_LFQ2_h80e10_250pg_1_-55.raw
20210316_LFQ2_h80e10_250pg_2_-55.raw
20210316_LFQ2_h80e10_250pg_3_-55.raw
20210316_LFQ3_h80e20_250pg_1_-55.raw
20210316_LFQ3_h80e20_250pg_2_-55.raw
20210316_LFQ3_h80e20_250pg_3_-55.raw
[1] "Could not find msconvert.exe. Can be usually found in C:/Users/user_name/AppData/Local/Apps/ProteoWizard Version"
Reading configuration file C:\Users\hp\Documents\temp_msconvert\config.txt

format: mzXML
    m/z: Compression-Zlib, 64-bit
    intensity: Compression-Zlib, 64-bit
    rt: Compression-Zlib, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: C:\Users\hp\Documents\temp_msconvert\mzXML
extension: .mzXML
contactFilename:
runIndexSet:

spectrum list filters:
  "peakPicking vendor msLevel=1"
  "msLevel 1"

chromatogram list filters:

filenames:
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_1_-55.raw
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_2_-55.raw
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_3_-55.raw
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_1_-55.raw
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_2_-55.raw
  I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_3_-55.raw

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_1_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_1_-55.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_2_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_2_-55.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ2_h80e10_250pg_3_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_3_-55.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_1_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_1_-55.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_2_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_2_-55.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-55_e10vse20/\20210316_LFQ3_h80e20_250pg_3_-55.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_3_-55.mzXML

[1] "2021-09-08 15:32:47 Raw file conversion finished"
[1] "2021-09-08 15:33:16 Preparation of mzXMLs finished"
[1] "Read MaxQ results"
[1] "Read MaxQ results finished"
[1] "MS resolution seems to be missing in MaxQ outputs !!!"
[1] "Prepare for peptide feature matching - Indexing all ions"
[1] "Train RF models for mz-corrections"
[1] "20210316_LFQ2_h80e10_250pg_1_-55: R^2 train-set=0.87, R^2 eval-set=0.84"
[1] "20210316_LFQ2_h80e10_250pg_2_-55: R^2 train-set=0.88, R^2 eval-set=0.85"
[1] "20210316_LFQ2_h80e10_250pg_3_-55: R^2 train-set=0.87, R^2 eval-set=0.85"
[1] "20210316_LFQ3_h80e20_250pg_1_-55: R^2 train-set=0.86, R^2 eval-set=0.83"
[1] "20210316_LFQ3_h80e20_250pg_2_-55: R^2 train-set=0.86, R^2 eval-set=0.84"
[1] "20210316_LFQ3_h80e20_250pg_3_-55: R^2 train-set=0.88, R^2 eval-set=0.85"
[1] "2021-09-08 15:34:37 Feature alignmend finished"
[1] "Added 1191 isotope features."
[1] "2021-09-08 15:34:37 Addition of +1 isotope features finished"
[1] "Save peak detection and selection results"
[1] "20210316_LFQ2_h80e10_250pg_1_-55 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ2_h80e10_250pg_2_-55 - FDR of peak selection: 0 %"
[1] "20210316_LFQ2_h80e10_250pg_3_-55 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ3_h80e20_250pg_1_-55 - FDR of peak selection: 0 %"
[1] "20210316_LFQ3_h80e20_250pg_2_-55 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ3_h80e20_250pg_3_-55 - FDR of peak selection: 0 %"
Warning: Error in plot.window: need finite 'ylim' values
  2: shiny::runApp
  1: runIceR

another log for CV=-70V

> runIceR()

Listening on http://127.0.0.1:7324
[1] "2021-09-08 23:52:36 Job started"
[1] "2021-09-08 23:52:36 Preparation finished"
[1] "The following raw files were found in the specified folder: "
20210316_LFQ2_h80e10_250pg_1_-70.raw
20210316_LFQ2_h80e10_250pg_2_-70.raw
20210316_LFQ2_h80e10_250pg_3_-70.raw
20210316_LFQ3_h80e20_250pg_1_-70.raw
20210316_LFQ3_h80e20_250pg_2_-70.raw
20210316_LFQ3_h80e20_250pg_3_-70.raw
[1] "Could not find msconvert.exe. Can be usually found in C:/Users/user_name/AppData/Local/Apps/ProteoWizard Version"
Reading configuration file C:\Users\hp\Documents\temp_msconvert\config.txt

format: mzXML
    m/z: Compression-Zlib, 64-bit
    intensity: Compression-Zlib, 64-bit
    rt: Compression-Zlib, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: C:\Users\hp\Documents\temp_msconvert\mzXML
extension: .mzXML
contactFilename:
runIndexSet:

spectrum list filters:
  "peakPicking vendor msLevel=1"
  "msLevel 1"

chromatogram list filters:

filenames:
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_1_-70.raw
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_2_-70.raw
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_3_-70.raw
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_1_-70.raw
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_2_-70.raw
  I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_3_-70.raw

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_1_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_1_-70.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_2_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_2_-70.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ2_h80e10_250pg_3_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ2_h80e10_250pg_3_-70.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_1_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_1_-70.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_2_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_2_-70.mzXML

processing file: I:/data/mo.hu/icer/eclipse_CV-70_e10vse20/\20210316_LFQ3_h80e20_250pg_3_-70.raw
calculating source file checksums
writing output file: C:\Users\hp\Documents\temp_msconvert\mzXML\20210316_LFQ3_h80e20_250pg_3_-70.mzXML

[1] "2021-09-08 23:53:51 Raw file conversion finished"
[1] "2021-09-08 23:54:14 Preparation of mzXMLs finished"
[1] "Read MaxQ results"
[1] "Read MaxQ results finished"
[1] "MS resolution seems to be missing in MaxQ outputs !!!"
[1] "Prepare for peptide feature matching - Indexing all ions"
[1] "Train RF models for mz-corrections"
[1] "20210316_LFQ2_h80e10_250pg_1_-70: R^2 train-set=0.91, R^2 eval-set=0.91"
[1] "20210316_LFQ2_h80e10_250pg_2_-70: R^2 train-set=0.95, R^2 eval-set=0.94"
[1] "20210316_LFQ2_h80e10_250pg_3_-70: R^2 train-set=0.89, R^2 eval-set=0.91"
[1] "20210316_LFQ3_h80e20_250pg_1_-70: R^2 train-set=0.92, R^2 eval-set=0.92"
[1] "20210316_LFQ3_h80e20_250pg_2_-70: R^2 train-set=0.91, R^2 eval-set=0.89"
[1] "20210316_LFQ3_h80e20_250pg_3_-70: R^2 train-set=0.91, R^2 eval-set=0.89"
[1] "2021-09-08 23:54:53 Feature alignmend finished"
[1] "Added 755 isotope features."
[1] "2021-09-08 23:54:53 Addition of +1 isotope features finished"
[1] "Save peak detection and selection results"
[1] "20210316_LFQ2_h80e10_250pg_1_-70 - FDR of peak selection: 0 %"
[1] "20210316_LFQ2_h80e10_250pg_2_-70 - FDR of peak selection: 0.4 %"
[1] "20210316_LFQ2_h80e10_250pg_3_-70 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ3_h80e20_250pg_1_-70 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ3_h80e20_250pg_2_-70 - FDR of peak selection: 0.2 %"
[1] "20210316_LFQ3_h80e20_250pg_3_-70 - FDR of peak selection: 0.6 %"
Warning: Error in plot.window: need finite 'ylim' values
  2: shiny::runApp
  1: runIceR

mathiaskalxdorf commented 3 years ago

Hey, first a comment on the sample/raw file names. So far, file names containing special characters (in your case the "-", e.g. -55 or -70) have always caused trouble (R typically does not like them). I would therefore recommend, at least for the future, renaming the files so that they do not contain any special characters. This also means that the names in the MaxQ output files have to be adapted accordingly. I'm not sure whether this is the reason for the error, though.

Next, can you check whether the file "Performance of feature alignment.pdf" is available (and readable) in the Temporary_files folder? I assume the error happens while preparing this QC plot.
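The renaming suggested above can be scripted. The following is a minimal sketch, not part of IceR: it assumes that "special characters" means anything other than letters, digits, and underscores, and the helper names (`sanitize_stem`, `plan_renames`, `apply_renames`) are hypothetical. It first builds a rename plan without touching the disk, so the mapping can be reviewed before anything is renamed.

```python
import re
from pathlib import Path


def sanitize_stem(stem: str) -> str:
    """Replace every character that is not a letter, digit, or underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", stem)
    # Collapse runs of underscores created by the replacement.
    return re.sub(r"_+", "_", cleaned).strip("_")


def plan_renames(folder: Path, suffix: str = ".raw") -> dict[str, str]:
    """Map old file names to sanitized names; nothing is renamed here."""
    plan = {}
    for path in sorted(folder.glob(f"*{suffix}")):
        new_name = sanitize_stem(path.stem) + path.suffix
        if new_name != path.name:
            plan[path.name] = new_name
    return plan


def apply_renames(folder: Path, plan: dict[str, str]) -> None:
    """Rename files according to the plan, refusing to overwrite anything."""
    for old, new in plan.items():
        target = folder / new
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        (folder / old).rename(target)


print(sanitize_stem("20210316_LFQ2_h80e10_250pg_1_-55"))
# prints 20210316_LFQ2_h80e10_250pg_1_55
```

Note that the minus sign of the CV is dropped in this scheme, so "-55" becomes "_55". The same old-to-new mapping would also have to be applied to the raw file names recorded in the MaxQuant output tables, which this sketch does not do.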

mofhu commented 3 years ago

Dear Mathias,

Thanks for your kind response :) Regarding the file names, we will change them in the future to prevent errors. Regarding the files, I guess you have caught the issue: see the figure below for the 0 KB PDF file. BTW, I guess this might be specific to low-input samples, which give far fewer IDs/quantifications than bulk proteomics.

[image: the 0 KB PDF file]

Best, Frank
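The check discussed above (is the QC PDF present, and is it more than a 0 KB stub?) can be automated. This is a minimal sketch: the folder name "Temporary_files" and the file name come from the thread, while the function name `check_qc_pdf` and the idea of passing the IceR results folder as an argument are assumptions for illustration.

```python
from pathlib import Path


def check_qc_pdf(
    results_folder: Path,
    name: str = "Performance of feature alignment.pdf",
) -> str:
    """Return 'missing', 'empty', 'not a PDF', or 'ok' for the QC plot file."""
    pdf = results_folder / "Temporary_files" / name
    if not pdf.is_file():
        return "missing"
    if pdf.stat().st_size == 0:
        # A 0 KB file suggests the plotting device was opened but no plot was written.
        return "empty"
    with pdf.open("rb") as handle:
        header = handle.read(5)
    # Every valid PDF file starts with the bytes "%PDF-".
    return "ok" if header == b"%PDF-" else "not a PDF"
```

A result of "empty" matches the situation reported here, where the file exists but has a size of 0 KB.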

mathiaskalxdorf commented 2 years ago

Hey,

Just a note: I pushed a new version of IceR to GitHub (development branch: V1.2.0; master branch: V0.9.10). The error you got should hopefully be fixed (or at least circumvented). If you want to rerun IceR on your data, please consider deleting the "Temporary_files" folder in the IceR results folder as well as the "Extracted intensities" folders within the folder where the raw files are located.

The new development version now also allows the use of data preprocessed by any pipeline, as long as the data is supplied in a specific format. Please have a look at the description for more details.

Best,

Mathias
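The cleanup recommended above before a rerun can be sketched as follows. The folder names "Temporary_files" and "Extracted intensities" come from the thread; everything else is an assumption: the function names are hypothetical, and the sketch searches the raw file folder recursively for any directory whose name starts with "Extracted intensities", since the exact nesting is not stated. It defaults to a dry run so the list can be reviewed before anything is deleted.

```python
import shutil
from pathlib import Path


def find_stale_folders(results_folder: Path, raw_folder: Path) -> list[Path]:
    """Collect folders left over from a previous IceR run."""
    stale = []
    temp = results_folder / "Temporary_files"
    if temp.is_dir():
        stale.append(temp)
    # Assumption: the folders may sit at any depth below the raw file folder.
    stale.extend(
        sorted(p for p in raw_folder.rglob("Extracted intensities*") if p.is_dir())
    )
    return stale


def remove_folders(folders: list[Path], dry_run: bool = True) -> None:
    """Delete the given folders; with dry_run=True only report them."""
    for folder in folders:
        if dry_run:
            print("would delete:", folder)
        elif folder.exists():  # may already be gone if a parent was removed
            shutil.rmtree(folder)
```

A typical use would be to call `remove_folders(find_stale_folders(results, raw))` first, inspect the printed paths, and only then repeat the call with `dry_run=False`.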

mofhu commented 2 years ago

Dear Mathias,

Just some quick feedback: I tried the dev version, and the job now finishes correctly :) Many thanks for your help. I'll analyze the results and keep you updated.

Best regards, Mo Hu
