kevinkovalchik / RawTools

RawTools is an open-source and freely available package designed to perform scan data parsing and quantification, and quality control analysis of Thermo Orbitrap raw mass spectrometer files from data-dependent acquisition experiments.
Apache License 2.0

Failing on QC analysis with database #76

Open ilham-rabbi opened 2 years ago

ilham-rabbi commented 2 years ago

Hi, when I run the basic QC analysis, the program produces output as intended. But when I try to do the QC with the database, the program fails and returns this error:

Unhandled Exception: System.ComponentModel.Win32Exception: The system cannot find the file specified
   at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
   at RawTools.Utilities.ConsoleUtils.VoidBash(String cmd, String args) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Utilities\Utilities.cs:line 109
   at RawTools.QC.Search.RunSearch(WorkflowParameters parameters, MethodDataContainer methodData, String rawFileName) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\QC\RunSearch.cs:line 63
   at RawTools.WorkFlows.WorkFlowsDDA.UniversalDDA(IRawFileThreadManager rawFileThreadManager, WorkflowParameters parameters, QcDataCollection qcDataCollection) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\WorkFlows\DDA.cs:line 188
   at RawTools.Program.Run(Dictionary`2 opts) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Program.cs:line 243
   at RawTools.Program.Main(String[] args) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Program.cs:line 73

I'm new to software like this and was hoping for some help understanding the issue.

Note: I am using raw files from the EBI PRIDE database project PXD011070 (the files suggested in your guide).

chrishuges commented 2 years ago

Hi,

Can you give me some more details? Are you using the command line version of RawTools? If so, what is the command you are feeding it? If you are using the GUI, can you tell me the options you are providing it? Can you let me know what the database is you are using? I assume you are using the latest version of RawTools?

Do I understand correctly that this error only occurs if you specify a sequence database? If you don't try to do a search everything works as expected?

Any info you can give that can help me reproduce the error in order to fix it is much appreciated.

Chris

ilham-rabbi commented 2 years ago

Hi,

I'm using the command line version of RawTools. The command I've used is:

>RawTools -d D:\IAR\RawTool\ExampleRawFiles\ -qc D:\IAR\RawTool\ExampleRawFilesResults\ -db D:\IAR\RawTool\HumanDB.fasta -fmods 57.0214@C -vmods 15.9949@M -X D:\IAR\RawTool\xtandem\bin

I'm using the Uniprot UP000005640 human database

Yes, I am using the latest version. And yes again, the issue only occurs when I specify a sequence database.

I have just tried doing it with the GUI, and it turned up an error that reads like this: image (apologies for the poor cropping; the console output disappears seconds after the error, so I had trouble taking the screenshot)

The settings I used are: image

I hope this helps clarify my situation.

Ilham

chrishuges commented 2 years ago

Hi,

Thank you for this additional information. It appears the GUI and command line errors are different. I think we should just focus on the command line one for now.

I am having trouble reproducing the error on my own system (for reference, I am running on a Windows machine with RawTools 2.0.7).

If I use the same command you have on the command line using just a subset of the raw files you are using, my command executes fine:

C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtools207>RawTools.exe -d C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\raw-test-file\testingQc\ -qc C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\raw-test-file\testingQc\ -db C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\raw-test-file\testingQc\UP000005640_9606.fasta -fmods 57.0214@C -vmods 15.9949@M -X C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\tandem-win-17-02-01-4\tandem-win-17-02-01-4\bin
2 file(s) to process

Processing: C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\raw-test-file\testingQc\ch_23Aug2018_HeLa_Std_1.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11378
Scans linked: 11378

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  FAIMS voltages
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Writing MGF file: 100%
Reading FASTA file
Writing target-decoy database: 100%

X! TANDEM Alanine (2017.2.1.4)

Loading spectra| (mgf).. loaded.
Spectra matching criteria = 2969
Starting threads .|.|.|.|.|.|.|.|.|.|.|.|.|.|. started.
Computing models:
        testing 1 2 3 testing 1 2 3 testing 1 2 3
        waiting for 23|45|6|7|8|9|10|11|12|13|14|15| done.

        sequences modelled = 41 ks
Model refinement:

        waiting for 2|3|4|5|6|7|8|9|10|11|12|13|14|15| done.

Merging results:
        from 23456789101112131415

Creating report:
        initial calculations  ..... done.
        sorting  ..... done.
        finding repeats ..... done.
        evaluating results ..... done.
        calculating expectations ..... done.
        writing results ..... done.

Valid models = 2038

Sufficient PSMs and decoy hits detected for ID filtering.
Total number of unfiltered PSMs: 2960
Total number of decoy hits: 300
Total hits: 2960
Top decoy score: 30.8
Non-decoy hits: 2660
Non-decoy hits above top decoy score: 1906
Digestion efficiency: 0.707764952780693
Missed cleavage rate (/PSM): 0.322665267576076
IDrate: 0.635333333333333
15.9949@M modification frequency: 0.0107858243451464
Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 15.03 s

Processing: C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\raw-test-file\testingQc\ch_23Aug2018_HeLa_Std_2.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11525
Scans linked: 11525

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  FAIMS voltages
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Writing MGF file: 100%

X! TANDEM Alanine (2017.2.1.4)

Loading spectra| (mgf).. loaded.
Spectra matching criteria = 2967
Starting threads .|.|.|.|.|.|.|.|.|.|.|.|.|.|. started.
Computing models:
        testing 1 2 3 testing 1 2 3 testing 1 2 3
        waiting for 2|3|45|6|7|8|9|10|11|12|13|14|15| done.

        sequences modelled = 41 ks
Model refinement:

        waiting for 2|3|4|5|6|7|8|9|10|11|12|13|14|15| done.

Merging results:
        from 23456789101112131415

Creating report:
        initial calculations  ..... done.
        sorting  ..... done.
        finding repeats ..... done.
        evaluating results ..... done.
        calculating expectations ..... done.
        writing results ..... done.

Valid models = 2080

Sufficient PSMs and decoy hits detected for ID filtering.
Total number of unfiltered PSMs: 2959
Total number of decoy hits: 293
Total hits: 2959
Top decoy score: 30.05
Non-decoy hits: 2666
Non-decoy hits above top decoy score: 1997
Digestion efficiency: 0.72158237356034
Missed cleavage rate (/PSM): 0.302954431647471
IDrate: 0.665666666666667
15.9949@M modification frequency: 0.0258992805755396

Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 13.78 s

Time to process all 2 files: 00:00:28.8167614

C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtools207>

If I execute the same command in the GUI, it also works fine, so I am a bit at a loss here. When you run it from the command line, at what point during the execution does it fail? Can you provide me with the entire command line output (including your input command and any information RawTools spits out at all), perhaps running with and without a search?

ilham-rabbi commented 2 years ago

Hi,

As a note, I am also running Windows, am using RawTools 2.0.7, and am actually only using a subset of the raw files I mentioned. When I run RawTools without the search, the output looks like this:

C:\Users\Admin>RawTools -d D:\IAR\RawTool\ExampleRawFiles\ -qc D:\IAR\RawTool\ExampleRawFilesResults\
2 file(s) to process

Processing: D:\IAR\RawTool\ExampleRawFiles\ch_23Aug2018_HeLa_Std_6.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11435
Scans linked: 11435

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  FAIMS voltages
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 4.78 s

Processing: D:\IAR\RawTool\ExampleRawFiles\ch_23Aug2018_HeLa_Std_7.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11409
Scans linked: 11409

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  FAIMS voltages
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics

Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 4.54 s

Time to process all 2 files: 00:00:09.3110903

With this, the results are created in the directory I specified.

When I run RawTools with the search, my output looks like this:

C:\Users\Admin>RawTools -d D:\IAR\RawTool\ExampleRawFiles\ -qc D:\IAR\RawTool\ExampleRawFilesResults2\ -db D:\IAR\RawTool\HumanDB.fasta -fmods 57.0214@C -vmods 15.9949@M -X D:\IAR\RawTool\xtandem\bin
2 file(s) to process

Processing: D:\IAR\RawTool\ExampleRawFiles\ch_23Aug2018_HeLa_Std_6.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11435
Scans linked: 11435

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  FAIMS voltages
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Writing MGF file: 100%
Reading FASTA file
Writing target-decoy database: 100%

Unhandled Exception: System.ComponentModel.Win32Exception: The system cannot find the file specified
   at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
   at RawTools.Utilities.ConsoleUtils.VoidBash(String cmd, String args) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Utilities\Utilities.cs:line 109
   at RawTools.QC.Search.RunSearch(WorkflowParameters parameters, MethodDataContainer methodData, String rawFileName) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\QC\RunSearch.cs:line 63
   at RawTools.WorkFlows.WorkFlowsDDA.UniversalDDA(IRawFileThreadManager rawFileThreadManager, WorkflowParameters parameters, QcDataCollection qcDataCollection) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\WorkFlows\DDA.cs:line 188
   at RawTools.Program.Run(Dictionary`2 opts) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Program.cs:line 243
   at RawTools.Program.Main(String[] args) in C:\Users\chughes\Documents\bccrc\softwareRepository\RawTools\rawtoolsDev\RawTools\Program.cs:line 73

Since the error mentions not finding a file, I tried reinstalling RawTools, thinking that maybe I had done it wrong the first time. This didn't bear any fruit, and I got the exact same errors.

Thanks again

Ilham
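Because the exception is raised in Process.StartWithCreateProcess, one thing worth ruling out is that a path handed to RawTools does not resolve on disk. Below is a minimal sketch in Python of such a pre-flight check. It rests on assumptions: the helper name `missing_paths` is purely illustrative, and `tandem.exe` is assumed to be the X! Tandem executable name; the thread does not confirm which file RawTools actually tries to launch.

```python
from pathlib import Path


def missing_paths(raw_dir, fasta, xtandem_bin, exe_name="tandem.exe"):
    """Return a list of problems; an empty list means every path resolves.

    exe_name is an assumption, not something confirmed by RawTools.
    """
    problems = []
    raw_dir, fasta, xtandem_bin = Path(raw_dir), Path(fasta), Path(xtandem_bin)

    # Directory given to -d must exist and hold at least one .raw file.
    if not raw_dir.is_dir():
        problems.append(f"raw file directory not found: {raw_dir}")
    elif not any(p.suffix.lower() == ".raw" for p in raw_dir.iterdir()):
        problems.append(f"no .raw files in: {raw_dir}")

    # File given to -db must exist.
    if not fasta.is_file():
        problems.append(f"FASTA database not found: {fasta}")

    # Directory given to -X must exist and contain the search engine binary.
    if not xtandem_bin.is_dir():
        problems.append(f"X! Tandem directory not found: {xtandem_bin}")
    elif not (xtandem_bin / exe_name).is_file():
        problems.append(f"{exe_name} not found in: {xtandem_bin}")

    return problems


if __name__ == "__main__":
    # Paths copied from the command in this thread.
    found = missing_paths(
        r"D:\IAR\RawTool\ExampleRawFiles",
        r"D:\IAR\RawTool\HumanDB.fasta",
        r"D:\IAR\RawTool\xtandem\bin",
    )
    for problem in found:
        print(problem)
    if not found:
        print("all paths resolve")
```

An empty result only shows that the paths exist; it does not rule out a problem inside RawTools itself.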

chrishuges commented 2 years ago

If you check the QC directory, does it actually write the target decoy database?

ilham-rabbi commented 2 years ago

It produces a new database in the same directory as the original

image

chrishuges commented 2 years ago

It is a bit odd that your 'RawTools_custom_config.xml' file is in that directory and not in your xtandem/bin directory (or is this one just a copy?). Can you show me what your xtandem bin directory looks like? I assume the mgf is also written?

Would you be willing to try an older version, just for a sanity check? (Version 2.0.4 is a good one to try.)
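Questions like "is the mgf written?" or "what is in the xtandem bin directory?" can also be answered with a text listing instead of a screenshot. A small Python sketch, assuming nothing about RawTools beyond the folder paths quoted in this thread; the function name `inventory` is illustrative only:

```python
from collections import defaultdict
from pathlib import Path


def inventory(directory):
    """Group the names of files directly inside `directory` by lower-cased extension."""
    by_ext = defaultdict(list)
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            by_ext[entry.suffix.lower()].append(entry.name)
    return dict(by_ext)


if __name__ == "__main__":
    # Folders taken from the command in this thread.
    for folder in (
        r"D:\IAR\RawTool\ExampleRawFilesResults2",
        r"D:\IAR\RawTool\xtandem\bin",
    ):
        if not Path(folder).is_dir():
            print(f"{folder}: directory not found")
            continue
        print(folder)
        for ext, names in inventory(folder).items():
            print(f"  {ext or '(no extension)'}: {', '.join(names)}")
```

Pasting that output into the issue shows at a glance whether `.mgf`, `.fasta`, and `.xml` files ended up where they were expected.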

ilham-rabbi commented 2 years ago

I haven't made any copies of the "RawTools_custom_config.xml" file, so it should be the original.

Here's an image of the xtandem bin:

image

The only mgf file I could find was in the results folder. It was only written for one of my input files: image

I'll give 2.0.4 a shot and let you know how that goes as well.

ilham-rabbi commented 2 years ago

Just did it with version 2.0.4 and it worked! I'll have to run a few more sets of data to be sure, but the test files I was using have been successfully processed.

This is what the console spit out:

C:\Users\Admin>RawTools -d D:\IAR\RawTool\ExampleRawFiles\ -qc D:\IAR\RawTool\ExampleRawFilesResults2\ -db D:\IAR\RawTool\HumanDB.fasta -fmods 57.0214@C -vmods 15.9949@M -X D:\IAR\RawTool\xtandem\bin
2 file(s) to process

Processing: D:\IAR\RawTool\ExampleRawFiles\ch_23Aug2018_HeLa_Std_6.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11435
Scans linked: 11435

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Writing MGF file: 100%
Reading FASTA file
Writing target-decoy database: 100%

X! TANDEM Alanine (2017.2.1.4)

Loading spectra| (mgf).. loaded.
Spectra matching criteria = 2971
Starting threads .|.|.|.|.|.|. started.
Computing models:
        testing 1 2 3 testing 1 2 3 testing 1 2 3 testing | 50 ks
         1 2 3 testing 1 2 3 testing 1 2 3 testing 1 2 3  | 100 ks
        testing 1 2 3 testing 1 2 3 testing 1 2 3 testing | 150 ks
         1 2 3 tes
        waiting for 2|3|4|5|6|7| done.

        sequences modelled = 159 ks
Model refinement:

        waiting for 2|3|4|5|6|7| done.

Merging results:
        from 234567

Creating report:
        initial calculations  ..... done.
        sorting  ..... done.
        finding repeats ..... done.
        evaluating results ..... done.
        calculating expectations ..... done.
        writing results ..... done.

Valid models = 2000

Total hits: 2966
Top decoy score: 29.95
Non-decoy hits: 2624
Non-decoy hits above top decoy score: 1948
Digestion efficiency: 0.719712525667351
Missed cleavage rate (/PSM): 0.316221765913758
IDrate: 0.649333333333333
15.9949@M modification frequency: 0.03486529318542
Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 27.09 s

Processing: D:\IAR\RawTool\ExampleRawFiles\ch_23Aug2018_HeLa_Std_7.raw

Determing MS analysis order... Done!
Extracting scan indices: 100%
Checking for orphaned scans: 100%

================ Scan indexing report ================
Total scans in file: 11409
Scans linked: 11409

Orphan scans:
None!

All scans accounted for!
======================================================

Extracting reaction events: 100%
Extracting scan data: 100%
Extracting trailer extras: 100%
Extracting precursor masses: 100%
Extracting retention times: 100%
Analyzing precursor peaks: 100%
Calculating meta data
  MS1 isolation interference
  MS2 scan cycle density
  Ion injection time
  Duty cycle
  Intensity distribution
  Summed intensities
Calculating metrics
Writing MGF file: 100%

X! TANDEM Alanine (2017.2.1.4)

Loading spectra| (mgf).. loaded.
Spectra matching criteria = 2973
Starting threads .|.|.|.|.|.|. started.
Computing models:
        testing 1 2 3 testing 1 2 3 testing 1 2 3 testing | 50 ks
         1 2 3 testing 1 2 3 testing 1 2 3 testing 1 2 3  | 100 ks
        testing 1 2 3 testing 1 2 3 testing 1 2 3 testing | 150 ks
         1 2 3 tes
        waiting for 2|3|4|5|6|7| done.

        sequences modelled = 159 ks
Model refinement:

        waiting for 2|3|4|5|6|7| done.

Merging results:
        from 234567

Creating report:
        initial calculations  ..... done.
        sorting  ..... done.
        finding repeats ..... done.
        evaluating results ..... done.
        calculating expectations ..... done.
        writing results ..... done.

Valid models = 2041

Total hits: 2967
Top decoy score: 30
Non-decoy hits: 2646
Non-decoy hits above top decoy score: 1938
Digestion efficiency: 0.729618163054696
Missed cleavage rate (/PSM): 0.297729618163055
IDrate: 0.646
15.9949@M modification frequency: 0.0219435736677116

Finished writing QC data to csv

QC data written to csv file.
QC file saved successfully

Elapsed time: 26.01 s

Time to process all 2 files: 00:00:53.1047914

I don't want to get my hopes up too early though. I'll reach out again if I need any more help. Thanks!

chrishuges commented 2 years ago

Hmm. I will have to try and track down what was happening in 2.0.7 at some point. It is odd though as nothing really changed in the QC pipeline between 2.0.4 and 2.0.7. I will have to think about it.
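One low-effort starting point for tracking this down is to diff the console output of the two versions. The sketch below uses Python's `difflib`; the two lists are the "Calculating meta data" steps copied from the 2.0.7 and 2.0.4 logs in this thread, and the helper name `added_or_removed` is illustrative. It only reveals differences in what is printed, not the cause of the failure.

```python
import difflib

# "Calculating meta data" steps as printed by RawTools 2.0.7 in this thread.
steps_207 = [
    "MS1 isolation interference",
    "MS2 scan cycle density",
    "Ion injection time",
    "FAIMS voltages",
    "Duty cycle",
    "Intensity distribution",
    "Summed intensities",
]

# The same section as printed by RawTools 2.0.4 in this thread.
steps_204 = [
    "MS1 isolation interference",
    "MS2 scan cycle density",
    "Ion injection time",
    "Duty cycle",
    "Intensity distribution",
    "Summed intensities",
]


def added_or_removed(old, new):
    """Return only the lines that differ, prefixed '- ' (old only) or '+ ' (new only)."""
    return [
        line
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    for line in added_or_removed(steps_204, steps_207):
        print(line)
```

The same function can be applied to full logs saved to text files by reading each with `splitlines()`.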