smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License
90 stars 46 forks

Command line invocation help #250

Closed animesh closed 7 years ago

animesh commented 7 years ago

Currently the /? switch gives the following error:

MetaMorpheusCommandLine.exe /?

Unhandled Exception: System.TypeInitializationException: The type initializer for 'EngineLayer.MyEngine' threw an exception. ---> System.TypeInitializationException: The type initializer for 'Proteomics.Residue' threw an exception. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Chemistry.ChemicalFormula.ParseFormula(String formula)
   at Proteomics.Residue..cctor()
   --- End of inner exception stack trace ---
   at Proteomics.Residue.TryGetResidue(Char letter, Residue& residue)
   at EngineLayer.MyEngine.<GetResidueInclusionExclusionSearchModes>d__42.MoveNext() in C:\projects\metamorpheus\EngineLayer\MyEngine.cs:line 162
   at EngineLayer.MyEngine.<LoadSearchModesFromFile>d__41.MoveNext() in C:\projects\metamorpheus\EngineLayer\MyEngine.cs:line 146
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at EngineLayer.MyEngine..cctor() in C:\projects\metamorpheus\EngineLayer\MyEngine.cs:line 40
   --- End of inner exception stack trace ---
   at EngineLayer.MyEngine.get_MetaMorpheusVersion()
   at MetaMorpheusCommandLine.Program.Main(String[] args) in C:\projects\metamorpheus\CMD\Program.cs:line 20

I'm wondering what the usual way to run the program from the command line is?

stefanks commented 7 years ago

Hi, we are currently writing a reader that can read settings for each task, and also a command line input parser. We expect to finish this in the course of the next week. See the discussion at https://github.com/smith-chem-wisc/MetaMorpheus/issues/102 for updates. Meanwhile, if you can, please try the Windows GUI version, and let us know your thoughts. We'd be grateful for the feedback.

animesh commented 7 years ago

Yeah, sure, Stefan. Which one is the GUI executable? So far the only exe I saw was the command-line one...

trishorts commented 7 years ago

On the main GitHub page, click on "Releases". Currently there are 31; you want the most recent version, which is 0.0.65. Click on it and download "MetaMorpheusGUI.zip". Unzip that folder on your local computer. Once it is unzipped, open the folder and double-click "MetaMorpheusGUI.exe". That should get you up and running.

One thing that many new users fail to do is to download and install MSFileReader. On the main GitHub page under "System Requirements", click on "Thermo MSFileReader" and follow the instructions. You will have to create an account with Thermo to get it. That usually happens quite fast.

stefanks commented 7 years ago

animesh, the command line version is working. You are welcome to try the current release! You can follow the instructions here: https://github.com/smith-chem-wisc/MetaMorpheus

animesh commented 7 years ago

Thanks :) It ran without errors, at least for the test example, though it took quite long (compared with Comet):

Starting compactPeptideToProteinPeptideMatching count: 34559
Ending compactPeptideToProteinPeptideMatching count: 34559

Finished writing file: 2017-03-22-11-38-37\Task5SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 02:47:02.5594655

Awesome nonetheless :+1:

Regarding the command-line arguments: to perform a typical search on our data from a Thermo Elite (Hi-Lo measurement), including deamidation of N and Q (http://web.expasy.org/findmod/DEAM.html), if I start with

-t Task4GptmdExample.toml

would the following changes be enough:

ListOfModListsGptmd = ["Mods\m.txt", "Mods\metals.txt", "Mods\pt.txt"] -> DELETE
ProductMassTolerance = "±0.0100 Absolute" -> "±0.5 Absolute"
TaskType = "Gptmd" -> "Search"
ZdotIons = false -> true
CIons = false -> true
(Can and should we also add a and x ions? SEQUEST allows adding weights, hence the question.)

-s raw file(s)?

-d any UniProt DB in xml.gz format?

Append the following to v.txt:

ID   Deamidated asparagine
TG   Asparagine.
PP   Anywhere.
CF   H-1 N-1 O1
MM   0.984016
MT   Variable
//
ID   Deamidated glutamine
TG   Glutamine.
PP   Anywhere.
CF   H-1 N-1 O1
MM   0.984016
MT   Variable
//
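The entries above follow a line-oriented layout: a two-letter key, whitespace, a value, and `//` closing each record. A minimal Python sketch of reading such records into dictionaries, useful for sanity-checking an edited v.txt before a run. It assumes only the layout visible in these entries; it is not MetaMorpheus's own parser, and `parse_mod_entries` is a hypothetical helper.

```python
def parse_mod_entries(text):
    """Split '//'-terminated records of 'KEY   value' lines into dicts.

    Assumes a two-letter key, whitespace, then the value, as in the
    entries shown in this thread (not MetaMorpheus's actual parser).
    """
    entries = []
    current = {}
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == "//":
            if current:
                entries.append(current)
            current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:  # tolerate a final record without a trailing '//'
        entries.append(current)
    return entries


sample = """ID   Deamidated asparagine
TG   Asparagine.
PP   Anywhere.
CF   H-1 N-1 O1
MM   0.984016
MT   Variable
//
ID   Deamidated glutamine
TG   Glutamine.
PP   Anywhere.
CF   H-1 N-1 O1
MM   0.984016
MT   Variable
//
"""

entries = parse_mod_entries(sample)
for entry in entries:
    print(entry["ID"], entry["TG"], float(entry["MM"]))
```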

By the way, I also tried to compile the source and run the resulting build:

F:\promec\Animesh\MetaMorpheus>CMD\bin\Debug\MetaMorpheusCommandLine.exe

but it throws an error:

Unhandled Exception: System.TypeInitializationException: The type initializer for 'EngineLayer.MyEngine' threw an exception. ---> System.TypeInitializationException: The type initializer for 'Proteomics.Residue' threw an exception. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Chemistry.ChemicalFormula.ParseFormula(String formula)
   at Proteomics.Residue..cctor()
   --- End of inner exception stack trace ---
   at Proteomics.Residue.TryGetResidue(Char letter, Residue& residue)
   at EngineLayer.MyEngine.<GetResidueInclusionExclusionSearchModes>d__42.MoveNext() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 162
   at EngineLayer.MyEngine.<LoadSearchModesFromFile>d__41.MoveNext() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 146
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at EngineLayer.MyEngine..cctor() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 40
   --- End of inner exception stack trace ---
   at EngineLayer.MyEngine.get_MetaMorpheusVersion()
   at MetaMorpheusCommandLine.Program.Main(String[] args) in F:\promec\Animesh\MetaMorpheus\CMD\Program.cs:line 25

The Visual Studio debugger points to:

Problem signature:
  Problem Event Name:   CLR20r3
  Problem Signature 01: MetaMorpheusCommandLine.exe
  Problem Signature 02: 1.0.0.0
  Problem Signature 03: 58d24caf
  Problem Signature 04: mscorlib
  Problem Signature 05: 4.6.1087.0
  Problem Signature 06: 583e5c1a
  Problem Signature 07: 3791
  Problem Signature 08: 1e
  Problem Signature 09: A4DH5WWIWWW1YJTMP0C0KV4ZWCALU4IN
  OS Version:   6.1.7601.2.1.0.18.10
  Locale ID:    1033
  Additional Information 1: 5fb5
  Additional Information 2: 5fb52475b7d5707da06b20e0977b8f49
  Additional Information 3: 6a4f
  Additional Information 4: 6a4f15ec7e305cbcf5fdcf55aa9f7b7a

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\en-US\erofflps.txt

Specifically, it points to the following exception object:

System.TypeInitializationException occurred
  HResult=0x80131534
  Message=The type initializer for 'EngineLayer.MyEngine' threw an exception.
  Source=EngineLayer
  StackTrace:
   at EngineLayer.MyEngine.get_MetaMorpheusVersion() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 61
   at MetaMorpheusCommandLine.Program.Main(String[] args) in F:\promec\Animesh\MetaMorpheus\CMD\Program.cs:line 25

Inner Exception 1:
TypeInitializationException: The type initializer for 'Proteomics.Residue' threw an exception.

Inner Exception 2:
KeyNotFoundException: The given key was not present in the dictionary.

Should commenting this out be the next logical step to try?

stefanks commented 7 years ago

animesh, the sample run takes a while because of the calibration component. Calibration is an optional step, and it is not optimized for time. The searches are actually quite fast.

For a regular search, you should start by editing the Search poml file, not a Gptmd poml file, and then make the changes you would like. The a and x ions are not an option yet.

The -s parameter takes a list of spectra files in Thermo raw or mzML format, and the -db parameter takes a list of protein databases in UniProt XML or FASTA format, which may also be compressed as .gz.
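To catch a misplaced file before a long run, the accepted input types described here can be checked by extension. A minimal Python sketch, assuming case-insensitive extensions and that only databases may carry the extra `.gz`, as stated above; `classify_input` is a hypothetical helper, not part of MetaMorpheus.

```python
from pathlib import PurePath

# Extensions taken from the description above; case-insensitive matching
# is an assumption for illustration.
SPECTRA_EXTS = {".raw", ".mzml"}
DATABASE_EXTS = {".xml", ".fasta"}


def classify_input(path):
    """Return 'spectra', 'database', or None for a candidate input file."""
    suffixes = [s.lower() for s in PurePath(path).suffixes]
    if not suffixes:
        return None
    if suffixes[-1] == ".gz":
        # Only databases are described as possibly gzip-compressed.
        inner = suffixes[-2] if len(suffixes) >= 2 else ""
        return "database" if inner in DATABASE_EXTS else None
    if suffixes[-1] in SPECTRA_EXTS:
        return "spectra"
    if suffixes[-1] in DATABASE_EXTS:
        return "database"
    return None


for name in ["20160823_QC-elite_BSA-2.raw",
             "uniprot-mouse-reviewed-3-9-2017.xml.gz",
             "crap_correct_hdr.fasta"]:
    print(name, classify_input(name))
```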

You are welcome to append to the v.txt file and run a search that would consider those modifications variable. I would be interested in hearing about the results! A more compact way of writing the modification is:

ID   Deamidation
TG   N or Q
PP   Anywhere.
CF   H-1 N-1 O1
MM   0.984016
MT   Variable
//
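Whichever form is used, the MM value should agree with the CF composition. A quick Python check that sums standard monoisotopic element masses for a space-separated CF string; the element table holds only the few elements needed here, and `formula_mass` is a hypothetical helper. An element symbol missing from the table raises a `KeyError`, loosely analogous to the `KeyNotFoundException` from `ChemicalFormula.ParseFormula` earlier in this thread.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes; a minimal table
# containing only the elements needed for this example.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

TOKEN = re.compile(r"^([A-Z][a-z]?)(-?\d+)?$")


def formula_mass(cf):
    """Sum monoisotopic masses for a CF string such as 'H-1 N-1 O1'."""
    total = 0.0
    for token in cf.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse token {token!r}")
        symbol, count = match.group(1), int(match.group(2) or 1)
        # An element missing from the table raises KeyError here.
        total += MONOISOTOPIC[symbol] * count
    return total


mass = formula_mass("H-1 N-1 O1")
print(f"{mass:.6f}")  # 0.984016, matching the MM line
```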

But we recommend an alternative, much more robust approach to deal with deamidations: G-PTM-D. First run the G-PTM-D task, which would add the possible deamidation locations to the database, and then a following Search task that would identify these modifications.
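A toy Python sketch of the G-PTM-D idea as described here: an initial search yields peptides with unexplained mass shifts, and each candidate modification whose mass matches the shift within a tolerance, and whose target residue occurs in the peptide, is recorded as a plausible site to add to the database. The candidate list, tolerance, and function names are hypothetical; the actual task is more involved than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    residues: str  # one-letter codes the modification may sit on
    mass: float    # monoisotopic mass shift in Da


# Hypothetical candidate list (monoisotopic shifts in Da).
CANDIDATES = [
    Candidate("Deamidation", "NQ", 0.984016),
    Candidate("Phosphorylation", "STY", 79.966331),
    Candidate("Oxidation", "M", 15.994915),
]


def plausible_sites(protein_id, protein_seq, peptide, mass_shift, tol=0.01):
    """Return (protein_id, 1-based position, mod name) for each residue of
    `peptide` (located in `protein_seq`) that could carry a candidate whose
    mass matches `mass_shift` within `tol` Da."""
    start = protein_seq.find(peptide)
    if start < 0:
        return []
    sites = []
    for mod in CANDIDATES:
        if abs(mod.mass - mass_shift) > tol:
            continue
        for offset, residue in enumerate(peptide):
            if residue in mod.residues:
                sites.append((protein_id, start + offset + 1, mod.name))
    return sites


# A peptide observed about 0.984 Da heavier than its unmodified mass.
print(plausible_sites("P1", "MKLLNQAGR", "LLNQAGR", 0.9843))
```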

When you compile, what OS are you using? What compiler? Did you do a nuget package restore prior to compiling? Do you have the Data folder present with the files elements.dat, unimod.xml, ptmlist.txt?

Stefan

animesh commented 7 years ago

Thanks again for the prompt and helpful response, Stefan :+1:

I tried a simple search which took a while too:

Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-22-15-00-28\Task1SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 02:17:46.6473286

BTW, how is the number of CPUs used determined? I see that it uses all of them; a way to control the number via the command line would be cool :) Also, RAM usage seems quite high: it quickly shot up to 8 GB for this Elite raw file of about 134 MB, which I guess is quite high too? My comparison is based on my experience, which is restricted to tools such as Discoverer Daemon, MaxQuant and Comet via the command line.

Regarding G-PTM-D, does it work like Preview? Is it possible to put two TaskType entries in the same file, or do they need to be provided as separate toml files?

I am running VS2017 on 64-bit Windows 2008 R2 Enterprise SP1. I cloned a fork of your project with git, and compilation seems to run NuGet and install the dependencies, but I am very new to this environment. Could you let me know the specifics of how to restore the right dependencies with NuGet, and which files and directories need to be present for the compiler to create the right binary? Should I just put the Data folder with the files elements.dat, unimod.xml and ptmlist.txt in the project base directory?

Current directory structure: tree.txt

stefanks commented 7 years ago

Is this the run with variable deamidations? Could you send me the database, spectra and poml settings files you are using? I can check where the performance bottleneck is.

The parallelization is done by the .NET runtime, and yes, it tries to use all available resources to speed up computation. Currently, in Windows 10 you can restrict the cores available to a process by going to Task Manager -> Details -> right-click on MetaMorpheus -> Set affinity, and choosing the cores available to the process. I made an issue regarding this: https://github.com/smith-chem-wisc/MetaMorpheus/issues/279
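The same restriction can also be set at launch time from a Windows command prompt, since cmd's built-in `start` command accepts `/affinity` followed by a hexadecimal mask in which bit i enables logical processor i. A small Python sketch that builds the mask and the command line; the core selection and the MetaMorpheus arguments are placeholders, and both helper names are hypothetical.

```python
def affinity_mask(cores):
    """Hex affinity mask (no 0x prefix) with bit i set for each logical processor i."""
    if not cores:
        raise ValueError("select at least one core")
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return format(mask, "X")


def start_with_affinity(cores, command):
    """Build a cmd.exe line; the empty "" is start's window-title argument."""
    return f'start "" /affinity {affinity_mask(cores)} {command}'


# Restrict a run to the first four logical processors (placeholder command).
print(start_with_affinity([0, 1, 2, 3], "MetaMorpheusCommandLine.exe -t search.toml"))
```

Using an explicit `""` title keeps `start` from mistaking a quoted executable path for the window title.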

The G-PTM-D task augments the database with plausible modifications, and the following search task looks for peptides with these localized modifications. The combination of the two should be much faster than running a search with these modifications set as variable. So yes, you need two separate toml files (but they could be part of a single MetaMorpheus invocation).
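Running both tasks in one invocation amounts to listing both toml files after the task switch, in the order they should run. A minimal Python sketch that assembles such an argument list, assuming the `-t`/`-s`/`-d` switches used in the invocations later in this thread; `metamorpheus_argv` is a hypothetical helper and the file names are placeholders.

```python
def metamorpheus_argv(tasks, spectra, databases, exe="MetaMorpheusCommandLine.exe"):
    """Assemble argv for a single invocation; tasks are listed in run order."""
    if not (tasks and spectra and databases):
        raise ValueError("need at least one task, spectra file and database")
    return [exe, "-t", *tasks, "-s", *spectra, "-d", *databases]


argv = metamorpheus_argv(
    tasks=["gptmd.toml", "search.toml"],  # G-PTM-D first, then Search
    spectra=["20160823_QC-elite_BSA-2.raw"],
    databases=["crap_correct_hdr.fasta"],
)
print(" ".join(argv))
# On a machine with the tool installed, subprocess.run(argv, check=True)
# would launch it without any shell quoting issues.
```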

For compiling by yourself, you seem to have all the necessary files. I'll try to replicate your error.

Thanks!

stefanks commented 7 years ago

animesh, could you try deleting the Data/elements.dat file, and running the MetaMorpheusCommandLine.exe application from the Debug folder? Do you see the same error?

animesh commented 7 years ago

Thanks for creating the request for CPU restriction via command line :)

Regarding recompilation, deleting elements.dat seems to have actually worked :+1:

F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe
Element database did not exist, writing to disk
Not a release version
Usage:
        -t --tasks     List of task poml files
        -s --spectra   List of spectra files
        -e --databases List of database files

Just need to correct a typo: the -e switch should be -d?
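The usage text describes three switches that each take a list of files. As an illustration of how such list-valued options behave, here is a hypothetical Python `argparse` equivalent (not the program's actual C# parser), using `-d` for databases as confirmed in this thread; with `nargs="+"`, each switch consumes arguments until the next switch.

```python
import argparse


def build_parser():
    # Mirrors the usage text above, with -d in place of the mistyped -e.
    parser = argparse.ArgumentParser(prog="MetaMorpheusCommandLine")
    parser.add_argument("-t", "--tasks", nargs="+", required=True,
                        help="List of task poml files")
    parser.add_argument("-s", "--spectra", nargs="+", required=True,
                        help="List of spectra files")
    parser.add_argument("-d", "--databases", nargs="+", required=True,
                        help="List of database files")
    return parser


args = build_parser().parse_args(
    ["-t", "gptmd.toml", "search.toml",
     "-s", "20160823_QC-elite_BSA-2.raw",
     "-d", "crap_correct_hdr.fasta"]
)
print(args.tasks)  # ['gptmd.toml', 'search.toml']
```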

I am running this version now with following invocation:

F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe -t gptmd.toml search.toml -s F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\20160823_QC-elite_BSA-2.raw -d F:\promec\MMC\uniprot-mouse-reviewed-3-9-2017.xml.gz

Essentially, the toml files are stripped-down versions of what I was using earlier, with the following diff (attached: gptmd - Copy.txt, search - Copy.txt):

diff search.toml  gptmd.toml
4c4,7
< TaskType = "Search"
---
> IsotopeErrors = false
> TaskType = "Gptmd"
> InitiatorMethionineBehavior = "Variable"
> MaxMissedCleavages = 2
8,9d10
< ZdotIons = true
< CIons = true
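A line-based `diff` depends on key order; comparing settings key by key avoids that. A minimal Python sketch, assuming flat single-line `Key = value` assignments like those shown (real TOML files with tables or multi-line values would need a proper TOML parser, such as `tomllib` in Python 3.11+); both helper names are hypothetical.

```python
def flat_settings(text):
    """Parse flat 'Key = value' lines into a dict; comments and [tables] are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for differing keys; None marks absence."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


search = flat_settings('TaskType = "Search"\nZdotIons = true\nCIons = true')
gptmd = flat_settings('TaskType = "Gptmd"\nIsotopeErrors = false\nMaxMissedCleavages = 2')
for key, (s, g) in diff_settings(search, gptmd).items():
    print(f"{key}: search={s} gptmd={g}")
```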

and it seems to be running. Do you think it is fine to invoke it this way with these toml files? By the way, what are the default values?

stefanks commented 7 years ago

Yup, that's a typo! Thanks for the catch.

The default values are here:
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/SearchTask/SearchTask.cs
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/GPTMDTask/GPTMDTask.cs
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/CalibrationTask/CalibrationTask.cs

animesh commented 7 years ago

Thanks Stefan, I will check those defaults. Meanwhile, the run with the gptmd task finished, and it was blazingly fast compared to the earlier search with explicit variable mods:

F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe -t gptmd.toml search.toml -s F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\20160823_QC-elite_BSA-2.raw -d F:\promec\FastaDB\crap_correct_hdr.fasta
Not a release version
Starting engine:EverythingRunnerEngine
Starting task:
Task1GptmdTask
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\prose.txt
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\GptmdTaskconfig.toml
Starting engine:ClassicSearchEngine
Status: In classic search engine!
Status: Getting ms2 scans...
Status: Starting classic search loop...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Finished engine:
ClassicSearchResults
Time to run: 00:00:06.0427874

Starting engine:AnalysisEngine
Status: Running analysis engine!
Status: Adding observed peptides to dictionary...
Status: Adding possible sources to peptide dictionary...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Status: Running FDR analysis...
Status: Running modification analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\20160823_QC-elite_BSA-2.psmtsv
Status: Running histogram analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\20160823_QC-elite_BSA-2.mytsv
Finished engine:
AnalysisResults
Time to run: 00:00:06.2429452
All PSMS within 1% FDR: 2
Search mode 0 Mods seen:

Search mode 0 Mods on proteins:

Starting compactPeptideToProteinPeptideMatching count: 0
Ending compactPeptideToProteinPeptideMatching count: 4880

Starting engine:GptmdEngine
Finished engine:
GptmdResults
Time to run: 00:00:00.0285731

Modifications added = 312
Proteins expanded = 29
Mods added types and counts:
Sulfonation of S 32
Labile Phosphorylation of S 32
Phosphorylation of S 32
Methylation of S 29
Methylation of D 21
Sodium 15
Deamidation of Q 13
Trioxidation of C 13
Methylation of K 12
Methylation of R 11
DiMethylation of K 11
Deamidation of N 10
Oxidation of F 10
Oxidation of P 10
Methylation of Q 8
Proline Oxidation to pyroglutamic acid 8
Methylation of H 6
DiMethylation of N 5
Methylation of N 5
DiMethylation of R 5
Lysine not cleaved 4
Oxidation of Y 3
Sulfonation of T 3
Labile Phosphorylation of T 3
Phosphorylation of T 3
Acetylation 2
Fe[III] 2
Oxidation of W to Kynurenine 1
Carbamidomethylation of K 1
Zinc 1
Water loss from D 1
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\crap_correct_hdrGPTMD.xml
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\results.txt
Finished task: String
Starting task:
Task2SearchTask
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\prose.txt
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\SearchTaskconfig.toml
Starting engine:ClassicSearchEngine
Status: In classic search engine!
Status: Getting ms2 scans...
Status: Starting classic search loop...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Finished engine:
ClassicSearchResults
Time to run: 00:00:04.1823784

Starting engine:AnalysisEngine
Status: Running analysis engine!
Status: Adding observed peptides to dictionary...
Status: Adding possible sources to peptide dictionary...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Status: Running FDR analysis...
Status: Running modification analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\20160823_QC-elite_BSA-210ppmAroundZero.psmtsv
Status: Running FDR analysis on unique peptides...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\20160823_QC-elite_BSA-2uniquePeptides10ppmAroundZero.psmtsv
Finished engine:
AnalysisResults
Time to run: 00:00:06.5411660
All PSMS within 1% FDR: 0
Search mode 0 Mods seen:

Search mode 0 Mods on proteins:

Starting compactPeptideToProteinPeptideMatching count: 0
Ending compactPeptideToProteinPeptideMatching count: 3004

Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 00:02:02.9158713

Usage:
        -t --tasks     List of task poml files
        -s --spectra   List of spectra files
        -e --databases List of database files

F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>

Which raises the question: is this a good enough way to run searches, especially for a quick scan like the one Preview (http://www.proteinmetrics.com/products/preview/) seems to provide? Or even compared with the mass-tolerant search approach of Chick et al. (http://www.nature.com/nbt/journal/v33/n7/abs/nbt.3267.html), which in my experience is terribly slow and requires a lot of post-processing of Comet results?

stefanks commented 7 years ago

animesh, it's a good way if it satisfies your needs! Please tell us if there are any enhancements/additions you would like to see.

animesh commented 7 years ago

Thanks Stefan for the encouraging words, and since YAFI, here is an attempt to see whether the idea works. Essentially, I re-ran a raw file using this approach, where, for example, I know that one of the sites of the peptide "KAPAGQEEPGTPPSSPLSAEQLDR" is phosphorylated, as detected by Mascot and SEQUEST in the msf file provided at https://goo.gl/aiJLRq. The approach mentioned above finds it with a sulfonation instead: "KAPAGQEEPGT[:Sulfonation of T]PPSSPLSAEQLDR". Could this be due to the very similar mass difference?
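The two modifications can be compared directly from standard monoisotopic element masses: sulfonation adds SO3 and phosphorylation adds HPO3, both nominally 80 Da. A short Python calculation of the exact shifts and their difference, checked against the ±0.5 Da product tolerance proposed earlier and the ±0.0100 Da value from the example task file. This compares mass shifts only; it says nothing about how MetaMorpheus scores or localizes either modification.

```python
# Monoisotopic element masses (Da).
H = 1.00782503207
O = 15.99491461956
P = 30.97376163
S = 31.97207100

sulfonation = S + 3 * O          # +SO3
phosphorylation = H + P + 3 * O  # +HPO3
delta = phosphorylation - sulfonation

print(f"SO3  shift: {sulfonation:.5f} Da")
print(f"HPO3 shift: {phosphorylation:.5f} Da")
print(f"difference: {delta:.4f} Da")

# Product tolerances mentioned in this thread.
for tol in (0.5, 0.01):
    print(f"difference exceeds ±{tol} Da tolerance: {delta > tol}")
```

The difference comes out at roughly 0.0095 Da, which is inside both tolerances, so fragment masses alone would not separate the two assignments here.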

I am sharing the link with the raw file as well, in case you want to reproduce :)

PS: BTW, what is the reason for writing two result files, "10ppmAroundZero" and "uniquePeptides10ppmAroundZero", and which one do you recommend checking? Further, is there a way to extract enough data to run Percolator downstream, or are the q-values provided by MetaMorpheus close enough?
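For context on the q-value question, here is a minimal Python sketch of the common target-decoy q-value calculation, offered only to illustrate what a count such as "All PSMS within 1% FDR" refers to in general; it is not a description of MetaMorpheus's exact procedure. It assumes each PSM is a `(score, is_decoy)` pair with higher scores better, estimates FDR as decoys/targets above each threshold, and does not treat tied scores specially.

```python
def q_values(psms):
    """psms: list of (score, is_decoy). Returns q-values in input order.

    FDR at each threshold is estimated as #decoys / #targets among PSMs
    scoring at least as high; the q-value is the minimum FDR at which a
    PSM would still be accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs[i] = decoys / max(targets, 1)
    # Walk from the lowest score upward, carrying the running minimum FDR.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return q


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
q = q_values(psms)
accepted = sum(1 for (score, is_decoy), qv in zip(psms, q) if not is_decoy and qv <= 0.01)
print(q, accepted)
```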

trishorts commented 7 years ago

Stay tuned. I'm concerned about the performance of MM on low-res data. MS2 for this was in the ion trap, correct? The q-values in the results file I looked at are poor. I'll talk with Stefan about this when he gets in.

Please also follow issue #300, "Improve capabilities for low-res MS2 data, e.g. with probabilistic scoring".

animesh commented 7 years ago

Thanks for the heads-up; yes, this is hi-lo Elite instrument data.

Looking forward to the improvements :100: