Closed animesh closed 7 years ago
Hi, we are currently writing a reader that can read settings for each task, and also a command line input parser. We expect to finish this in the course of the next week. See the discussion at https://github.com/smith-chem-wisc/MetaMorpheus/issues/102 for updates. Meanwhile, if you can, please try the Windows GUI version, and let us know your thoughts. We'd be grateful for the feedback.
Yeah sure Stefan. Which one is the GUI executable? So far the only exe i saw was the command-line one...
On the main GitHub page click on "releases". Currently there are 31. You want the most recent version, which is 0.0.65. Click on and download "MetaMorpheusGUI.zip". Unzip that folder on your local computer. Once that is unzipped, open the folder and double click "MetaMorpheusGUI.exe". That should get you up and going.
One thing that many new users fail to do is to download and install MSFileReader. On the main GitHub page under "System Requirements", click on "Thermo MSFileReader" and follow the instructions. You will have to create an account with Thermo to get it. That usually happens quite fast.
animesh, the command line version is working. You are welcome to try the current release! You can follow the instructions here: https://github.com/smith-chem-wisc/MetaMorpheus
Thanks :) ran without error for the test example at least, though took quite long (c.f. comet)
Starting compactPeptideToProteinPeptideMatching count: 34559
Ending compactPeptideToProteinPeptideMatching count: 34559
Finished writing file: 2017-03-22-11-38-37\Task5SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 02:47:02.5594655
awesome none the less :+1:
Regarding the command line arguments, in order to perform a typical search including deamidation of NQ (http://web.expasy.org/findmod/DEAM.html) for our data, from Thermo Elite (Hi-Lo measurement), if i start with
-t Task4GptmdExample.toml
would the following changes be enough:
ListOfModListsGptmd = ["Mods\m.txt", "Mods\metals.txt", "Mods\pt.txt"] -> DELETE
ProductMassTolerance = "±0.0100 Absolute" -> "±0.5 Absolute"
TaskType = "Gptmd" -> "Search"
ZdotIons = false -> true
CIons = false -> true (can and should we also add a and x? sequest allows to add weights thus asking)
-s raw file(s)?
-d any uniprot DB in xml..gz format?
Append v.txt with following
ID Deamidated asparagine
TG Asparagine.
PP Anywhere.
CF H-1 N-1 O1
MM 0.984016
MT Variable
//
ID Deamidated glutamine
TG Glutamine.
PP Anywhere.
CF H-1 N-1 O1
MM 0.984016
MT Variable
//
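As a quick sanity check on the MM value in the two entries above, here is a minimal Python sketch that recomputes the monoisotopic mass shift from the CF line (H-1 N-1 O1). The element masses are standard monoisotopic values; the function and variable names are only illustrative and are not part of MetaMorpheus.

```python
# Monoisotopic masses (in daltons) of the most abundant isotopes.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def mass_shift(composition):
    """Sum element masses weighted by their (possibly negative) counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# CF H-1 N-1 O1: lose one H and one N, gain one O.
DEAMIDATION = {"H": -1, "N": -1, "O": 1}

shift = mass_shift(DEAMIDATION)
print(f"{shift:.6f}")  # prints 0.984016
```

The result agrees with the MM 0.984016 line used for both the asparagine and the glutamine entry.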
By the way, i also tried to compile the source and run the resulting build:
F:\promec\Animesh\MetaMorpheus>CMD\bin\Debug\MetaMorpheusCommandLine.exe
but it throws an error:
Unhandled Exception: System.TypeInitializationException: The type initializer for 'EngineLayer.MyEngine' threw an exception. ---> System.TypeInitializationException: The type initializer for 'Proteomics.Residue' threw an exception. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at Chemistry.ChemicalFormula.ParseFormula(String formula)
at Proteomics.Residue..cctor()
--- End of inner exception stack trace ---
at Proteomics.Residue.TryGetResidue(Char letter, Residue& residue)
at EngineLayer.MyEngine.<GetResidueInclusionExclusionSearchModes>d__42.MoveNext() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 162
at EngineLayer.MyEngine.<LoadSearchModesFromFile>d__41.MoveNext() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 146
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at EngineLayer.MyEngine..cctor() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 40
--- End of inner exception stack trace ---
at EngineLayer.MyEngine.get_MetaMorpheusVersion()
at MetaMorpheusCommandLine.Program.Main(String[] args) in F:\promec\Animesh\MetaMorpheus\CMD\Program.cs:line 25
The Visual Studio debugger points to:
Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: MetaMorpheusCommandLine.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: 58d24caf
Problem Signature 04: mscorlib
Problem Signature 05: 4.6.1087.0
Problem Signature 06: 583e5c1a
Problem Signature 07: 3791
Problem Signature 08: 1e
Problem Signature 09: A4DH5WWIWWW1YJTMP0C0KV4ZWCALU4IN
OS Version: 6.1.7601.2.1.0.18.10
Locale ID: 1033
Additional Information 1: 5fb5
Additional Information 2: 5fb52475b7d5707da06b20e0977b8f49
Additional Information 3: 6a4f
Additional Information 4: 6a4f15ec7e305cbcf5fdcf55aa9f7b7a
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
Specifically towards the following part of code object:
System.TypeInitializationException occurred
HResult=0x80131534
Message=The type initializer for 'EngineLayer.MyEngine' threw an exception.
Source=EngineLayer
StackTrace:
at EngineLayer.MyEngine.get_MetaMorpheusVersion() in F:\promec\Animesh\MetaMorpheus\EngineLayer\MyEngine.cs:line 61
at MetaMorpheusCommandLine.Program.Main(String[] args) in F:\promec\Animesh\MetaMorpheus\CMD\Program.cs:line 25
Inner Exception 1:
TypeInitializationException: The type initializer for 'Proteomics.Residue' threw an exception.
Inner Exception 2:
KeyNotFoundException: The given key was not present in the dictionary.
should commenting this out be next logical step to try?
animesh, the sample run takes a while because of the calibration component. Calibration is an optional step, and it is not optimized for time. The searches are actually quite fast.
Regarding a regular search, as a starting point you should edit the Search poml file, not a Gptmd poml file. Then make the changes you would like. a and x ions are not an option yet.
The -s parameter takes in a list of spectra files in Thermo raw or mzML format, and the -db parameter takes in a list of protein databases in uniprot xml or fasta formats, which in turn could have been compressed into .gz
You are welcome to append the v.txt file, and run a search that would consider those modifications to be variable. I would be interested in hearing about the results! A more compact way of writing the modification is
ID Deamidation
TG N or Q
PP Anywhere.
CF H-1 N-1 O1
MM 0.984016
MT Variable
//
But we recommend an alternative, much more robust approach to deal with deamidations: G-PTM-D. First run the G-PTM-D task, which would add the possible deamidation locations to the database, and then a following Search task that would identify these modifications.
When you compile, what OS are you using? What compiler? Did you do a nuget package restore prior to compiling? Do you have the Data folder present with the files elements.dat, unimod.xml, ptmlist.txt?
Stefan
Thanks again for the prompt and helpful response Stefan :+1:
I tried a simple search which took a while too:
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-22-15-00-28\Task1SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 02:17:46.6473286
BTW how is the number of CPUs employed being calculated? I see that it is using all of them; a way to control the number via command line would be cool :) Also RAM usage seems quite high: it quickly shot up to 8 GB for this Elite raw file of about 134 MB. The comparison is relative to my experience, which is restricted to tools such as Discoverer Daemon, MaxQuant and comet via command line.
Regarding G-PTM-D, does it work like preview? Is it possible to put two "TaskType " in the same file or they need to be provided as separate toml files?
I am running VS2017 over 64 bit Windows 2008 R2 Enterprise SP1. I have used git clone over a fork from your project and compilation seems to nuget and install dependencies, but i am very new to this environment, thus could you let me know the specifics on how to nuget the right deps and what files and directories need to be presented to the compiler to create the right binary? Should i just put Data folder with the files elements.dat, unimod.xml, ptmlist.txt in the project base?
Current directory structure tree.txt
Is this the run with variable Deamidations? Could you send me the database/spectra/poml settings files you are using? I can check where the performance bottleneck is.
The parallelization is done by the .NET runtime, and yes, it tries to use any resource available in order to speed up computation. Currently in Windows 10 you can restrict the cores available to a process by going to Task Manager -> Details -> Right click on MetaMorpheus -> Set affinity, and choosing the cores available to the process. I made an issue regarding this: https://github.com/smith-chem-wisc/MetaMorpheus/issues/279
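Until there is a command line option for this, the same affinity restriction can probably be applied at launch with the `start /affinity` built-in of cmd.exe, which takes a hexadecimal core mask. The sketch below only builds that mask and the command string; the helper names are hypothetical, the task and file names are placeholders, and this is a Windows shell feature rather than anything MetaMorpheus provides.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per allowed zero-based core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return mask


def start_command(cores, exe, args):
    """Build a cmd.exe 'start' line that pins the new process to the given cores.

    The empty "" is the window title that 'start' expects before its switches;
    the mask is written in hexadecimal without a 0x prefix.
    """
    parts = ["start", '""', "/affinity", f"{affinity_mask(cores):X}", "/wait", "/b", exe]
    return " ".join(parts + list(args))


# Example: allow only the first four cores (bits 0 to 3).
command = start_command(
    [0, 1, 2, 3],
    "MetaMorpheusCommandLine.exe",
    ["-t", "search.toml", "-s", "sample.raw", "-d", "db.fasta"],
)
print(command)
```

Paths containing spaces would need quoting before being passed in; the sketch leaves that out to stay short.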
The G-PTM-D task augments the database with plausible modifications, and the following search task looks for peptides with these localized modifications. The combination of the two should be much faster than running a search with these modifications set as variable. So yes, you need two separate toml files (but they could be part of a single MetaMorpheus invocation).
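To make the "two toml files, one invocation" point concrete, here is a small Python sketch that assembles such a command line. The `-t`, `-s` and `-d` switches are the ones used elsewhere in this thread; the function name and the file names are placeholders, and the order of the task files is assumed to be the order in which the tasks run.

```python
def build_invocation(exe, tasks, spectra, databases):
    """Assemble the argument list for one run with several task files.

    Task files are listed G-PTM-D first, then Search, so the search can use
    the database that the G-PTM-D task has augmented.
    """
    if not tasks or not spectra or not databases:
        raise ValueError("need at least one task, one spectra file and one database")
    return [exe, "-t", *tasks, "-s", *spectra, "-d", *databases]


argv = build_invocation(
    "MetaMorpheusCommandLine.exe",
    tasks=["gptmd.toml", "search.toml"],
    spectra=["sample.raw"],
    databases=["proteins.xml.gz"],
)
print(" ".join(argv))

# To actually launch it, one could pass the list to subprocess.run:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Keeping the arguments as a list rather than one string avoids quoting problems with Windows paths when the list is handed to `subprocess.run`.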
For compiling by yourself, you seem to have all the necessary files. I'll try to replicate your error.
Thanks!
animesh, could you try deleting the Data/elements.dat file, and running the MetaMorpheusCommandLine.exe application from the Debug folder? Do you see the same error?
Thanks for creating the request for CPU restriction via command line :)
Regarding recompilation, deleting elements.dat seem to have actually worked :+1:
F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe
Element database did not exist, writing to disk
Not a release version
Usage:
-t --tasks List of task poml files
-s --spectra List of spectra files
-e --databases List of database files
just need to correct typo -e switch to -d ?
I am running this version now with following invocation:
F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe -t gptmd.toml search.toml -s F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\20160823_QC-elite_BSA-2.raw -d F:\promec\MMC\uniprot-mouse-reviewed-3-9-2017.xml.gz
essentially the toml are stripped down version of what i was using earlier with following diff of
gptmd - Copy.txt search - Copy.txt
diff search.toml gptmd.toml
4c4,7
< TaskType = "Search"
---
> IsotopeErrors = false
> TaskType = "Gptmd"
> InitiatorMethionineBehavior = "Variable"
> MaxMissedCleavages = 2
8,9d10
< ZdotIons = true
< CIons = true
and it seems to be running. Do you think it is fine to invoke this way using these toml files? What are the default values by the way?
Yup, that's a typo! Thanks for the catch.
The default values are here:
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/SearchTask/SearchTask.cs
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/GPTMDTask/GPTMDTask.cs
https://github.com/smith-chem-wisc/MetaMorpheus/blob/master/TaskLayer/CalibrationTask/CalibrationTask.cs
Thanks Stefan, will check those defaults. Meanwhile the run with gptmd call finished and it was blazingly fast compared to earlier search with explicit variable mods
F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>MetaMorpheusCommandLine.exe -t gptmd.toml search.toml -s F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\20160823_QC-elite_BSA-2.raw -d F:\promec\FastaDB\crap_correct_hdr.fasta
Not a release version
Starting engine:EverythingRunnerEngine
Starting task:
Task1GptmdTask
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\prose.txt
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\GptmdTaskconfig.toml
Starting engine:ClassicSearchEngine
Status: In classic search engine!
Status: Getting ms2 scans...
Status: Starting classic search loop...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Finished engine:
ClassicSearchResults
Time to run: 00:00:06.0427874
Starting engine:AnalysisEngine
Status: Running analysis engine!
Status: Adding observed peptides to dictionary...
Status: Adding possible sources to peptide dictionary...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Status: Running FDR analysis...
Status: Running modification analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\20160823_QC-elite_BSA-2.psmtsv
Status: Running histogram analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\20160823_QC-elite_BSA-2.mytsv
Finished engine:
AnalysisResults
Time to run: 00:00:06.2429452
All PSMS within 1% FDR: 2
Search mode 0 Mods seen:
Search mode 0 Mods on proteins:
Starting compactPeptideToProteinPeptideMatching count: 0
Ending compactPeptideToProteinPeptideMatching count: 4880
Starting engine:GptmdEngine
Finished engine:
GptmdResults
Time to run: 00:00:00.0285731
Modifications added = 312
Proteins expanded = 29
Mods added types and counts:
Sulfonation of S 32
Labile Phosphorylation of S 32
Phosphorylation of S 32
Methylation of S 29
Methylation of D 21
Sodium 15
Deamidation of Q 13
Trioxidation of C 13
Methylation of K 12
Methylation of R 11
DiMethylation of K 11
Deamidation of N 10
Oxidation of F 10
Oxidation of P 10
Methylation of Q 8
Proline Oxidation to pyroglutamic acid 8
Methylation of H 6
DiMethylation of N 5
Methylation of N 5
DiMethylation of R 5
Lysine not cleaved 4
Oxidation of Y 3
Sulfonation of T 3
Labile Phosphorylation of T 3
Phosphorylation of T 3
Acetylation 2
Fe[III] 2
Oxidation of W to Kynurenine 1
Carbamidomethylation of K 1
Zinc 1
Water loss from D 1
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\crap_correct_hdrGPTMD.xml
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task1GptmdTask\results.txt
Finished task: String
Starting task:
Task2SearchTask
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\prose.txt
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\SearchTaskconfig.toml
Starting engine:ClassicSearchEngine
Status: In classic search engine!
Status: Getting ms2 scans...
Status: Starting classic search loop...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Finished engine:
ClassicSearchResults
Time to run: 00:00:04.1823784
Starting engine:AnalysisEngine
Status: Running analysis engine!
Status: Adding observed peptides to dictionary...
Status: Adding possible sources to peptide dictionary...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Status: Running FDR analysis...
Status: Running modification analysis...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\20160823_QC-elite_BSA-210ppmAroundZero.psmtsv
Status: Running FDR analysis on unique peptides...
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\20160823_QC-elite_BSA-2uniquePeptides10ppmAroundZero.psmtsv
Finished engine:
AnalysisResults
Time to run: 00:00:06.5411660
All PSMS within 1% FDR: 0
Search mode 0 Mods seen:
Search mode 0 Mods on proteins:
Starting compactPeptideToProteinPeptideMatching count: 0
Ending compactPeptideToProteinPeptideMatching count: 3004
Finished writing file: F:\promec\DiscovererDaemon\DiscovererDaemon\SpectrumFiles\BSA_20160824\2017-03-23-11-23-30\Task2SearchTask\results.txt
Finished task: String
Finished engine:
EverythingRunnerResults
Time to run: 00:02:02.9158713
Usage:
-t --tasks List of task poml files
-s --spectra List of spectra files
-e --databases List of database files
F:\promec\Animesh\MetaMorpheus\CMD\bin\Debug>
which raises the question of whether this is a good enough way to run searches, especially for a quick scan like the one Preview http://www.proteinmetrics.com/products/preview/ seems to provide? Or even compared with the mass-tolerant search approach of Chick et al http://www.nature.com/nbt/journal/v33/n7/abs/nbt.3267.html which in my experience is terribly slow and requires a lot of post-processing of comet-ms results...
animesh, it's a good way if it satisfies your needs! Please tell us if there are any enhancements/additions you would like to see.
Thanks Stefan for the encouraging words and since YAFI, here is an attempt to see if the idea works. Essentially i re-ran a raw file using this approach where i know that one of the sites in the peptide "KAPAGQEEPGTPPSSPLSAEQLDR" is phosphorylated, as detected via Mascot and Sequest in the msf file provided at https://goo.gl/aiJLRq . The approach mentioned above finds this peptide with a Sulfonation instead: "KAPAGQEEPGT[:Sulfonation of T]PPSSPLSAEQLDR". Could it be due to the very similar mass difference?
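To put a number on "very similar mass difference", here is a rough Python sketch comparing the two modifications. It assumes the usual compositions HPO3 for phosphorylation and SO3 for sulfonation, standard monoisotopic element and residue masses, and a 10 ppm precursor window (guessed from the "10ppmAroundZero" output file name); none of this is taken from the MetaMorpheus source.

```python
# Monoisotopic element masses in daltons.
ELEMENT = {"H": 1.00782503207, "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}

# Monoisotopic residue masses for the amino acids in the example peptide.
RESIDUE = {
    "K": 128.09496, "A": 71.03711, "P": 97.05276, "G": 57.02146,
    "Q": 128.05858, "E": 129.04259, "T": 101.04768, "S": 87.03203,
    "L": 113.08406, "D": 115.02694, "R": 156.10111,
}

WATER = 2 * ELEMENT["H"] + ELEMENT["O"]


def formula_mass(composition):
    """Mass of an elemental composition given as {element: count}."""
    return sum(ELEMENT[element] * count for element, count in composition.items())


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[residue] for residue in sequence) + WATER


PHOSPHORYLATION = {"H": 1, "P": 1, "O": 3}  # assumed composition HPO3
SULFONATION = {"S": 1, "O": 3}              # assumed composition SO3

delta = formula_mass(PHOSPHORYLATION) - formula_mass(SULFONATION)
phospho_peptide = peptide_mass("KAPAGQEEPGTPPSSPLSAEQLDR") + formula_mass(PHOSPHORYLATION)
delta_ppm = delta / phospho_peptide * 1e6

print(f"mass difference: {delta:.5f} Da")
print(f"relative to the phosphopeptide: {delta_ppm:.2f} ppm")
print("inside an assumed 10 ppm precursor window:", delta_ppm < 10)
print("inside a 0.5 Da fragment window:", delta < 0.5)
```

The two modifications differ by roughly 0.0095 Da, which for a peptide of this size is only a few ppm, so neither a 10 ppm precursor window nor a ±0.5 Da ion-trap fragment tolerance can tell them apart on mass alone.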
I am sharing the link with the raw file as well, in case you want to reproduce :)
PS: BTW what is the reason to write two result files, "10ppmAroundZero" and "uniquePeptides10ppmAroundZero", and which one do you recommend checking? Further, is there a way to extract enough data to run percolator downstream, or are the q-values provided by MetaMorpheus close enough?
Stay tuned. I'm concerned about the performance of MM on low-res data. MS2 for this was in the ion-trap, correct? The q-values in the results file I looked at are poor. I'll talk with stefan about this when he gets in.
Please also follow ISSUE#300 "Improve capabilities for low-res MS2 data, e.g. with probabilistic scoring"
thanks for the heads up, yes this is hi-lo elite instrument data
looking forward to the improvements :100:
Currently the /? switch is giving following error:
wondering what is the usual way to run the program via command line?