smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Another crash - this time during FDR calculation #1744

Closed Dmorgen closed 4 years ago

Dmorgen commented 4 years ago

Please see the attached files. Thanks!

Attachments: Task1-CalibrateTaskconfig - Copy.toml.txt, Task2-GPTMDTaskconfig - Copy.toml.txt, Task3-SearchTaskconfig - Copy.toml.txt, results.txt, results.txt

trishorts commented 4 years ago

I'll make sure we are on top of this. Thanks for the report!

trishorts commented 4 years ago

I notice you were using single nucleotide substitutions. In my experience, these can be very error-prone.

Dmorgen commented 4 years ago

Yeah, I know; I manually validate such IDs. Thanks!

D.

From: trishorts notifications@github.com Sent: Wednesday, October 9, 2019 5:21 PM To: smith-chem-wisc/MetaMorpheus MetaMorpheus@noreply.github.com Cc: David Morgenstern david.morgenstern@weizmann.ac.il; Author author@noreply.github.com Subject: Re: [smith-chem-wisc/MetaMorpheus] Another crash - this time during FDR calculation (#1744)

trishorts commented 4 years ago

Related: if you have a general interest in detecting all manner of sequence variants, we have a new tool, nearly ready to be submitted for publication, that creates sample-specific proteogenomic databases from RNA-seq data. The database is written to be read by MM, of course, and can contain known PTMs, which sets it apart from other proteogenomic databases. If you have RNA-seq data for your sample, or you are using a cell line with RNA-seq data in GEO, I'd be happy to help you with a test drive. This database is yielding better results than our prior efforts in proteogenomics.

Dmorgen commented 4 years ago

That’s very interesting, thanks! We have a few such projects in the pipeline. Our biggest challenge right now is working with organisms that don’t have a database of their own, so we’re using a database from a closely related species or genus. This is me testing MM’s suitability for that kind of analysis. ☺

Cheers, D.

trishorts commented 4 years ago

One new feature, which you may not have noticed yet, is that we added some post-processing that computes PSM-level posterior error probabilities (PEPs). This computation considers multiple factors beyond the PSM score (e.g., number of consecutive fragment ions, delta score, peptide retention time). It should work well for the types of peptides you are examining. In this case, look at the far-right column of the output (PEPQvalue) and consider those PSMs with a PEPQvalue below 0.01. You'll find that some PSMs with a traditional q-value below 0.01 do not have a PEPQvalue below 0.01, and vice versa. I think the variants that make the PEP cut will be more worth your time when it comes to validation.

Dmorgen commented 4 years ago

Awesome, thanks! D.

rmillikin commented 4 years ago

Apologies for the late reply. I believe this crash is the same as #1606. The e-value calculation was removed in v0.0.302 in favor of a PEP (posterior error probability) calculation, so the crash should no longer occur with the new version. Can you rerun the search with the new version of MM and see whether it works?

As a side note, @zrolfs recommended to me that you try the "Semi-Specific Search" instead of "Modern" because it should be faster. He can chime in if you need help setting that up.

zrolfs commented 4 years ago

Hello! Here's a quick wiki if you decide to try out the Semi-Specific Search: https://github.com/smith-chem-wisc/MetaMorpheus/wiki/Speedy-Semi--and-Non-Specific-Enzyme-Searches

As always, I'm happy to help answer any questions!

zrolfs commented 4 years ago

One last thing: if you do use the Semi-Specific Search, set your protease to "trypsin" rather than "semi-trypsin" (counterintuitive, I know). I'm not sure what the behavior would be otherwise.

Dmorgen commented 4 years ago

Thanks to both of you! One question: do the parameters in the GPTMD search need to match those of the main search? Does it matter at all?

D.

zrolfs commented 4 years ago

The searches operate independently, so you don't need to match the same parameters. I would just reuse the GPTMD database for the search, rather than redoing the GPTMD task.

rmillikin commented 4 years ago

I'm going to close this because I suspect this crash has been fixed, but feel free to reopen this issue or open a new one if you encounter this problem again.