sheynkman-lab / Long-Read-Proteogenomics

A workflow for enhanced protein isoform detection through integration of long-read RNA-seq and mass spectrometry-based proteomics.
MIT License

six-frame translation module path through pipeline #11

Closed trishorts closed 3 years ago

trishorts commented 3 years ago

Seems like we could do a six-frame translation to FASTA and send it through MetaMorpheus for peptide identification. Currently it's connected to peptide analysis in the graphic. But if we don't search that db for peptides in MM, then we won't ever see them.
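For readers unfamiliar with the step being discussed, here is a minimal sketch of a six-frame translation written out as FASTA. It is not the repository's module: the function names (`six_frame_translate`, `to_fasta`) and the header format are assumptions made for illustration, and stop codons are simply kept as `*` rather than used to split the translation into separate entries.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def to_fasta(name, seq):
    """Format all six frames as FASTA text (hypothetical header layout)."""
    lines = []
    for frame, protein in six_frame_translate(seq).items():
        lines.append(f">{name}|frame{frame}")
        lines.append(protein)
    return "\n".join(lines) + "\n"


print(to_fasta("tx1", "ATGGCCTAA"))
```

A production database would more likely split each frame at stop codons and drop very short fragments before handing the FASTA to a search engine.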

rmmiller22 commented 3 years ago

The 6-frame translation does result in a FASTA. We could perform a search using it, but the real purpose of that database is to look up peptide sequences that were identified only in the GENCODE or UniProt search. If these sequences are present in the 6-frame translation database, then we know that the long reads captured this transcript space, and that either the transcript did not meet our filtering criteria for the refined database due to insufficient coverage (which could change if more sequencing data were acquired) or the ORF calling was not perfect. Either way, this is very informative and can help shape our scripts and analysis.
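The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the peptide list and FASTA text are made-up inputs, `read_fasta` and `find_peptides` are hypothetical helper names, and matching is a plain substring test (it does not treat isoleucine and leucine as equivalent, which a mass spectrometry-aware comparison might want to do).

```python
def read_fasta(text):
    """Parse FASTA text into a dict of header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def find_peptides(peptides, six_frame_db):
    """Map each peptide to the six-frame entries that contain it.

    An empty list means the long reads did not cover that sequence;
    a non-empty list means the transcript was captured but did not
    make it into the refined database.
    """
    return {
        pep: [name for name, seq in six_frame_db.items() if pep in seq]
        for pep in peptides
    }


# Illustrative inputs only, not real data.
six_frame_text = ">tx1|frame+1\nMAGLK*\n>tx1|frame+2\nWPR\n"
reference_only_peptides = ["AGLK", "QQQ"]

hits = find_peptides(reference_only_peptides, read_fasta(six_frame_text))
for peptide, entries in hits.items():
    status = "captured by long reads" if entries else "not in six-frame db"
    print(peptide, status, entries)
```

Peptides that land in the "captured" group are the ones that point at coverage filtering or ORF calling as the reason they were missed.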

rmmiller22 commented 3 years ago

It is a little bit difficult to explain, and I would be happy to video chat about this topic if needed.

trishorts commented 3 years ago

There could be unknown translation frames that result in real peptides not already in GENCODE/UniProt. But I do sorta understand what you mean.

rmmiller22 commented 3 years ago

Yeah, that is definitely possible. Performing the search would not hurt anything and could provide additional information. I would be okay with modifying the plan to include a search of the 6-frame database. It is quite a large database, so its FDR would not necessarily be comparable to that of the other searches.

adeslatt commented 3 years ago

A sample-specific database is not going to be that large. @bj8th, are you working on this? I have both TransDecoder and CPAT, and we just cooked up an idea to run both through Nextflow and choose a consensus.
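The thread does not define what "consensus" would mean here, so the following is only one possible reading, sketched under loud assumptions: each tool's output has already been parsed into a dict of transcript ID to a best-ORF `(start, end)` pair, and two calls agree only when those coordinates match exactly. The function name `consensus_orfs` and the input layout are hypothetical and do not reflect either tool's actual output format.

```python
def consensus_orfs(transdecoder_orfs, cpat_orfs):
    """Split transcripts called by both tools into agreed and disputed ORFs.

    Both arguments are assumed to be dicts of transcript_id -> (start, end).
    Transcripts called by only one tool are ignored in this sketch.
    """
    agreed = {}
    disputed = {}
    for tx in sorted(transdecoder_orfs.keys() & cpat_orfs.keys()):
        if transdecoder_orfs[tx] == cpat_orfs[tx]:
            agreed[tx] = transdecoder_orfs[tx]
        else:
            disputed[tx] = (transdecoder_orfs[tx], cpat_orfs[tx])
    return agreed, disputed


# Illustrative inputs only, not real tool output.
td = {"tx1": (10, 310), "tx2": (5, 95), "tx3": (0, 60)}
cp = {"tx1": (10, 310), "tx2": (20, 95)}

agreed, disputed = consensus_orfs(td, cp)
print("agreed:", agreed)
print("disputed:", disputed)
```

A looser rule, such as requiring only the same reading frame or a minimum overlap, would be an equally reasonable choice and would change which transcripts count as agreed.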

gsheynkman commented 3 years ago

@adeslatt I'm closing this issue, but I've noted this idea of choosing a consensus between TransDecoder and CPAT. I would love to do that if we can pull it off! I'm going to keep this idea in another issue that lists open ideas.