trishorts closed this issue 3 years ago
The 6-frame translation does result in a FASTA. We could perform a search using it, but the real purpose of that database is to be able to look up peptide sequences that were only identified in the GENCODE or UniProt search. If these sequences are present in the 6-frame translation database, then we know that the long reads captured this transcript space, and that either the transcript did not meet our filtering criteria for the refined database due to insufficient coverage (which could change if more sequencing data were acquired) or the ORF calling was not perfect. Either way, this is very informative and can help shape our scripts and analysis.
It is a little bit difficult to explain, and I would be happy to video chat about this topic if needed.
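To make the lookup described above concrete, here is a minimal sketch in plain Python, not the pipeline's actual code. It translates one transcript in all six frames with the standard genetic code and reports which frames contain a given peptide. The function names (`six_frame`, `find_peptide`) and the toy sequence are hypothetical; a real run would read the long-read transcripts from a FASTA file and loop over the peptides found only in the GENCODE or UniProt search.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Translate a DNA sequence in all six frames, keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def find_peptide(peptide, seq):
    """List the frames whose translation contains the peptide.

    A peptide has no '*', so a substring match cannot span a stop codon.
    """
    return [name for name, protein in six_frame(seq).items() if peptide in protein]


if __name__ == "__main__":
    transcript = "ATGGCCAAGTAA"  # toy sequence, for illustration only
    print(six_frame(transcript))
    print(find_peptide("MAK", transcript))
```

A peptide that turns up in some frame here but is missing from the refined database would point to the filtering or ORF-calling explanations above, rather than to the transcript never having been sequenced.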
There could be unknown translation frames that result in real peptides not already in GENCODE/UniProt. But I do sorta understand what you mean.
Yeah, that is definitely possible. Performing the search would not hurt anything and could provide additional information. I would be okay with modifying the plan to include a search of the 6-frame database. It is quite a large database, so its FDR would not necessarily be comparable to the FDR from the other database searches.
A sample-specific database is not going to be that large. @bj8th, are you working on this? I have both TransDecoder and CPAT, and we just cooked up an idea to run both through Nextflow and choose a consensus.
@adeslatt I'm closing this issue, but I've noted this idea of choosing a consensus between TransDecoder and CPAT. I would love to do that if we can pull it off! I'm going to keep this idea in another issue that lists open ideas.
Seems like we could do a six-frame translation to FASTA and send it through MetaMorpheus for peptide identification. Currently it's connected to peptide analysis in the graphic, but if we don't search that db for peptides in MM, then we won't ever see them.