smith-chem-wisc / Spritz

Software for RNA-Seq analysis to create sample-specific proteoform databases from RNA-Seq data
https://smith-chem-wisc.github.io/Spritz/
MIT License

Use ORF finding for coding sequence discovery #148

Closed acesnik closed 4 years ago

acesnik commented 5 years ago

This paper did a nice job of showing how StringTie, combined with ORF finding, revealed a lot of good hits to unannotated protein isoforms: https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00295

Combining these issues:

There are a few avenues for this one.

  1. You could use TransDecoder (https://github.com/TransDecoder/TransDecoder), specifically its TransDecoder.LongOrfs tool.
  2. You could use the Proteogenomics library I built. The translation code is broken, though, so that'd take some testing and polishing. https://github.com/smith-chem-wisc/Proteogenomics
  3. Figure out which genomic regions those ORFs came from, and annotate them as CDS regions, kind of like here: https://github.com/smith-chem-wisc/Proteogenomics/blob/1e17d73e571c1f120866be9079977822f3c4855c/Proteogenomics/GeneModel.cs#L365
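
To make the idea concrete, here is a minimal Python sketch of the two steps discussed above: finding ATG-to-stop ORFs in an assembled transcript, and mapping an ORF back to genomic coordinates so it could be annotated as CDS. It is not TransDecoder's algorithm and not the code in Proteogenomics or Spritz. The function names (`find_orfs`, `transcript_to_genomic`), the `min_aa` cutoff, and the forward-strand-only, half-open coordinate convention are all assumptions made for illustration.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def find_orfs(transcript, min_aa=20):
    """Return (start, end, protein) for ATG-to-stop ORFs in the three forward frames.

    Coordinates are 0-based and half-open on the transcript; `end` includes
    the stop codon. `min_aa` is a hypothetical length cutoff, not a value
    taken from any tool.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:pos])
                if len(protein) >= min_aa:
                    orfs.append((start, pos + 3, protein))
                start = None
    return orfs


def transcript_to_genomic(exons, t_start, t_end):
    """Map a half-open transcript interval onto genomic CDS segments.

    Assumes a forward-strand transcript whose exons are given in order as
    half-open genomic (start, end) intervals.
    """
    segments = []
    offset = 0  # transcript coordinate of the current exon's first base
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start
        lo = max(t_start, offset)
        hi = min(t_end, offset + ex_len)
        if lo < hi:
            segments.append((ex_start + lo - offset, ex_start + hi - offset))
        offset += ex_len
    return segments
```

A real implementation would also need to handle reverse-strand transcripts and ORFs that run off the end of a transcript without a stop codon, both of which this sketch ignores.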

acesnik commented 4 years ago

https://github.com/smith-chem-wisc/Spritz/pull/187