Transcript domains are used instead of entire transcripts

mptrsen / Orthograph

Orthology prediction using a graph-based, reciprocal approach with profile hidden Markov models

GNU General Public License v3.0

Transcript domains are used instead of entire transcripts #2

Closed by mptrsen 11 years ago

mptrsen commented 12 years ago

Many transcripts are longer than the genes in the ortholog groups, so hmmsearch will be run with the --domtblout option, and its output will be parsed so that only the relevant regions are extracted from the transcripts.

mptrsen commented 12 years ago

Do not restrict the HMM search to domains yet, as this may limit flexibility during the reciprocal search and the final ortholog assignment. Instead, take the full transcript and do the cropping after the assignment using Exonerate, generating the corresponding nucleotide output along the way.

mptrsen commented 12 years ago

--domtblout option implemented; the start and end of each alignment are saved for later parsing.
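For readers unfamiliar with the format, here is a minimal sketch of how the alignment coordinates could be read from a --domtblout file. It is written in Python for illustration and is not Orthograph's own code; the `DomainHit` type and the `parse_domtblout` function are hypothetical names. It assumes the standard HMMER3 per-domain table layout, in which the 22 whitespace-separated columns are followed by a free-text description, and "ali from"/"ali to" are the 18th and 19th columns.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    """One per-domain row of a domtblout table (hypothetical helper type)."""
    target: str      # target sequence name (here: the transcript)
    query: str       # query name (here: the profile HMM)
    i_evalue: float  # independent E-value of this domain
    ali_from: int    # 1-based alignment start on the target
    ali_to: int      # 1-based, inclusive alignment end on the target


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The description is the last column and may contain spaces,
        # so split off only the first 22 columns.
        fields = line.split(maxsplit=22)
        if len(fields) < 22:
            continue  # malformed or truncated row
        yield DomainHit(
            target=fields[0],
            query=fields[3],
            i_evalue=float(fields[12]),
            ali_from=int(fields[17]),
            ali_to=int(fields[18]),
        )
```

Typical use would be `hits = list(parse_domtblout(open(path)))`, where `path` points to the file written by hmmsearch. The coordinates refer to whatever sequence hmmsearch was given, so if the transcripts were translated first, they are amino acid positions on the translation.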

mptrsen commented 11 years ago

Done. Start and end points are parsed from the domtblout output, and only those regions are used for the reciprocal BLAST.
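The cropping step might look roughly like the following Python sketch, which again is an illustration rather than the actual implementation. `crop_region` and `crop_hits` are hypothetical names, the `name:start-end` key format is an arbitrary choice, and the coordinates are assumed to be 1-based and inclusive, as reported in domtblout.

```python
from typing import Dict, Iterable, Tuple


def crop_region(sequence: str, ali_from: int, ali_to: int) -> str:
    """Return the 1-based, inclusive region [ali_from, ali_to] of sequence."""
    if not 1 <= ali_from <= ali_to <= len(sequence):
        raise ValueError(
            f"region {ali_from}-{ali_to} outside sequence of length {len(sequence)}"
        )
    # Convert 1-based inclusive coordinates to a 0-based half-open slice.
    return sequence[ali_from - 1:ali_to]


def crop_hits(
    sequences: Dict[str, str],
    hits: Iterable[Tuple[str, int, int]],
) -> Dict[str, str]:
    """Map 'name:start-end' keys to the cropped region of each hit.

    hits holds (target name, ali_from, ali_to) tuples.
    """
    cropped = {}
    for target, ali_from, ali_to in hits:
        region = crop_region(sequences[target], ali_from, ali_to)
        cropped[f"{target}:{ali_from}-{ali_to}"] = region
    return cropped
```

The cropped sequences, rather than the full transcripts, would then be written out as queries for the reciprocal search. Keeping the coordinates in the key makes it possible to map a reciprocal hit back to its position on the original transcript.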