mptrsen / Orthograph

Orthology prediction using a graph-based, reciprocal approach with profile hidden Markov models
GNU General Public License v3.0

Inputting amino acid sequences #32

Closed: austinv11 closed this issue 4 years ago

austinv11 commented 5 years ago

Is it possible to input amino acid sequences instead of transcriptome data? Or should I attempt to reverse translate my amino acid data?

mptrsen commented 5 years ago

This is currently not supported, but you should be able to work around it by fooling Orthograph into thinking it already translated the input:

  1. Make a copy of your input file, adding _prot to the end of the filename, before the extension. That is, a file named input.fa should be copied to input_prot.fa.
  2. Run Orthograph normally.
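
A minimal Python sketch of the copy in step 1. It is not part of Orthograph; the helper names `prot_copy_path` and `make_prot_copy` are hypothetical, and it assumes "the extension" means only the last suffix of the filename:

```python
import shutil
from pathlib import Path


def prot_copy_path(input_path):
    """Return the input path with _prot inserted before the last extension.

    Hypothetical helper: input.fa -> input_prot.fa
    """
    p = Path(input_path)
    return p.with_name(f"{p.stem}_prot{p.suffix}")


def make_prot_copy(input_path):
    """Copy the input file to its _prot sibling and return the new path."""
    src = Path(input_path)
    dst = prot_copy_path(src)
    shutil.copyfile(src, dst)  # the original file is left untouched
    return dst
```

Calling `make_prot_copy("input.fa")` would create input_prot.fa next to the original. For a name with two suffixes such as sample.fasta.gz, this sketch puts _prot before .gz only, which may or may not be what Orthograph expects.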

Note that the output will likely contain irregularities (number of sequences...) and the nucleotide sequence output will of course be wrong. I am also not sure if this works completely, so please try it and let me know if it works.

austinv11 commented 5 years ago

It appears to fail at the analysis step. The analyzer log ends with this:

Fatal: Could not translate input file. Is this nucleotide data?

mptrsen commented 4 years ago

I added the option input-is-amino-acid to allow this. Please try out the new release 0.7.1 (and sorry for the late response, I hope this is still relevant).