CuriousTim opened this issue 3 years ago
Hi,
Thanks for your interest, and thanks for the reproducible bug report.
This issue appears to be caused by mixed-case input: OrfM expects the sequence to be either all upper-case or all lower-case. As a workaround, you can convert the input to upper-case before piping it in:
tr acgtn ACGTN < input.fna | ./orfm
I'll try to implement a proper fix soon. Thanks, ben
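Before applying the workaround, it can help to confirm that a file really is mixed-case. Genome assemblies such as GRCh38 are commonly distributed soft-masked (repeats in lower case), which would produce exactly this kind of input. The helper below is a minimal sketch; `count_lowercase_lines` is a hypothetical name, not part of OrfM. It counts sequence lines (anything not starting with `>`) that contain a lower-case base.

```shell
# Hypothetical helper (not part of OrfM): count sequence lines, i.e.
# lines not starting with '>', that contain at least one lower-case base.
# Reads the files given as arguments, or stdin if none are given.
count_lowercase_lines() {
  awk '!/^>/ && /[acgtn]/ { n++ } END { print n + 0 }' "$@"
}
```

For example, `count_lowercase_lines input.fna` printing anything other than 0 suggests the input needs upper-casing before it is passed to `./orfm`.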
Thanks for the workaround; that solved the issue. I would recommend something like this instead:
awk -F'>' 'NF > 1; NF == 1 {print toupper($0)}' input.fna | ./orfm
so that the definition lines are not modified (the tr approach upper-cases the headers as well).
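A variant of the same idea, sketched below, matches header lines by their leading `>` instead of relying on field splitting. One difference worth noting: with `-F'>'`, a blank line has `NF == 0`, matches neither pattern, and is silently dropped, whereas this version passes blank lines through unchanged. `upcase_fasta` is a hypothetical helper name.

```shell
# Hypothetical helper: upper-case sequence lines only, leaving FASTA
# definition lines (those starting with '>') exactly as they are.
# Blank lines are printed unchanged rather than dropped.
upcase_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }' "$@"
}

# Usage: upcase_fasta input.fna | ./orfm
```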
My understanding is that OrfM outputs continuous stretches of codons with no stop codon in the middle, but I am getting output with embedded stop codons, or what I believe are stop codons. I ran ORF prediction on the human genome (GCF_000001405.39_GRCh38.p13) and get many sequences like the ones below, with an asterisk in the middle. Doesn't the asterisk denote a stop codon?
Thanks
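To check whether the embedded asterisks disappear once the input is upper-cased, one option is to count output records that have a `*` before their final residue. This is a minimal sketch, assuming the output is protein FASTA and that `*` marks a translated stop codon; `count_internal_stops` is a hypothetical name. A `*` in the last position is deliberately ignored, and sequences wrapped over several lines are joined before checking.

```shell
# Hypothetical helper: count protein FASTA records whose sequence has a
# '*' anywhere except the final position. Multi-line records are joined
# before the check. Reads the files given as arguments, or stdin.
count_internal_stops() {
  awk '
    function check() {
      # Look for "*" in everything except the last character.
      if (seq != "" && index(substr(seq, 1, length(seq) - 1), "*") > 0) n++
      seq = ""
    }
    /^>/ { check(); next }
    { seq = seq $0 }
    END { check(); print n + 0 }
  ' "$@"
}
```

Running it on the output from the original input and again on the output from the upper-cased input gives a quick before/after comparison.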