ogotoh / spaln

Genome mapping and spliced alignment of cDNA or amino acid sequences
GNU General Public License v2.0

Any way to avoid or fix stop codons in spaln output? #44

Closed xiekunwhy closed 2 years ago

xiekunwhy commented 2 years ago

Hi,

I found many stop codons in many genes in the spaln output. Is this normal? If it is not, how can I avoid or fix it?

Command line: spaln -t6 -M4 -Q7 -O0 -LS -ya2

The peptide sequences are attached: spaln.protein.pep.fa.gz

Best, Kun

ogotoh commented 2 years ago

Dear Kun,

Your command line does not show which genomic sequence you used (specified by the -d option) or what the queries are (specified by the argument). Without this information, I cannot help you much. General suggestions are as follows.

1) Confirm that your genomic sequence is formatted appropriately. In particular, set the -XG option if your sequence contains only a partial genome.
2) Preferably, set the proper -T option so that optimal species-specific parameter values are used.
3) Some species use a non-standard genetic code. In such a case, set the proper -C option.
4) If you want to avoid excessive termination codons, set the -yoN option (e.g. N = 100), where N (default = 30) specifies the penalty for a premature termination codon. However, you must be careful, because a higher penalty is usually accompanied by excessive gaps, including frame shifts.

By the way, how did you obtain spaln.protein.pep.fa? Does each dot correspond to a termination codon? If you have not done so, try the -O7 option. The output is not genuine FASTA format, but you may use grep -v '^;' to remove the additional lines.
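For readers who prefer to do the clean-up step in a script, here is a minimal Python sketch of the same filter as grep -v '^;'. The function name and the three example lines are hypothetical and only illustrate the idea; real -O7 output may look different.

```python
from typing import Iterable, List


def drop_semicolon_lines(lines: Iterable[str]) -> List[str]:
    """Keep every line that does not start with ';' (same filter as grep -v '^;')."""
    return [line for line in lines if not line.startswith(";")]


# Illustrative input only; this is NOT actual spaln output.
example = [
    ">query1 hypothetical header",
    ";hypothetical annotation line",
    "MCSQVSLLQ",
]

cleaned = drop_semicolon_lines(example)
print("\n".join(cleaned))
```

To apply it to a file, the lines could be read with open(path) and the result written back out, one line per entry.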

Osamu,

xiekunwhy commented 2 years ago

Hi Osamu,

Here are all the commands I used. Op-f.gf is the assembly, and Opf.homolog.tab.best.faa is a subset of protein sequences from OrthoDB v10. The species is Oplegnathus punctatus.

makeidx.pl -inp Op-f.gf
spaln -t6 -M4 -Q7 -O0 -LS -ya2 -o Op-f.protein.gff3 -d Op-f Opf.homolog.tab.best.faa
gffread Op-f.protein.gff3 -g Op-f.gf -x spaln.protein.cds.fa -y spaln.protein.pep.fa

I will try -yoN and -O7

Yes, each dot in the protein file corresponds to a termination codon. For example, see the 10th codon of mRNA13344:

>mRNA13344 gene=scaffold_1_667
ATGTGCAGCCAGGTGAGCCTGCTGCAGTGACGCTGTCTGTGTTGACTCCACAGATGCAGACGGTCACTCT
GATTCCCGGGGACGGGATTGGACCAGAGATCTCCACTGCTGTCATGAAGATCTTTGAGGCTGCAAAGGTG
AGTGTGATCCGTTTGTTTCTTCATCTTTGTGAGTATCTGTTTGAAAGTGTAGATTTCACCTGCAGGCTCC
GATCAGCTGGGAGGAGAGGAATGTGACGGCCATAAAGGGACCCGGTGGCCGGTGGATGATCCCCCCTGAT
GCTAAAGAGTCCATGGACAAGAGCAAGATCGGACTGAAAGGACCCCTGAAGACCCCCATCGCCGCAGGTC
ACCCCTCCATGAACCTGCTGCTGAGGAAGACCTTTGACCTTTACGCCAACGTGCGACCCTGCGTCTCTAT
CGAGGGCTACAAGACTCCGTACACCGACGTCAACCTGGTCACCATCCGCGAGAACACGGAGGGCGAGTAC
AGCGGCATCGAACACGTGAGTCATTAGAGCCTCGTCCTGCTGCTGGAGCACAAACACCTGGAACGAGTCA
CGTTATCGACCATCAGAAAGTCCAGCAGCTGTTTGTTAGTCCTGTCAGCTAGCGGCTGCAGACAGGACGC
TCTGCTCCTGCTCGTCTTCAGGATCGTCGACGGCGTCGTTCAGAGCATCAAACTGATCACTGAGGACGCC
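A quick way to locate such codons is to walk the CDS in frame and report every stop codon before the last one. The sketch below is only an illustration: it assumes the standard genetic code (TAA, TAG, TGA) and that the sequence starts in frame 0, and the function name is made up for this example. The test string is the first 36 nt of the mRNA13344 sequence above.

```python
from typing import List

# Assumption: standard genetic code. Adjust this set for species with a different code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 1-based codon positions of in-frame stop codons, ignoring the final codon."""
    seq = "".join(cds.split()).upper()  # remove line breaks and spaces
    n_codons = len(seq) // 3            # trailing 1-2 nt, if any, are ignored
    positions = []
    for i in range(n_codons):
        codon = seq[3 * i : 3 * i + 3]
        is_last = i == n_codons - 1     # a stop at the very end is expected
        if codon in STOP_CODONS and not is_last:
            positions.append(i + 1)
    return positions


# First 36 nt of mRNA13344 as posted in this thread.
prefix = "ATGTGCAGCCAGGTGAGCCTGCTGCAGTGACGCTGT"
print(internal_stop_positions(prefix))  # [10]
```

The reported position 10 is the TGA codon mentioned above. Running the same function over every record in spaln.protein.cds.fa would show which gene models are affected.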

Best, Kun