mcfrith / last-genome-alignments

47 stars 5 forks source link

Result File Description #16

Closed luhan125 closed 1 year ago

luhan125 commented 1 year ago

Hi, I have a mutated gene sequence (length = 3984138bp). I want to align this sequence with wild type gene sequence (length=4021920bp), and look for SNP and Indel. The commands I used were: last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 lastindex_baumanii Ab_01_tig00000001.fasta > Ab_01_tig00000001.mat lastal -m50 -E0.05 -C2 -p Ab_01_tig00000001.mat Ab_01_tig00000001.fasta | last-split -m1 > Ab_01_tig00000001-1.maf maf-swap Ab_01_tig00000001-1.maf | awk '/^s/ {$2 = (++s % 2 ? "Ab_01_tig00000001." : "baumanii.") $2} 1' | last-split -m1 | > Ab_01_tig00000001-2.maf last-postmask Ab_01_tig00000001-2.maf | maf-convert -n tab | awk -F'=' '$2 <= 1e-5' > Ab_01_tig00000001.tab

My questions are:

  1. Where can I find the explanation of these results files including Ab_01_tig00000001-1.maf, Ab_01_tig00000001-2.maf and Ab_01_tig00000001.tab?
  2. Did the commands I used can meet my destination?

Thank you very much for your reply!

best,

Lu

mcfrith commented 1 year ago

You can find an explanation of maf here: http://genome.ucsc.edu/FAQ/FAQformat.html#format5 Or in "understanding the output" here: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst

TAB is explained here: https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst

Your lastal command needs a lastdb name, so I guess you used that actually. Apart from that, I think your commands are ok. (They are suited to older versions of LAST - the current recommendations are a bit different - but I think it's ok.)

As for finding SNPs and indels, you can see them by eye in the maf. I'm not sure of the best way to get them automatically, you could try maf-convert, or maybe: https://usegalaxy.org/u/dan/p/maf https://software.broadinstitute.org/software/igv/MultipleAlignmentFormat https://github.com/dentearl/mafTools https://jydu.github.io/maffilter/