nathanweeks / exonerate

A fork of exonerate: a generic tool for sequence alignment
GNU General Public License v3.0
61 stars 25 forks source link

--bestn option no output #14

Closed norazrin closed 3 years ago

norazrin commented 6 years ago

Hi,

i am trying to run exonerate with --bestn 1 option, but there is no output produced outof it compared to if i did not put any of the --bestn option, why is it happening?

my command line is here /path/exonerate --model est2genome -q /path//Trans.fasta -t /path/genome.fa --bestn 1 --showtargetgff yes --score 1000 > exo_filt_532_a.out

please point to me where is the fault?

thanks

nathanweeks commented 6 years ago

Hi @norazrin , I'm sorry for the delay in getting to your question! Are you able to provide query and target sequence that I can use to reproduce the issue?

Also, what version of exonerate are you using (exonerate --version)?

norazrin commented 6 years ago

Hi Nathan,

i solved it by cutting the database into small chucks. However i did not understand about the output though.

what is (revcomp) that has scaffold ? where can i read more about how to process the output?

thanks azrin

Norazrin Ariffin, PhD Student School of Biological Science, Faculty of Biology, Medicine and Health, The University of Manchester, B1071 Michael Smith Building, Manchester M13 9PT


From: Nathan T. Weeks [notifications@github.com] Sent: Tuesday, March 13, 2018 3:33 PM To: nathanweeks/exonerate Cc: Norazrin Ariffin; Mention Subject: Re: [nathanweeks/exonerate] --bestn option no output (#14)

Hi @norazrinhttps://github.com/norazrin , I'm sorry for the delay in getting to your question! Are you able to provide query and target sequence that I can use to reproduce the issue?

Also, what version of exonerate are you using (exonerate --version)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/nathanweeks/exonerate/issues/14#issuecomment-372707881, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AQrztEIpsVoozC9HKuu2Z-IlACoC6HsHks5td-a-gaJpZM4SFsmW.

nathanweeks commented 6 years ago

Hi Norazrin,

If --bestn 1 doesn't produce output when the entire target genome.fa is used, but omitting --bestn 1 produces output, that would seem to indicate a bug in that version of exonerate. Submitting a --bestn 1 query against separate chunks of genome.fa shouldn't be necessary (and requires additional post-processing to get the equivalent of a global --bestn 1).

The (revcomp) indicates the query aligned to the reverse complement of the target (i.e., the other strand).

Regarding how to process the output: it depends on what you want to do with it. exonerate unfortunately doesn't support standard GFF3 output you can simulate it with the --ryo option and some post-processing; e.g.:

exonerate \
    --query query.fa \
    --querytype dna \
    --target target.fa \
    --verbose 0 --showalignment 0 \
    --refine region \
    --showvulgar 0 \
    --targettype dna \
    --model est2genome \
    --ryo '%ti\t.\tEST_match\t%tab\t%tae\t%s\t%tS\t.\tName=%qi;Gap=%C;identity=%pI;coverage=%pc;\t%qi\t%qab\t%qae\t%g\n\n' |
    awk -F '\t' -v OFS='\t' '
    $3 == "EST_match" {
        # For "-" strand alignments, exonerate reports the start > end
        if ($4 >= $5) {
            start = $5+1
            end   = $4 # see exonerate(1) for explanation of coordinates
        } else {
            start = $4+1 # see exonerate(1) for explanation of coordinates
            end   = $5
        }
        print $1,$2,$3,start,end,$6,$7,$8,$9 "Target=" $10 " " $11+1 " "\
              $12 ($13 == "." ? "" : " " $13)
    }' |
      sed -e '/EST_match/s/\( [MID]\) \([[:digit:]]\{1,\}\)/\1\2/g' \
          -e '/EST_match/s/Gap= /Gap=/'

Results in output like:

scaffold_revcomp    .   EST_match   1   4015    20075   -   .   Name=AF01595;Gap=M4015;identity=100.00;coverage=100.00;Target=AF01595 1 4015
scaffold_revcomp    .   EST_match   62  853 386 -   .   Name=AF01595;Gap=M3 D1 M11 I1 M3 I1 M1 I1 M9 I4 M3 I1 M2 I3 M10 D1 M36 D4 M8 D1 M7 D95 M30 D1 M14 D2 M14 I1 M8 I1 M11 I1 M6 I4 M5 I1 M19 D1 M13 I2 M10 D2 M18 I1 M9 D37 M18 D1 M3 I3 M39 I4 M19 D4 M11 I1 M12 I3 M10 I1 M7 D1 M16 D3 M21 I1 M1 I1 M32 D1 M8 I1 M13 D1 M4 D4 M5 D1 M3 D2 M13 D1 M23 D1 M22 D2 M1 D1 M16 I6 M2 I2 M21 I2 M8 D1 M4 I2 M11 I5 M23 I2 M7;identity=57.40;coverage=15.52;Target=AF01595 3303 3981 -
scaffold_revcomp    .   EST_match   345 437 179 -   .   Name=AF01595;Gap=M14 I1 M3 D3 M39 D4 M19 I4 M11;identity=68.37;coverage=2.14;Target=AF01595 3562 3652
scaffold_revcomp    .   EST_match   1850    1941    123 +   .   Name=AF01595;Gap=M5 D1 M27 D2 M8 I1 M11 D1 M4 I3 M8 I1 M5 D1 M19;identity=64.95;coverage=2.17;Target=AF01595 810 901
scaffold_revcomp    .   EST_match   3115    3206    123 +   .   Name=AF01595;Gap=M19 I1 M5 D1 M8 D3 M2 I1 M13 D1 M7 I2 M28 I1 M5;identity=64.95;coverage=2.17;Target=AF01595 2075 2166