torognes / vsearch

Versatile open-source tool for microbiome analysis
Other
656 stars 122 forks source link

Alignment using vsearch, how to output the sequence of the best hit? #473

Closed cberthelier closed 2 years ago

cberthelier commented 2 years ago

Hello, I'm doing a global pairwise alignement using vsearch and I was wondering if it is possible to output the fasta sequence of the best hit ? I can get the id of the query and the target sequence (using --userfields "query+target") but I can't find any option to output the best hit sequence. Thank you for your help, Charlotte

frederic-mahe commented 2 years ago

hi @cberthelier

I suggest you try the --top_hits_only option to keep only the best hit for each entry, and the --alnout option to output alignments to a file (use --rowlen to control the alignment length). Alternatively, you can use the fields --userfields "qrow+trow" to get the aligned sequences.

frederic-mahe commented 2 years ago

as an example:

#+BEGIN_SRC sh
  vsearch \
      --usearch_global <(printf ">q\nAAATCG\n") \
      --db <(printf ">s1\nAAATGGA\n") \
      --quiet \
      --minseqlength 1 \
      --top_hits_only \
      --id 0.8 \
      --alnout - \
      --userfields "qrow+trow" \
      --userout -

The output of --alnout:

  Query >q
   %Id   TLen  Target
   83%      7  s1

   Query 6nt >q
  Target 7nt >s1

  Qry 1 + AAATCG 6
          |||| |
  Tgt 1 + AAATGG 6

  6 cols, 5 ids (83.3%), 0 gaps (0.0%)

The output of --userout:

  AAATCG    AAATGG
cberthelier commented 2 years ago

Hi @frederic-mahe , I tried the fields --userfields "qrow+trow" to get the aligned sequences and all worked out really well! Thanks for the help.