jeffdaily / parasail

Pairwise Sequence Alignment Library

`./parasail_aligner -f myseqs.fasta -a sw_stats_striped_16 -O EMBOSS` does not work #109

Open Huilin-Li opened 1 month ago

Huilin-Li commented 1 month ago

I want to use `-a sw_scan_16 -e 1 -o 11`. However, `-O EMBOSS` also does not work with `-a sw_stats_striped_16` (the default setting).

[lihuilin@login01 bin]$ ls
myseqs.fasta  mytrace  parasail1.csv  parasail_aligner  parasail.csv  parasail_stats  tb
[lihuilin@login01 bin]$ ./parasail_aligner -h

usage: parasail_aligner [-a funcname] [-c cutoff] [-x] [-e gap_extend] [-o gap_open] [-m matrix] [-t threads] [-d] [-M match] [-X mismatch] [-k band size (for nw_banded)] [-l AOL] [-s SIM] [-i OS] [-v] [-V] -f file [-q query_file] [-g output_file] [-O output_format {EMBOSS,SAM,SAMH,SSW}] [-b batch_size] [-r memory_budget] [-C] [-A alphabet_aliases]

Defaults:
        funcname: sw_stats_striped_16
          cutoff: 7, must be >= 1, exact match length cutoff
              -x: if present, don't use suffix array filter
      gap_extend: 1, must be >= 0
        gap_open: 10, must be >= 0
          matrix: blosum62
              -d: if present, assume DNA alphabet ACGT
           match: 1, must be >= 0
        mismatch: 0, must be >= 0
      threads: system-specific default, must be >= 1
             AOL: 80, must be 0 <= AOL <= 100, percent alignment length
             SIM: 40, must be 0 <= SIM <= 100, percent exact matches
              OS: 30, must be 0 <= OS <= 100, percent optimal score
                                              over self score
              -v: verbose output, report input parameters and timing
              -V: verbose memory output, report memory use
            file: no default, must be in FASTA format
      query_file: no default, must be in FASTA format
     output_file: parasail.csv
   output_format: no default, must be one of {EMBOSS,SAM,SAMH,SSW}
      batch_size: 0 (calculate based on memory budget),
                  how many alignments before writing output
   memory_budget: 2GB or half available from system query (100.970 GB)
              -C: if present, use case sensitive alignments, matrices, etc.
alphabet_aliases: traceback will treat these pairs of characters as matches,
                  for example, 'TU' for one pair, or multiple pairs as 'XYab'
[lihuilin@login01 bin]$ ./parasail_aligner -f myseqs.fasta -a sw_stats_striped_16 -O EMBOSS
The selected output format 'EMBOSS' requires an alignment function that returns a traceback.
[lihuilin@login01 bin]$

`./parasail_aligner -f myseqs.fasta` works well, and the output file contains:

0,1,348,332,678,340,325,142,209,326

jeffdaily commented 1 month ago

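parasail was designed to be performant, and as a side effect of that we have (too) many alignment routines. For example, if you only need a score, it is wasteful to calculate a traceback, so score-only routines such as sw_scan_16 and sw_stats_striped_16 do not return one. The `-O EMBOSS` output format needs a traceback, which is why it rejects those routines. The alignment routine you need is sw_trace_scan_16; note the "trace" in the name.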
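
To make the naming rule above concrete, here is a minimal Python sketch that checks a routine name for "trace" before assembling a `parasail_aligner` command line. The helpers `returns_traceback` and `build_aligner_args` are hypothetical and not part of parasail. The sketch only uses flags shown in the `-h` output above, and it assumes that every `-O` format needs a traceback routine (the error message in this thread only confirms that for EMBOSS).

```python
import shlex


def returns_traceback(funcname: str) -> bool:
    """Heuristic from this thread: traceback routines have "trace" in the name."""
    return "trace" in funcname.split("_")


def build_aligner_args(fasta, funcname, gap_open=10, gap_extend=1,
                       output_format=None):
    """Build an argument list for parasail_aligner (hypothetical helper).

    Assumption: any -O output format requires a traceback routine.
    """
    if output_format is not None and not returns_traceback(funcname):
        raise ValueError(
            f"output format {output_format!r} needs a traceback routine, "
            f"but {funcname!r} has no 'trace' in its name"
        )
    args = ["./parasail_aligner", "-f", fasta, "-a", funcname,
            "-o", str(gap_open), "-e", str(gap_extend)]
    if output_format is not None:
        args += ["-O", output_format]
    return args


if __name__ == "__main__":
    # The settings requested in the original post, with the trace routine.
    args = build_aligner_args("myseqs.fasta", "sw_trace_scan_16",
                              gap_open=11, gap_extend=1,
                              output_format="EMBOSS")
    print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` from the directory that contains `parasail_aligner`. Passing `sw_scan_16` or `sw_stats_striped_16` together with `output_format="EMBOSS"` raises a `ValueError` before anything is executed.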