shenwei356 / unikmer

A versatile toolkit for k-mers with taxonomic information
https://bioinf.shenwei.me/unikmer
MIT License

Please document "locate" #16

Closed ms-gx closed 4 years ago

ms-gx commented 4 years ago

Please document usage of locate command. Thank you so much for this great tool!

shenwei356 commented 4 years ago

Full documentation is not available yet, but you can download the latest binary and check the help message.

unikmer v0.12.0

Usage (unikmer locate)

$ unikmer locate -h

Locate k-mers in genome

Attention:
  1. All files should have the 'canonical' flag.
  2. Output is BED6 format.
  3. When using experimental flag --circular, leading subsequence of k-1 bp
     is appending to end of sequence. End position of k-mers that crossing
     sequence end would be greater than sequence length.

Usage:
  unikmer locate [flags]

Flags:
      --circular            circular genome (experimental)
  -g, --genome strings      genomes in (gzipped) fasta file(s)
  -h, --help                help for locate
  -o, --out-prefix string   out file prefix ("-" for stdout) (default "-")
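
To make the Attention notes concrete, here is a rough Python sketch of what locating k-mers and emitting BED6 rows could look like. It is not unikmer's implementation: the function names, the lexicographic definition of "canonical", and the fixed score `0` and strand `.` are assumptions for illustration, and rows come out in position order, which may differ from the tool's order.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical(kmer):
    """Assumed definition: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def locate(name, seq, kmers, k, circular=False):
    """Return BED6-like rows (chrom, start, end, name, score, strand).

    Coordinates are 0-based and half-open. With circular=True, the leading
    k-1 bases are appended to the end, so k-mers that cross the sequence end
    get an end position greater than the sequence length.
    """
    if circular:
        seq = seq + seq[:k - 1]
    rows = []
    for start in range(len(seq) - k + 1):
        sub = seq[start:start + k]
        # Match on the canonical form, but report the k-mer as it appears.
        if canonical(sub) in kmers:
            rows.append((name, start, start + k, sub, 0, "."))
    return rows


kmer_set = {"AA", "AC", "AG", "AT", "CA", "GC"}
for row in locate("s", "ACTGCAATGC", kmer_set, 2):
    print("\t".join(map(str, row)))
```

With `circular=True` the same call yields one extra row, `s 9 11 CA`, whose end position (11) exceeds the sequence length (10), as note 3 describes.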

Usage (unikmer uniqs)

$ unikmer uniqs -h

Mapping k-mers back to genome and find unique subsequences

Attention:
  1. All files should have the 'canonical' flag.
  2. Default output is in BED3 format, with left-closed and right-open
         0-based interval.
  3. When using experimental flag --circular, leading subsequence of k-1 bp
     is appending to end of sequence. 
       1) End position of k-mers that crossing sequence end would be
          greater than sequence length.
       2) Longer subsequences are not further extended.

Usage:
  unikmer uniqs [flags]

Flags:
  -M, --allow-muliple-mapped-kmer         allow multiple mapped k-mers
      --circular                          circular genome (experimental)
  -g, --genome strings                    genomes in (gzipped) fasta file(s)
  -h, --help                              help for uniqs
  -x, --max-cont-non-uniq-kmers int       max continuous non-unique k-mers
  -X, --max-num-cont-non-uniq-kmers int   max number of continuous non-unique k-mers
  -m, --min-len int                       minimum length of subsequence (default 200)
  -o, --out-prefix string                 out file prefix ("-" for stdout) (default "-")
  -a, --output-fasta                      output fasta format instead of BED3
  -W, --seqs-in-a-file-as-one-genome      treat seqs in a genome file as one genome

Examples:

$ echo -ne ">s\nACTGCAATGC\n" | tee t.fa
>s
ACTGCAATGC

$ unikmer count -k 2 -K -s t.fa -o t.fa.k2

$ unikmer view t.fa.k2.unik 
AA
AC
AG
AT
CA
GC
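
The list above contains AG and CA even though the sequence itself contains CT and TG at those positions. That is consistent with storing canonical k-mers. A minimal Python sketch, assuming "canonical" means the lexicographically smaller of a k-mer and its reverse complement (an assumption that matches this output, not a statement about unikmer's internals):

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmers(seq, k):
    """Sorted set of canonical k-mers found in seq."""
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        found.add(min(kmer, revcomp(kmer)))
    return sorted(found)


print(canonical_kmers("ACTGCAATGC", 2))
```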

$ unikmer locate --genome t.fa t.fa.k2.unik 
s       5       7       AA      0       .
s       0       2       AC      0       .
s       1       3       CT      0       .
s       6       8       AT      0       .
s       2       4       TG      0       .
s       4       6       CA      0       .
s       7       9       TG      0       .
s       3       5       GC      0       .
s       8       10      GC      0       .

$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2
s       0       3
s       5       8

$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2 --output-fasta 
>s:1-3
ACT
>s:6-8
AAT

$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2 --output-fasta --allow-muliple-mapped-kmer
>s:1-10
ACTGCAATGC
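
The `uniqs` results above can be reproduced with a simplified Python model: mark each position whose canonical k-mer occurs exactly once in the genome, then merge runs of consecutive marked positions into intervals. This is only a sketch under those assumptions. It ignores the -x/-X options and judges uniqueness within the single sequence rather than against a .unik file, and the function name and parameters are made up for illustration.

```python
from collections import Counter


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def uniq_regions(seq, k, min_len=200, allow_multiple=False):
    """Return 0-based half-open intervals covered by runs of accepted k-mers."""
    kmers = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.append(min(kmer, revcomp(kmer)))  # assumed canonical form
    counts = Counter(kmers)

    regions = []
    run_start = None
    # The trailing None is a sentinel that closes a run reaching the end.
    for i, kmer in enumerate(kmers + [None]):
        ok = kmer is not None and (allow_multiple or counts[kmer] == 1)
        if ok:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            end = (i - 1) + k  # end of the last k-mer in the run
            if end - run_start >= min_len:
                regions.append((run_start, end))
            run_start = None
    return regions


print(uniq_regions("ACTGCAATGC", 2, min_len=2))
print(uniq_regions("ACTGCAATGC", 2, min_len=2, allow_multiple=True))
```

In this model CA and GC occur more than once, so they break the sequence into the two intervals shown in the BED3 output; allowing multiply mapped k-mers merges everything into one interval.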

shenwei356 commented 4 years ago

BTW, how do you use this tool? I mean, what's the scenario?

ms-gx commented 4 years ago

Thanks, I figured out how to use it in the meantime :) We are using it for a commercial diagnostics application. If you want to know more, we can have a chat any time, just email me.

shenwei356 commented 4 years ago

info at genexa.ch?

ms-gx commented 4 years ago

Yes :)