Closed by ms-gx 4 years ago
Full documentation is not available yet, but you can download the latest binary and check the help message.
Usage (unikmer locate)
$ unikmer locate -h
Locate k-mers in genome
Attention:
1. All files should have the 'canonical' flag.
2. Output is BED6 format.
3. When using experimental flag --circular, leading subsequence of k-1 bp
is appending to end of sequence. End position of k-mers that crossing
sequence end would be greater than sequence length.
Usage:
  unikmer locate [flags]

Flags:
      --circular            circular genome (experimental)
  -g, --genome strings      genomes in (gzipped) fasta file(s)
  -h, --help                help for locate
  -o, --out-prefix string   out file prefix ("-" for stdout) (default "-")
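The effect of --circular on coordinates can be illustrated with a small Python sketch. This is not unikmer's implementation; the function name `locate_kmers` and its parameters are hypothetical, and the sketch only assumes what the help text states: the leading k-1 bp are appended to the end of the sequence, so a k-mer that crosses the sequence end gets an end position greater than the sequence length. Records follow the BED6 layout (name, 0-based start, end, k-mer, score, strand) and are listed in position order here, which may differ from the order the tool prints.

```python
def locate_kmers(name, seq, k, circular=False):
    """Return BED6-style records (name, start, end, kmer, score, strand)."""
    # Assumption taken from the help text: for circular genomes the
    # leading k-1 bases are appended to the end of the sequence.
    ext = seq + seq[:k - 1] if circular else seq
    records = []
    for start in range(len(ext) - k + 1):
        end = start + k  # 0-based, right-open interval
        records.append((name, start, end, ext[start:end], 0, "."))
    return records


linear = locate_kmers("s", "ACTGCAATGC", 2)
circ = locate_kmers("s", "ACTGCAATGC", 2, circular=True)
print(linear[-1])  # last k-mer of the linear sequence
print(circ[-1])    # k-mer crossing the end of the circular sequence
```

For the 10 bp sequence above, the last circular record ends at position 11, one past the sequence length, which is the behaviour the third note in the help message warns about.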
Usage (unikmer uniqs)
$ unikmer uniqs -h
Mapping k-mers back to genome and find unique subsequences
Attention:
1. All files should have the 'canonical' flag.
2. Default output is in BED3 format, with left-closed and right-open
0-based interval.
3. When using experimental flag --circular, leading subsequence of k-1 bp
is appending to end of sequence.
1) End position of k-mers that crossing sequence end would be
greater than sequence length.
2) Longer subsequences are not further extended.
Usage:
  unikmer uniqs [flags]

Flags:
  -M, --allow-muliple-mapped-kmer           allow multiple mapped k-mers
      --circular                            circular genome (experimental)
  -g, --genome strings                      genomes in (gzipped) fasta file(s)
  -h, --help                                help for uniqs
  -x, --max-cont-non-uniq-kmers int         max continuous non-unique k-mers
  -X, --max-num-cont-non-uniq-kmers int     max number of continuous non-unique k-mers
  -m, --min-len int                         minimum length of subsequence (default 200)
  -o, --out-prefix string                   out file prefix ("-" for stdout) (default "-")
  -a, --output-fasta                        output fasta format instead of BED3
  -W, --seqs-in-a-file-as-one-genome        treat seqs in a genome file as one genome
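As a rough mental model of what uniqs reports, the Python sketch below merges runs of consecutive positions whose canonical k-mer occurs exactly once in the sequence into 0-based, right-open intervals. It is only an illustration under stated assumptions, not the tool's algorithm: the names `canonical` and `unique_regions` are hypothetical, the k-mer set is taken to be all k-mers of the sequence itself rather than being read from a .unik file, and the -x/-X flags and --circular are not modelled.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def unique_regions(seq, k, min_len=200, allow_multiple=False):
    """Merge runs of accepted k-mer positions into (start, end) intervals."""
    kmers = [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    regions = []
    start = None
    for i, kmer in enumerate(kmers):
        # A position is accepted if its k-mer maps once, or always when
        # multiple-mapped k-mers are allowed (assumed meaning of -M).
        keep = allow_multiple or counts[kmer] == 1
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            regions.append((start, i - 1 + k))  # end of the previous k-mer
            start = None
    if start is not None:
        regions.append((start, len(kmers) - 1 + k))
    # Assumed meaning of --min-len: keep intervals of at least min_len bp.
    return [(s, e) for s, e in regions if e - s >= min_len]


print(unique_regions("ACTGCAATGC", 2, min_len=2))
print(unique_regions("ACTGCAATGC", 2, min_len=2, allow_multiple=True))
```

On the sequence ACTGCAATGC with k = 2, the canonical k-mers CA and GC occur more than once, so they break the sequence into two short unique stretches unless multiple-mapped k-mers are allowed.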
Examples:
$ echo -ne ">s\nACTGCAATGC\n" | tee t.fa
>s
ACTGCAATGC
$ unikmer count -k 2 -K -s t.fa -o t.fa.k2
$ unikmer view t.fa.k2.unik
AA
AC
AG
AT
CA
GC
$ unikmer locate --genome t.fa t.fa.k2.unik
s 5 7 AA 0 .
s 0 2 AC 0 .
s 1 3 CT 0 .
s 6 8 AT 0 .
s 2 4 TG 0 .
s 4 6 CA 0 .
s 7 9 TG 0 .
s 3 5 GC 0 .
s 8 10 GC 0 .
$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2
s 0 3
s 5 8
$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2 --output-fasta
>s:1-3
ACT
>s:6-8
AAT
$ unikmer uniqs --genome t.fa t.fa.k2.unik --min-len 2 --output-fasta --allow-muliple-mapped-kmer
>s:1-10
ACTGCAATGC
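The BED3 intervals and the FASTA headers in the examples above use different coordinate conventions: BED is 0-based and right-open, while the headers (for example s:1-3) are 1-based and closed. A minimal Python sketch of the conversion, with hypothetical helper names `bed3_to_region` and `extract`, could look like this:

```python
def bed3_to_region(line):
    """Convert a BED3 line (0-based, right-open) to a 1-based closed region string."""
    name, start, end = line.split()[:3]
    # Only the start shifts by one; the right-open end equals the closed end.
    return f"{name}:{int(start) + 1}-{int(end)}"


def extract(seq, line):
    """Slice the subsequence described by a BED3 line out of seq."""
    _, start, end = line.split()[:3]
    # Python slices are also 0-based and right-open, so BED maps directly.
    return seq[int(start):int(end)]


genome = "ACTGCAATGC"
for bed in ["s 0 3", "s 5 8"]:
    print(">" + bed3_to_region(bed))
    print(extract(genome, bed))
```

This reproduces the relationship between the BED3 output and the --output-fasta output shown above for the two unique regions.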
BTW, how do you use this tool? I mean, what's the scenario?
Thanks, I figured out how to use it in the meantime :) We are using it for a commercial diagnostics application. If you want to know more, we can have a chat any time; just email me.
info at genexa.ch?
Yes :)
Please document usage of the locate command. Thank you so much for this great tool!