shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
https://bioinf.shenwei.me/seqkit
MIT License

seqkit locate: typo in error message #442

Closed stas-malavin closed 4 months ago

stas-malavin commented 4 months ago
$ cat contig.fa | seqkit locate -p '"aaggttaa.{0,24}cagcacct"' -r -P -m 2
[ERRO] flag -r (--use-regexp) not allowed when giving flag -m (--use-regexp)

The second flag in the message should read -m (--max-mismatch).

shenwei356 commented 4 months ago

Thank you Stas!

stas-malavin commented 4 months ago

By the way, is there an easy way to delete the located sequences using seqkit locate? I'm trimming adapters and barcodes out of a messy Nanopore assembly. There are various combinations of sequences that I need to locate and trim. What I'm doing now is combining the BED output from several locate runs, then editing the resulting BED file externally to merge regions from the same contigs and to add [start:end] coordinates for contigs that have no adapters, and finally using seqkit subseq --bed to get the trimmed assembly. It would be extremely nice to do it all in one go…
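For illustration, the manual BED editing described above (merging hits per contig and adding full-length entries for contigs without hits) could be sketched roughly as follows. This is a minimal sketch, not part of seqkit: the function name `keep_regions` and the two-column lengths file (contig name, length) are assumptions, and the hit files are assumed to be tab-separated BED with 0-based, end-exclusive coordinates.

```shell
# Hypothetical helper (not a seqkit command): convert adapter hits into
# the regions to KEEP.
#   $1   tab-separated contig lengths: name<TAB>length (assumed input)
#   $2+  tab-separated BED files with adapter hits (0-based, end-exclusive)
keep_regions() (
    lengths=$1
    shift
    tab=$(printf '\t')
    # Sort hits by contig, then by start, so overlapping hits are adjacent.
    cat "$@" \
        | sort -t "$tab" -k1,1 -k2,2n \
        | awk -F'\t' -v OFS='\t' -v lengths="$lengths" '
            BEGIN {
                # Load contig lengths, remembering the input order.
                while ((getline line < lengths) > 0) {
                    split(line, f, "\t")
                    len[f[1]] = f[2] + 0
                    order[++n] = f[1]
                }
            }
            NF < 3 { next }                     # skip blank or short lines
            {
                c = $1; s = $2 + 0; e = $3 + 0
                p = pos[c] + 0                  # end of the last removed region
                if (s > p) print c, p, s        # gap before this hit is kept
                if (e > p) pos[c] = e           # extend the removed region
            }
            END {
                # Keep the tail of every contig; contigs without hits
                # are emitted whole as [0, length).
                for (i = 1; i <= n; i++) {
                    c = order[i]; p = pos[c] + 0
                    if (p < len[c]) print c, p, len[c]
                }
            }' \
        | sort -t "$tab" -k1,1 -k2,2n
)
```

The output is a three-column BED file of kept regions, which is the kind of input already being passed to seqkit subseq --bed in the workflow above. Note that an adapter hit in the middle of a contig yields two separate kept regions for that contig.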

shenwei356 commented 4 months ago

Yes, you can just use seqkit amplicon. See Example 5 in its usage examples.

$ echo -ne ">s\nacggaaaaa\n" 
>s
acggaaaaa

$ echo -ne ">s\nacggaaaaa\n" \
    | seqkit amplicon -F actg -m 1 -f -r 1:99999999999
[INFO] 1 primer pair loaded
>s
aaaaa

> There are various combinations of sequences that I need to locate and trim.

seqkit amplicon supports a list of primers with -p, --primer-file.

stas-malavin commented 4 months ago

Gosh, I knew it was in there somewhere… Thanks so much!

stas-malavin commented 4 months ago

Actually, I need a regular expression, AAGGTTAA.{0,30}CAGCACCT, which doesn't seem to be possible with amplicon. But I can substitute each of the actual adapters for .{0,30}, put the resulting sequences in a file, as you suggested, and allow some mismatches. Yeah, it should work this way.

shenwei356 commented 4 months ago

Oh, yes. amplicon does not support regular expressions.
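
For illustration, the workaround discussed above (substituting each actual adapter for .{0,30} and collecting the results in a primer file) could be sketched as follows. The function name `make_primer_table` is hypothetical, and the two-column, tab-separated layout (primer name, forward primer) is an assumption about what -p, --primer-file expects; it should be checked against seqkit amplicon --help before use.

```shell
# Hypothetical helper: read one adapter sequence per line on stdin and
# write a 2-column, tab-separated primer table (name, forward primer).
# The column layout is an assumption; verify it with
# `seqkit amplicon --help` before relying on it.
make_primer_table() (
    left=${1:-AAGGTTAA}     # fixed left part of the pattern from the thread
    right=${2:-CAGCACCT}    # fixed right part of the pattern from the thread
    awk -v l="$left" -v r="$right" '
        NF == 0 { next }    # skip blank lines
        # Name primers adapter1, adapter2, ... and upper-case the adapter.
        { printf "adapter%d\t%s%s%s\n", ++n, l, toupper($1), r }
    '
)
```

Given a file with one adapter sequence per line, `make_primer_table < adapters.txt > primers.tsv` (both file names illustrative) would produce the table, which could then be passed to seqkit amplicon with -p together with -m to allow mismatches.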