shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
https://bioinf.shenwei.me/seqkit
MIT License
1.32k stars 159 forks source link

`seqkit amplicon --primer-file` returns a different result than `seqkit amplicon -F -R` #457

Closed bdelepine closed 6 months ago

bdelepine commented 7 months ago

Hi everyone,

I encounter an unexpected behavior of seqkit amplicon: seqkit amplicon -p returns a different result than seqkit amplicon -F -R.

seq.fasta

>seq1
ATGCGCTATATATATTTT
>seq2
GGGGAGTGTGTGTGTTTT

These commands produce the expected output:

seqkit amplicon -F "ATGC" -R "AAAA" seq.fasta --bed
# seq1    0       18      .       0       +       ATGCGCTATATATATTTT

seqkit amplicon -F "GGGG" -R "AAAA" seq.fasta --bed
# seq2    0       18      .       0       +       GGGGAGTGTGTGTGTTTT

However, if we introduce a file to store the primers, we do not get the same results: arr.txt

arr1    ATGC    AAAA
arr2    GGGG    AAAA
seqkit amplicon -p arr.txt seq.fasta --bed
# seq2    0       18      arr1    0       +       GGGGAGTGTGTGTGTTTT
# seq2    0       18      arr2    0       +       GGGGAGTGTGTGTGTTTT

It looks like the sequence seq2 is misreported as a valid amplicon of arrangement arr1; whereas I would expect seq1 to be reported, as in the -F -R example. Tested on 2.7.0. The same behavior may be observed without the --bed option.

Am I missing something?

HTH

shenwei356 commented 7 months ago

Thanks for reporting this. It's a bug introduced in v2.8.0, that occurred when more than 2 pairs of primers were given. Fixed now.

$ seqkit amplicon -p arr.txt seq.fasta --bed
[INFO] 2 primer pair loaded
seq1    0       18      arr1    0       +       ATGCGCTATATATATTTT
seq2    0       18      arr2    0       +       GGGGAGTGTGTGTGTTTT
CGD-Helix commented 6 months ago

Hi @shenwei356,

I have noticed the same issue, even with the latest download of SeqKit.

After downloading the seqkit_linux_amd64.tar.gz link you have above, I tried to replicate the issue and it still does not seem to be working correctly.

With the two primer pairs in a file, there is no output:

cat seq.fasta | seqkit amplicon -p arr.txt --bed
[INFO] 2 primer pair loaded
seqkit amplicon -p arr.txt seq.fasta --bed
[INFO] 2 primer pair loaded

However, both for the command line input and the file input for one primer pair, it works as intended:

File with one primer pair:

arr1    ATGC    AAAA
seqkit amplicon -p arr.txt seq.fasta --bed 
[INFO] 1 primer pair loaded
seq1    0   18  arr1    0   +   ATGCGCTATATATATTTT

The commands used for one primer pair:

seqkit amplicon -F "ATGC" -R "AAAA" seq.fasta --bed
[INFO] 1 primer pair loaded
seq1    0   18  .   0   +   ATGCGCTATATATATTTT

seqkit amplicon -F "GGGG" -R "AAAA" seq.fasta --bed
[INFO] 1 primer pair loaded
seq2    0   18  .   0   +   GGGGAGTGTGTGTGTTTT
shenwei356 commented 6 months ago

Gosh, seem like I uploaded the wrong binaries. Sorry for this!

seqkit_linux_amd64.tar.gz

CGD-Helix commented 6 months ago

No worries, thanks!