shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
https://bioinf.shenwei.me/seqkit
MIT License

Fail to build FMIndex for sequence #479

Closed nsyzrantsev closed 1 month ago

nsyzrantsev commented 1 month ago

Hi! Thank you very much for your tool! Seqkit always makes me happy :)

But I found a small bug.

Prerequisites

OS: Ubuntu 20.04, amd64 (personal computer)

Command:

seqkit grep -s -f barcodes.txt input.fastq -o output.fastq -m 1 -R -30:-1

The barcodes.txt file contains these sequences:

AGTAGGCT
GTAGGCTC
CTGTACGA
TGTACGAC
GCACCAAG
CACCAAGC
TACGTTTC
ACGTTTCC

input.fastq contains this read:

@1
NNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
############################

Bug

This command completes without any problems:

seqkit grep -s -f barcodes.txt input.fq -o output.fq -m 1 -R 1:30

But this command fails:

seqkit grep -s -f barcodes.txt input.fq -o out.fq -m 1 -R -30:-1

[INFO] 8 patterns loaded from file
[ERRO] fail to build FMIndex for sequence: 1

shenwei356 commented 1 month ago

Actually, it's a bug in sequence region parsing.

For -R -30:-1, it returns an empty sequence if the sequence is shorter than 30 bases (sequence 1 has only 28), and the FM-index cannot be built for an empty sequence.
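
To illustrate the region-parsing problem described above, here is a minimal Python sketch of one way a 1-based, inclusive region with negative positions could be resolved so that an out-of-range start is clamped rather than producing an empty sequence. This is not seqkit's actual code (seqkit is written in Go); the names resolve_region and subseq are hypothetical, and clamping is only an assumed fix strategy.

```python
def resolve_region(start: int, end: int, length: int) -> tuple[int, int]:
    """Convert a 1-based inclusive region to a 0-based half-open slice.

    Negative positions count from the end of the sequence (-1 is the
    last base). Hypothetical helper, not part of seqkit.
    """
    if start == 0 or end == 0:
        raise ValueError("region positions are 1-based; 0 is not valid")

    # Translate negative positions into positive 1-based positions.
    if start < 0:
        start = length + start + 1
    if end < 0:
        end = length + end + 1

    # Clamp to the sequence bounds instead of failing: for a 28-base
    # read, -30 translates to -1, which is clamped to position 1.
    start = max(start, 1)
    end = min(end, length)

    if start > end:
        return (0, 0)  # genuinely empty region
    return (start - 1, end)


def subseq(seq: str, start: int, end: int) -> str:
    """Return the part of seq covered by the 1-based region start:end."""
    lo, hi = resolve_region(start, end, len(seq))
    return seq[lo:hi]


read = "N" * 28  # the 28-base read from the report
tail = subseq(read, -30, -1)  # region longer than the read
head = subseq(read, 1, 30)
```

With this clamping, both -30:-1 and 1:30 yield the full 28-base read, so the two commands in the report would behave consistently.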