ncbi / sra-human-scrubber

An SRA tool that takes as input a local fastq file from a clinical infection sample, identifies and removes any significant human reads, and outputs the edited (cleaned) fastq file that can safely be used for SRA submission.

interleaved reads are not paired anymore after scrubbing with -x option #23

Closed Bio-finder closed 1 year ago

Bio-finder commented 1 year ago

Hello, I just wanted to report that when I scrub an interleaved fastq with the -x option, the resulting interleaved fastq is sometimes no longer paired. This was unexpected, as I had read that for paired files both reads would be masked if either of the two was detected as human. Could you fix this behavior in your next update? Best regards,
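One way to confirm the symptom described here is to list the spots that are left with only one mate after scrubbing. The following is a minimal sketch and not part of sra-human-scrubber: the helper names `spot_and_mate` and `find_orphans` are hypothetical, and the header parsing assumes the two naming styles that appear in this thread (`@SRR... 1/1` and `@SRR....1.2`).

```python
from collections import defaultdict


def spot_and_mate(header):
    """Return (spot_id, mate) for the two header styles assumed here."""
    fields = header[1:].split()
    # Style 1 (assumed): "@SRR103937431289.1 1/1" -> mate follows the slash
    if len(fields) > 1 and "/" in fields[1]:
        return fields[0], fields[1].rsplit("/", 1)[1]
    # Style 2 (assumed): "@SRR25478177.1.2 ..." -> mate is the last dot field
    spot, _, mate = fields[0].rpartition(".")
    return spot, mate


def find_orphans(headers):
    """Return the sorted spot IDs that do not have both mate 1 and mate 2."""
    mates = defaultdict(set)
    for header in headers:
        spot, mate = spot_and_mate(header)
        mates[spot].add(mate)
    return sorted(spot for spot, seen in mates.items() if seen != {"1", "2"})


# Illustrative headers only: spot S.2 has lost its first mate.
headers = ["@S.1 1/1", "@S.1 1/2", "@S.2 2/2"]
print(find_orphans(headers))  # ['S.2']
```

An empty result would suggest that every spot in the scrubbed file still has both mates.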

multikengineer commented 1 year ago

Bio-finder, our apologies. The documentation is wrong and can be corrected. However, is that a feature you would like to see?

Bio-finder commented 1 year ago

Hello, I would like a feature that removes both reads of a pair. Ideally, I would like a command that doesn't need interleaved reads at all (I could provide R1 and R2 as arguments, and they would still be paired after scrubbing). If that's not possible, then yes, I would like this feature for interleaved reads at least. Best regards,

bede commented 1 year ago

Hi there, I think this feature would be valuable for many users

multikengineer commented 1 year ago

With the 2.2.0 release there is now an explicit flag (option) for interleaved files. Please see the changelog and readme.

-s ; Input is (collated) interleaved paired-end(read) file AND you wish both reads masked or removed.

Please let me know of any issues.
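Because the -s flag expects a single interleaved file, separate R1 and R2 files would need to be interleaved before scrubbing and split again afterwards. The sketch below shows one way to do that outside the tool; `read_fastq`, `interleave`, and `deinterleave` are hypothetical helper names, and the reader assumes strict four-line FASTQ records with no blank lines.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file (or an unexpected blank line)
            return
        yield tuple(record)


def interleave(r1_records, r2_records):
    """Yield R1, R2, R1, R2, ... and fail if the inputs differ in length."""
    for r1, r2 in zip_longest(r1_records, r2_records):
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield r1
        yield r2


def deinterleave(records):
    """Yield (R1, R2) tuples from a collated interleaved record stream."""
    records = iter(records)
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records in interleaved input")
        yield r1, r2


# Tiny illustrative inputs, not real data.
r1 = list(read_fastq(io.StringIO("@a/1\nAC\n+\nII\n@b/1\nGG\n+\nII\n")))
r2 = list(read_fastq(io.StringIO("@a/2\nTT\n+\nII\n@b/2\nCC\n+\nII\n")))
merged = list(interleave(r1, r2))
print([record[0] for record in merged])
```

The split step only restores valid pairs if the scrubber kept or removed both mates together, which is the behavior the -s flag is described as providing.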

mikelchtermans commented 1 year ago

Some tests seem to indicate that the -s option does not work as intended on interleaved paired-end files (either actually interleaved, R1R2R1R2, or combined, R1R1R2R2). The issue appears to be that the script stops processing in the middle of the fastq file. This behavior occurs using -s alone, and using -s in combination with -x.

multikengineer commented 1 year ago

It will only work with collated interleaved files (R1R2R1R2...). I tested with such a file and found no problem, but I will test again. Do you have an example you could share, or an SRA object (accession) for which you have the problem? Thank you for your patience.
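To tell the two layouts apart before running the scrubber, the read headers can be inspected. The following is a rough sketch under stated assumptions: `spot_and_mate` and `layout` are hypothetical names, and the header parsing assumes the naming styles seen in this thread.

```python
def spot_and_mate(header):
    """Return (spot_id, mate) for the two header styles assumed here."""
    fields = header[1:].split()
    # Style 1 (assumed): "@SRR103937431289.1 1/1" -> mate follows the slash
    if len(fields) > 1 and "/" in fields[1]:
        return fields[0], fields[1].rsplit("/", 1)[1]
    # Style 2 (assumed): "@SRR25478177.1.2 ..." -> mate is the last dot field
    spot, _, mate = fields[0].rpartition(".")
    return spot, mate


def is_pair(first, second):
    """True if two parsed headers are mate 1 and mate 2 of the same spot."""
    return first[0] == second[0] and (first[1], second[1]) == ("1", "2")


def layout(headers):
    """Classify headers as 'collated' (R1R2R1R2), 'combined' (R1R1R2R2), or 'unknown'."""
    parsed = [spot_and_mate(h) for h in headers]
    count = len(parsed)
    if count == 0 or count % 2:
        return "unknown"
    if all(is_pair(a, b) for a, b in zip(parsed[0::2], parsed[1::2])):
        return "collated"
    half = count // 2
    if all(is_pair(a, b) for a, b in zip(parsed[:half], parsed[half:])):
        return "combined"
    return "unknown"


print(layout(["@S.1 1/1", "@S.1 1/2", "@S.2 2/1", "@S.2 2/2"]))
print(layout(["@S.1 1/1", "@S.2 2/1", "@S.1 1/2", "@S.2 2/2"]))
```

Under these assumptions the first call reports a collated file and the second a combined one; only the former matches what the -s flag is said to support.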

mikelchtermans commented 1 year ago

Below is a fictitious SRA accession with 4 paired reads that does not work. Please correct me if I'm doing anything wrong:

@SRR103937431289.1 1/1
CTCACCTTATACAAAAATCAACTCAAGATGGATTAAGGTCTTAAACATAAGACCTGAAACTAAAAATTCNNCAAGATAACTTTGGAAGAACCCTTCTGGACATTGGCTCAGNNAAGGATTTCATGACCANAANCCCAAAAGCAAATGCAATAAAAACAAAGANAAATCACTGGGACCTAATTAAACCAAACAACTTTTGCACGGCACATGGACAGTCAGCAGAGTAAAAAGACAACCCACAGAATGGGAGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.2 2/1
AAGCAAAGGTGTCCCCAAAGTGGGATTTTCCACTTCTGGCATTATGTTGGTGGTGCTCAAAAAGTTTTGNNTTTTGGAGAATTTCAGATTTTTGGTTTTTGATTAGGCCTGGNTTCTCCGTCTTAGGGGNCCTAATAAGCTAGGCCCACTTCCACCAAACCACAGATGGAACTCACATGGGGAGTTTACACTTGAAAGCTTTTTTTCTTGTCTATCCACCCAATTTGTTCACTTTTCTCTGTATTAGAAAG
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGGEFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGGFFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@EGG,ACCCEGGE7ECGGGGG49*;DGG6?CCFF>;8/888:B
@SRR103937431289.1 1/2
NATCATTAGTGATGTTGAGCATTTTTTCATATGTTTGTTGCCCATTNNNATATCTTCTTTTGAGAANNGGCTATTTATATCCATAGCTCACTTTTTGATGGGATTCTTTTTTTCTTACTGATTTGAGTTCATTGTAGATTCTGGATATTAGTCTTTTGTCAGATGTATAGATTGTGAAAATTNTCTCCCATTCTGTGGGTTGTNTTTTNNCTCTGCTGACTGTCCATGTGCCNTGCAAAAGTTGTTTGGTT
+
8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG###::CFGGGGGGGGGGGG##::FGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGFGGGGGGGGGGGGGGFGFGGGGGCFGGGGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGEGGGGGGGGGCFGGGGEGGFGG#6@?6@DFGGGGGGDGGGED#6@CG##+6=?DC@FAFGDF+;=FAFFF8#33@CFA?AFFEFFD==DF
@SRR103937431289.2 2/2
NCATTATATTTTGCCATATCACTGGAAATGGGATTTCTGAGTCACGNNNGAAATTCATACGTAATTNNGCTAGGCGCTGCTAAATTCCACTTGTTTAAGCTTGGGTTCTTGGGAATCTGACTCAGAGACAGAGATAATTGTGTGGAAGGTTTATGAGGTTTATATTCAAGACCCATGAATGTNGTTTCCAGGTAAAAGCTGAANTGAGNNTAGATCCCAGAATTGCTATCTCNAACAAATATGACAAAATA
+
88ACCFFFGGGFGFFDGCFGAFGFGCFGC@,@CF@<9FC6FCAF###:6:CC@AFFFEFEGGGG##,:@CCE8FGGEGFGGFFG9<<AFEFDFG9E<9<@AFEEEF,ED9,FGDECFFFG9<=AFGG,AA=8?=FFGC;=EFDF9A7,E9F4==,>,C>DFF@;>FBE,,,>DFGGADC#36@@?3CDGGCDAE=FAF?D#+0=+##+3+37+2+;6@F?A?A++1;6+#03:A++++*:AD91
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8=F++,,3,3@+,,,,7,7@@DF,7,>D*>>F>BF714*1*:/2;0+++3<0<9C6;/12*211*0+####2**00++FG=C##1###//**.775579*65))07##)))0))./)6))).))

multikengineer commented 1 year ago

Can you go to this page and download the fastq for SRR25478177 https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR25478177&display=download

Unzip it and you will see it is an interleaved fastq:

head ~/fastq/SRR25478177.fastq
@SRR25478177.1.1 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
GCTAGAAGGTCTTTAATGCACTCAAGAGGGTAGCCATCAGGGCCACAGTAGTTGTTATCGTCATAGCGAGTGTATGCCCCTCCGTTAAGCTCACGCATGAGTTCACGGGTAACACC
+SRR25478177.1.1 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
1>A1111B?FFFGGFFGBDDFGF1111000B001A1B11110000/00121DD1DF2DA/0A/B221///A/D2D2210/?/>/FG//11B11B/0/?/11BF121///>/0>1>>
@SRR25478177.1.2 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
TTTTTTTCCCTTGTTCTCTTTCTTTTTCTTTACTTTTTTTCTTACTCTCTCTTTTTCGTTTTCTTCTTCTGTGGCCCTGTTGGCTACCCTCTTGTGTTCTTTTTATTCCTTCTTTC
+SRR25478177.1.2 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
11>1>>11111B1113B33B333B3110DD12112211//01D2111A111B2A1B0/0B//2112DD2D1D20/0/B01B1/111000B/>B111B022BB11/1222BB1B112

Running this file without the -s flag:

$   sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -i ~/fastq/SRR25478177.fastq 
2023-08-02 20:41:57 aligns_to version 0.801
2023-08-02 20:41:57 hardware threads: 8, omp threads: 8
2023-08-02 20:42:09 loading time (sec) 11
2023-08-02 20:42:09 /tmp/tmp.K3zTXjBNA5/temp.fasta
2023-08-02 20:42:09 FastaReader
2023-08-02 20:42:09 0% processed
2023-08-02 20:42:11 100% processed
2023-08-02 20:42:11 total spot count: 171728
2023-08-02 20:42:11 total read count: 171728
2023-08-02 20:42:11 total time (sec) 13
29  spot(s) masked or removed.

Then with -s flag

$   sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -s -i ~/fastq/SRR25478177.fastq 
2023-08-02 20:42:31 aligns_to version 0.801
2023-08-02 20:42:31 hardware threads: 8, omp threads: 8
2023-08-02 20:42:31 loading time (sec) 0
2023-08-02 20:42:31 /tmp/tmp.MF630KCzZA/temp.fasta
2023-08-02 20:42:31 FastaReader
2023-08-02 20:42:31 0% processed
2023-08-02 20:42:33 100% processed
2023-08-02 20:42:33 total spot count: 85864
2023-08-02 20:42:33 total read count: 171728
2023-08-02 20:42:33 total time (sec) 1
16  spot(s) masked or removed

Finally with both -s and -x

$   sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -s -x -i ~/fastq/SRR25478177.fastq  
2023-08-02 20:49:44 aligns_to version 0.801
2023-08-02 20:49:44 hardware threads: 8, omp threads: 8
2023-08-02 20:49:44 loading time (sec) 0
2023-08-02 20:49:44 /tmp/tmp.wNMCIr09mb/temp.fasta
2023-08-02 20:49:44 FastaReader
2023-08-02 20:49:44 0% processed
2023-08-02 20:49:45 100% processed
2023-08-02 20:49:45 total spot count: 85864
2023-08-02 20:49:45 total read count: 171728
2023-08-02 20:49:45 total time (sec) 1
16  spot(s) masked or removed.
$ ls -l ~/fastq/
total 149044
-rwxr-xr-x 1 kskatz kskatz 76309278 Aug  2 20:15 SRR25478177.fastq
-rw-r--r-- 1 root   root   76302482 Aug  2 20:49 SRR25478177.fastq.clean

Can you do the same and let me know? Thank you for your patience.

mikelchtermans commented 1 year ago

Hi, I did the same and obtained the same stdout results. However, these are not my concern. My concern is the number of lines remaining in the fastq file (wc -l; divide by 4 for reads):

no option: 686912 lines
s flag: 686912 lines
x flag: 686796 lines
x + s flag: 686848 lines

The number of lines when using both the x and s flags is larger than when using only x. With s, I expect the mates of the reads removed by x to be removed as well, resulting in a lower line count.
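The arithmetic behind this expectation can be written out directly from the reported line counts. This is only a worked check of the numbers quoted in this comment; the variable names are illustrative.

```python
LINES_PER_READ = 4  # a FASTQ record is header, sequence, '+', quality

# Line counts as reported in this comment for SRR25478177.
line_counts = {
    "no option": 686912,
    "-s": 686912,
    "-x": 686796,
    "-x -s": 686848,
}

total_reads = line_counts["no option"] // LINES_PER_READ
removed = {}
for label, lines in line_counts.items():
    if lines % LINES_PER_READ:
        raise ValueError(f"{label}: line count is not a multiple of 4")
    kept = lines // LINES_PER_READ
    removed[label] = total_reads - kept
    print(f"{label:>9}: {kept} reads kept, {removed[label]} removed")

# Removing mates as well can only remove more reads, never fewer.
pairing_ok = removed["-x -s"] >= removed["-x"]
print("x + s removes at least as many reads as x alone:", pairing_ok)
```

The total of 171728 reads agrees with the "total read count" in the logs above, and the 29 reads removed by -x agree with the "29 spot(s) masked or removed" line of the first run. The -x -s run removes only 16 reads, which is fewer than -x alone and therefore inconsistent with mates being removed together.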

As I mentioned in a previous comment, I observed that, starting at the exact middle of the file, the -s flag stops acting on reads (neither masking nor removing them).

In the example I provided above:

no option: 32 lines
s flag: 32 lines
x flag: 4 lines
x + s flag: 16 lines (expected 0)

Reads in the example I provided above:

no option (1 read remaining):

@SRR103937431289.1 1/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.1 1/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG###::C<FGGGGGGGGGGGG##::FGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGFGGGGGGGGGGGGGGFGFGGGGGCFGGGGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGEGGGGGGGGGCFGGGGEGGFGG#6@?6@DFGGGGGGDG>GGED#6@CG##+6=?DC@FAFGDF+;=FAFFF8#33@CFA?AFFEFFD==DF
@SRR103937431289.2 2/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGG<EFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGG>FFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.2 2/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#88ACCFFFGGGFGFFDGCFGAFG<FGCFGC@,@cf@<9FC6FCAF###:6:CC@AFFFEFEGGGG##,:@CCE8FGGEGFGGFFG9<<AFEFDFG9E<9<@afeeef,ED9,FGDECFFFG9<=AFGG,AA=8?=FFGC;=EFDF9A7,E9F4==,>>,C>DFF@;>FBE,,,>DFGGADC#36@@?3CDGGCDAE=FAF?D#+0=+##+3+37+2+;6@F?A?A++1;6+#03:A++++**:AD91
@SRR103937431289.3 3/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))

s flag (4 reads remaining, bottom half, previously 3/4 masked by no option, expected 0):

@SRR103937431289.1 1/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.1 1/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG###::C<FGGGGGGGGGGGG##::FGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGFGGGGGGGGGGGGGGFGFGGGGGCFGGGGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGEGGGGGGGGGCFGGGGEGGFGG#6@?6@DFGGGGGGDG>GGED#6@CG##+6=?DC@FAFGDF+;=FAFFF8#33@CFA?AFFEFFD==DF
@SRR103937431289.2 2/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGG<EFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGG>FFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.2 2/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#88ACCFFFGGGFGFFDGCFGAFG<FGCFGC@,@cf@<9FC6FCAF###:6:CC@AFFFEFEGGGG##,:@CCE8FGGEGFGGFFG9<<AFEFDFG9E<9<@afeeef,ED9,FGDECFFFG9<=AFGG,AA=8?=FFGC;=EFDF9A7,E9F4==,>>,C>DFF@;>FBE,,,>DFGGADC#36@@?3CDGGCDAE=FAF?D#+0=+##+3+37+2+;6@F?A?A++1;6+#03:A++++**:AD91
@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))

x flag (1 read remaining, same as without flag):

@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))

x + s flags (4 reads remaining, bottom half, previously 3/4 removed by x flag, expected 0):

@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))

Increasing or decreasing the file size results in the same phenomenon: with the s flag, reads past the middle of the file are no longer processed.
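A comparison of the masked output with and without -s makes this kind of gap easy to spot: every spot that has a masked read in the plain run should have both mates masked in the -s run. The sketch below illustrates that check; `spot_id`, `is_masked`, and `missed_spots` are hypothetical names, the header style (`@spot mate/n`) is an assumption, and the short sequences stand in for the real reads.

```python
from collections import Counter


def spot_id(header):
    """Spot ID is assumed to be the first header field: '@S.3 3/1' -> 'S.3'."""
    return header[1:].split()[0]


def is_masked(sequence):
    """A read counts as masked when it consists only of N."""
    return bool(sequence) and set(sequence) == {"N"}


def missed_spots(plain_run, paired_run):
    """Spots masked in the plain run that are not fully masked in the -s run.

    Both arguments are iterables of (header, sequence) tuples.
    """
    flagged = {spot_id(h) for h, seq in plain_run if is_masked(seq)}
    masked_mates = Counter(spot_id(h) for h, seq in paired_run if is_masked(seq))
    fully_masked = {spot for spot, n in masked_mates.items() if n == 2}
    return sorted(flagged - fully_masked)


# Placeholder records mirroring the masking pattern reported above.
names = [f"@S.{i} {i}/{m}" for i in range(1, 5) for m in (1, 2)]
plain = [(h, "ACGT" if h == "@S.4 4/2" else "NNNN") for h in names]
paired = [(h, "NNNN" if spot_id(h) in ("S.1", "S.2") else "ACGT") for h in names]
print(missed_spots(plain, paired))  # ['S.3', 'S.4']
```

In this toy case the two spots in the second half of the file are reported, matching the pattern described in the comment. The same idea applies to -x runs by comparing which spots are absent instead of which are masked.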

Kind regards

multikengineer commented 1 year ago

Indeed @mikelchtermans you have identified a bug and I will release a bug fix very soon. Thank you for your patience.

multikengineer commented 1 year ago

@mikelchtermans, this is fixed in the new release tag (2.2.1).