Bio-finder closed 1 year ago
Bio-finder, our apologies. The documentation is wrong and can be corrected; however, is that a feature you would like to see?
Hello, I would like a feature to remove paired reads. Ideally, I would like a command that does not even need interleaved reads: I could provide R1 and R2 as arguments, and they would be processed so that they are still paired after scrubbing. If that is not possible, then yes, I would like this feature for interleaved reads at least. Best regards,
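Until separate R1/R2 arguments are supported, one workaround is to interleave the two files yourself before scrubbing. The following is only a minimal sketch, not part of the scrubber: the function names `read_records` and `interleave` are hypothetical, and it assumes unwrapped 4-line FASTQ records with R1 and R2 listing the same reads in the same order.

```python
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO


def read_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write records alternately (R1R2R1R2...) and return the number of pairs."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_records(r1), read_records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        for rec in (rec1, rec2):
            out.write("\n".join(rec) + "\n")
        pairs += 1
    return pairs
```

The output keeps each R1 record immediately followed by its R2 mate, which is the collated layout an interleaved-only option would need.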
Hi there, I think this feature would be valuable for many users.
With the 2.2.0 release there is now an explicit flag (option) for interleaved files; please see the changelog and readme.
-s ; Input is (collated) interleaved paired-end(read) file AND you wish both reads masked or removed.
Please let me know of any issues.
Some tests seem to indicate that the -s option does not work as intended on interleaved paired-end files (either actually interleaved R1R2R1R2 or combined R1R1R2R2). The issue appears to be that the script stops processing in the middle of the fastq file. This behavior occurs when using -s alone and when using -s in combination with -x.
It will only work with collated interleaved files (R1R2R1R2...). I tested with such a file and found no problem, but I will test again. Do you have an example you could share, or an SRA object (accession) for which you have the problem? Thank you for your patience.
Below is a fictitious SRA accession with 4 paired reads that does not work; please correct me if I'm doing anything wrong:
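Since -s expects the collated layout, it can help to check a file before scrubbing it. The sketch below is an illustration only: `sra_spot_id`, `fastq_headers` and `is_collated` are hypothetical helper names, and the default key assumes SRA-style read names of the form `@<accession>.<spot>.<read>`. Files with a different naming scheme would need their own key function.

```python
from itertools import islice
from typing import Callable, Iterable, List


def sra_spot_id(header: str) -> str:
    """Spot key for a name like '@SRR25478177.1.2': drop '@' and the read index.

    Assumption: SRA-style '<accession>.<spot>.<read>' names.
    """
    name = header[1:].split()[0]
    return name.rsplit(".", 1)[0]


def fastq_headers(lines: Iterable[str]) -> List[str]:
    """Return every 4th line starting at the first (assumes 4-line records)."""
    return [line.rstrip("\n") for line in islice(lines, 0, None, 4)]


def is_collated(headers: Iterable[str],
                key: Callable[[str], str] = sra_spot_id) -> bool:
    """True if headers come in adjacent mate pairs (R1R2R1R2...)."""
    it = iter(headers)
    for first in it:
        second = next(it, None)
        if second is None or key(first) != key(second):
            return False
    return True
```

A combined R1R1R2R2 file fails this check at the first pair, because the first two records belong to different spots.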
@SRR103937431289.1 1/1
CTCACCTTATACAAAAATCAACTCAAGATGGATTAAGGTCTTAAACATAAGACCTGAAACTAAAAATTCNNCAAGATAACTTTGGAAGAACCCTTCTGGACATTGGCTCAGNNAAGGATTTCATGACCANAANCCCAAAAGCAAATGCAATAAAAACAAAGANAAATCACTGGGACCTAATTAAACCAAACAACTTTTGCACGGCACATGGACAGTCAGCAGAGTAAAAAGACAACCCACAGAATGGGAGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.2 2/1
AAGCAAAGGTGTCCCCAAAGTGGGATTTTCCACTTCTGGCATTATGTTGGTGGTGCTCAAAAAGTTTTGNNTTTTGGAGAATTTCAGATTTTTGGTTTTTGATTAGGCCTGGNTTCTCCGTCTTAGGGGNCCTAATAAGCTAGGCCCACTTCCACCAAACCACAGATGGAACTCACATGGGGAGTTTACACTTGAAAGCTTTTTTTCTTGTCTATCCACCCAATTTGTTCACTTTTCTCTGTATTAGAAAG
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGGEFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGGFFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@EGG,ACCCEGGE7ECGGGGG49*;DGG6?CCFF>;8/888:B
@SRR103937431289.1 1/2
NATCATTAGTGATGTTGAGCATTTTTTCATATGTTTGTTGCCCATTNNNATATCTTCTTTTGAGAANNGGCTATTTATATCCATAGCTCACTTTTTGATGGGATTCTTTTTTTCTTACTGATTTGAGTTCATTGTAGATTCTGGATATTAGTCTTTTGTCAGATGTATAGATTGTGAAAATTNTCTCCCATTCTGTGGGTTGTNTTTTNNCTCTGCTGACTGTCCATGTGCCNTGCAAAAGTTGTTTGGTT
+
@SRR103937431289.2 2/2
NCATTATATTTTGCCATATCACTGGAAATGGGATTTCTGAGTCACGNNNGAAATTCATACGTAATTNNGCTAGGCGCTGCTAAATTCCACTTGTTTAAGCTTGGGTTCTTGGGAATCTGACTCAGAGACAGAGATAATTGTGTGGAAGGTTTATGAGGTTTATATTCAAGACCCATGAATGTNGTTTCCAGGTAAAAGCTGAANTGAGNNTAGATCCCAGAATTGCTATCTCNAACAAATATGACAAAATA
+
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
Can you go to this page and download the fastq for SRR25478177? https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR25478177&display=download
Unzip it and you will see it is an interleaved fastq:
head ~/fastq/SRR25478177.fastq
@SRR25478177.1.1 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
GCTAGAAGGTCTTTAATGCACTCAAGAGGGTAGCCATCAGGGCCACAGTAGTTGTTATCGTCATAGCGAGTGTATGCCCCTCCGTTAAGCTCACGCATGAGTTCACGGGTAACACC
+SRR25478177.1.1 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
1>A1111B?FFFGGFFGBDDFGF1111000B001A1B11110000/00121DD1DF2DA/0A/B221///A/D2D2210/?/>/FG//11B11B/0/?/11BF121///>/0>1>>
@SRR25478177.1.2 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
TTTTTTTCCCTTGTTCTCTTTCTTTTTCTTTACTTTTTTTCTTACTCTCTCTTTTTCGTTTTCTTCTTCTGTGGCCCTGTTGGCTACCCTCTTGTGTTCTTTTTATTCCTTCTTTC
+SRR25478177.1.2 M05870:130:000000000-GGFBY:1:1101:16240:1878 length=116
11>1>>11111B1113B33B333B3110DD12112211//01D2111A111B2A1B0/0B//2112DD2D1D20/0/B01B1/111000B/>B111B022BB11/1222BB1B112
Running this file without the -s flag:
$ sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -i ~/fastq/SRR25478177.fastq
2023-08-02 20:41:57 aligns_to version 0.801
2023-08-02 20:41:57 hardware threads: 8, omp threads: 8
2023-08-02 20:42:09 loading time (sec) 11
2023-08-02 20:42:09 /tmp/tmp.K3zTXjBNA5/temp.fasta
2023-08-02 20:42:09 FastaReader
2023-08-02 20:42:09 0% processed
2023-08-02 20:42:11 100% processed
2023-08-02 20:42:11 total spot count: 171728
2023-08-02 20:42:11 total read count: 171728
2023-08-02 20:42:11 total time (sec) 13
29 spot(s) masked or removed.
Then with the -s flag:
$ sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -s -i ~/fastq/SRR25478177.fastq
2023-08-02 20:42:31 aligns_to version 0.801
2023-08-02 20:42:31 hardware threads: 8, omp threads: 8
2023-08-02 20:42:31 loading time (sec) 0
2023-08-02 20:42:31 /tmp/tmp.MF630KCzZA/temp.fasta
2023-08-02 20:42:31 FastaReader
2023-08-02 20:42:31 0% processed
2023-08-02 20:42:33 100% processed
2023-08-02 20:42:33 total spot count: 85864
2023-08-02 20:42:33 total read count: 171728
2023-08-02 20:42:33 total time (sec) 1
16 spot(s) masked or removed
Finally, with both -s and -x:
$ sudo docker run -it -v $PWD:$PWD:rw -w $PWD us.gcr.io/ncbi-research-sra-dataload/sra-human-scrubber:2.2.0 /opt/scrubber/scripts/scrub.sh -s -x -i ~/fastq/SRR25478177.fastq
2023-08-02 20:49:44 aligns_to version 0.801
2023-08-02 20:49:44 hardware threads: 8, omp threads: 8
2023-08-02 20:49:44 loading time (sec) 0
2023-08-02 20:49:44 /tmp/tmp.wNMCIr09mb/temp.fasta
2023-08-02 20:49:44 FastaReader
2023-08-02 20:49:44 0% processed
2023-08-02 20:49:45 100% processed
2023-08-02 20:49:45 total spot count: 85864
2023-08-02 20:49:45 total read count: 171728
2023-08-02 20:49:45 total time (sec) 1
16 spot(s) masked or removed.
$ ls -l ~/fastq/
total 149044
-rwxr-xr-x 1 kskatz kskatz 76309278 Aug 2 20:15 SRR25478177.fastq
-rw-r--r-- 1 root root 76302482 Aug 2 20:49 SRR25478177.fastq.clean
Can you do the same and let me know? Thank you for your patience.
Hi, I did the same and obtained the same stdout results. However, my concern is not the stdout but the number of lines (wc -l; divide by 4 for reads) remaining in the fastq file:
no option: 686912 lines
-s flag: 686912 lines
-x flag: 686796 lines
-x + -s flags: 686848 lines
The number of lines with both -x and -s is larger than with -x alone. With -s I expect the mates of the reads removed by -x to be removed as well, resulting in a lower line/read count.
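To make the comparison concrete, the reported line counts can be converted to read counts by dividing by 4. This is just the arithmetic written out as a small sketch; `reads_from_lines` is a hypothetical helper name, and the line counts are the ones reported in this comment.

```python
def reads_from_lines(line_count: int) -> int:
    """Convert a FASTQ line count to a read count (4 lines per record)."""
    if line_count % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return line_count // 4


# Line counts (wc -l) reported for each combination of flags.
LINE_COUNTS = {"none": 686912, "-s": 686912, "-x": 686796, "-x -s": 686848}

total_reads = reads_from_lines(LINE_COUNTS["none"])  # 171728
removed = {flag: total_reads - reads_from_lines(n)
           for flag, n in LINE_COUNTS.items()}
# removed == {"none": 0, "-s": 0, "-x": 29, "-x -s": 16}

# If -s removed the mate of every read that -x removes, "-x -s" could not
# remove fewer reads than "-x" alone. Here it does (16 < 29).
mates_also_removed = removed["-x -s"] >= removed["-x"]  # False
```

The values 29 and 16 match the "spot(s) masked or removed" counts printed in the logs above, so the output files agree with the stdout; the surprise is only that adding -s lowers the number instead of raising it.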
As I mentioned in a previous comment, I observed that starting at the exact middle of the file, the -s flag stops masking or removing any reads.
In the example I provided above:
no option: 32 lines
-s flag: 32 lines
-x flag: 4 lines
-x + -s flags: 16 lines (expected 0)
Reads in the example I provided above, per option.
No option (1 unmasked read remaining):
@SRR103937431289.1 1/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.1 1/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG###::C<FGGGGGGGGGGGG##::FGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGFGGGGGGGGGGGGGGFGFGGGGGCFGGGGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGEGGGGGGGGGCFGGGGEGGFGG#6@?6@DFGGGGGGDG>GGED#6@CG##+6=?DC@FAFGDF+;=FAFFF8#33@CFA?AFFEFFD==DF
@SRR103937431289.2 2/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGG<EFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGG>FFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.2 2/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#88ACCFFFGGGFGFFDGCFGAFG<FGCFGC@,@cf@<9FC6FCAF###:6:CC@AFFFEFEGGGG##,:@CCE8FGGEGFGGFFG9<<AFEFDFG9E<9<@afeeef,ED9,FGDECFFFG9<=AFGG,AA=8?=FFGC;=EFDF9A7,E9F4==,>>,C>DFF@;>FBE,,,>DFGGADC#36@@?3CDGGCDAE=FAF?D#+0=+##+3+37+2+;6@F?A?A++1;6+#03:A++++**:AD91
@SRR103937431289.3 3/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))
-s flag (4 unmasked reads remaining, all in the bottom half of the file; 3 of these 4 were masked when no option was used; expected 0):
@SRR103937431289.1 1/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:CFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG##:BFFGGGGGGGGGGGG#:D#:AFGGGGGGGGGGGGGGGFGGGGGGGGGG#9DFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGDFGGGGGGGGGGGGGGGGGGGGGGGDGGGGFFFFFFFFF
@SRR103937431289.1 1/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG###::C<FGGGGGGGGGGGG##::FGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGFGGGGGGGGGGGGGGFGFGGGGGCFGGGGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGEGGGGGGGGGCFGGGGEGGFGG#6@?6@DFGGGGGGDG>GGED#6@CG##+6=?DC@FAFGDF+;=FAFFF8#33@CFA?AFFEFFD==DF
@SRR103937431289.2 2/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
@CCCCGGFC6C@EEGGGG88CFFDFGGGG<EFDA@FFFGGDGDFGGGGGDFFE7FGGGGC<DFGGFCF,##::DFFG@FFFGGCFFFFGGGGGDEFFGGGG>FFGDFGGGGD#:AFGGGGGGGGGAFC+#:A=<FGGGGFGGGFGGCCCFGGGGGGFGGGGE7FFGGFGGDEGGFG@DEDECGE6D;3@;EFDDFGAF83@FEEE8CF9FGGGFAC,2=19EFG?C+C?CFGC+=@=++++4+=B+114B1
@SRR103937431289.2 2/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#88ACCFFFGGGFGFFDGCFGAFG<FGCFGC@,@cf@<9FC6FCAF###:6:CC@AFFFEFEGGGG##,:@CCE8FGGEGFGGFFG9<<AFEFDFG9E<9<@afeeef,ED9,FGDECFFFG9<=AFGG,AA=8?=FFGC;=EFDF9A7,E9F4==,>>,C>DFF@;>FBE,,,>DFGGADC#36@@?3CDGGCDAE=FAF?D#+0=+##+3+37+2+;6@F?A?A++1;6+#03:A++++**:AD91
@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))
-x flag (1 read remaining, the same read that is left unmasked without any flag):
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))
-x + -s flags (4 reads remaining, all in the bottom half of the file; 3 of these 4 were removed by the -x flag alone; expected 0):
@SRR103937431289.3 3/1
TTGCTCGTGCATGTGGATATTTGCTCAGATTAAGAATGTTTCCCACAAAATTGCTTACTCTCTTTCATTNNGTCCGTGGTTGCTTGTGGTTTATTTGTTGATAACTGAGACTNTTTAAAGATTGCCAAANCAAAAGCCATTTCTTTTCTCCCCGAATAGCCTCATTTTTTCTATTCCCTTTTTCCCCTCCTATTAACATTTCTGGGGACTCTAAAGCAGAGCTCAAGATGGTTCAGGACCACGGAGAGCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGG##::DFGGGGGGGGGGGGGGGGGGGGGGGGFFFGGGGGGGGGF#:?FGGGGGGGGGGGGG#:AFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGFGGGGGFFGGGGCECGGGGGGGGGFGGFGGGGGC2BDGGGGGFFGFGGFFAEFFFE
@SRR103937431289.3 3/2
NAAAATTAGCTGGGCATGGTGGTGCACACCTGTAATCCCAGCTACTNNNGAGGCTGAGGCGCTAGANNCGCTTGGACCCGGGACGCGGAGGTTGAAGTGAGCCAAGATCGTGCTACTGCACTCCAGCCTGGGTGACAGAGCGAGACTCCCTCTCAAAAAAACAGAAAAATAAAAACAATTTTNGGATCATCTCTTTCCCCACCNGCATNNCTAAAGTGCTTTGGTAGTTCCCNTTCCTTGCGCTCTCCGTG
+
#8ACCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF###::DDFGGGGGGGGGGGG##:CFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGDFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG9FF#4=FGGGGFGGGGGGCFGF5D#/2>:##/2>FFGGFFFCGFFFFFAFFFF#007FFF?FBFFFFFFBF:
@SRR103937431289.4 4/1
GACAAAGGACACAAAGTCTAGCTCCCTCCAAAGTCTCTGAGCCCCCAGCCTGCCTGGCATGGCCAAGGNNNGCCCACANNNNNCNGCTCACCAAGAACACAGAGGCACAGTNNCCTGGGTCCAAGAGAGNGTNGAGCTCTAAGAGAANATTTAAATATCNCANACCTGTATGTTCACCTTTATACCTATGCAGTCACAGAATAGAACTTTCAATATTTGTCAGGAGAGAGGGACCCAGGAAGGCAGACGGG
+
CCCCCGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGE8EFCGDFGGGGGEGFGGGFGGFFGGFG###::CFFG>#####6#:,:DEGFGGGGGGDGGFFFFGGGFGE##::BFGFFFGGGGGGGE#::#9+AD=<ECGGCGGC#84A@7FADFF;#3+#88ADFGGCFGGGGGGGGFFGGA,@DCFC99@D;FGGGGGGAAC,@Egg,ACCCEGGE7ECGGGGG49*;DGG>6?CCFF>;8/888:B
@SRR103937431289.4 4/2
NCCTAGAGGCGGCCAACATAGCATTTTTTTTTTTTGAAGTGAAATNNNNCTCTGTCGCCCGGTNTGNNGTGCAGTTGTGCGATCCCCCCCCACTGCAACCCCCGCCTCCTGGGTTCAAGCTATTCTCTGCCGCAGCCCCCCGCGTAGCGGGGTTTACATGTGCCCACCCACACGCCTGGCTANNNNTGGGATGTTCTTTTTTTNNTTNNNAGAGATGGGGTTTGACCGCCTTNNCTTGGAGTCTGTAGCA
+
#8BCCGGGEFFF7FECGGGGGGFGGGGCFGF@EC@+,6,C,,6,9####::,C,,<,C@F++8#,:##4:994:,,9C+C=+AF+::+=+336,A,,,,:++8>=F++,,3,3@+,,,,7,7@@df,7,>D>>F*>BF7141****:/2;0+++3<0<9C6*;/12*2110+####200++FG=C##1###**//**.77557965))07##)))0))./)6))).))
Increasing or decreasing the file size results in the same phenomenon: the -s flag stops processing reads past the middle of the file.
Kind regards
Indeed, @mikelchtermans, you have identified a bug and I will release a bug fix very soon. Thank you for your patience.
@mikelchtermans, this is fixed in the new release tag (2.2.1); please take a look.
Hello, I just wanted to report that when I scrub an interleaved fastq with the -x option, the resulting interleaved fastq is sometimes no longer paired. This was unexpected, as I saw that for paired files both reads would be masked if one of the two was detected as human. Could you fix this behavior in your next update? Best regards,
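The behavior requested here can be sketched as a pair-aware filter: walk a collated interleaved file two records at a time and drop both mates whenever either one is flagged. This is only an illustration of the expected semantics, not the scrubber's implementation; `drop_flagged_pairs` is a hypothetical name and `is_flagged` is a placeholder predicate standing in for the human-read detection.

```python
from typing import Callable, Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, '+' line, quality.
Record = Tuple[str, str, str, str]


def drop_flagged_pairs(records: Iterable[Record],
                       is_flagged: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only pairs in which neither mate is flagged.

    Assumes collated interleaved input (R1R2R1R2...).
    """
    it = iter(records)
    for r1 in it:
        r2 = next(it, None)
        if r2 is None:
            raise ValueError("odd number of records: input is not paired")
        if is_flagged(r1) or is_flagged(r2):
            continue  # drop the whole pair so the output stays paired
        yield r1
        yield r2
```

Because records are only ever emitted two at a time, the output always contains an even number of reads and every R1 is still followed by its R2.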