marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

cutadapt cannot read from stdin with xopen 2.0.0 #774

Closed peterjc closed 3 months ago

peterjc commented 3 months ago

This works with xopen 1.9.0 (and older), running here on macOS:

$ python --version
Python 3.10.12
$ cutadapt --version
4.7
$ python -c "import xopen; print(xopen.__version__)"
1.9.0
$ python -c "import dnaio; print(dnaio.__version__)"
1.2.0

Using the sample file tests/ncbi-import/multiple_hmm.fasta, this command pipes the input via stdin and outputs 5 FASTA entries:

$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC /dev/stdin | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC /dev/stdin
Processing single-end reads on 1 core ...
Done           00:00:00             5 reads @ 846.8 µs/read;   0.07 M reads/minute
Finished in 0.006 s (1144.028 µs/read; 0.05 M reads/minute).

=== Summary ===

Total reads processed:                       5
Reads with adapters:                         4 (80.0%)
Reads written (passing filters):             5 (100.0%)

Total basepairs processed:         4,631 bp
Total written (filtered):          1,332 bp (28.8%)

=== Adapter 1 ===

Sequence: GYRGGGACGAAAGTCYYTGC; Type: regular 3'; Length: 20; Trimmed: 4 times

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Bases preceding removed adapters:
  A: 0.0%
  C: 0.0%
  G: 0.0%
  T: 100.0%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
370 1   0.0 2   1
616 1   0.0 2   0 1
932 1   0.0 2   1
1381    1   0.0 2   1
5

Broken after updating to xopen 2.0.0 (released yesterday, 2024-03-26; see https://pypi.org/project/xopen/#history):

$ cutadapt --version
4.7
$ python -c "import dnaio; print(dnaio.__version__)"
1.2.0
$ python -c "import xopen; print(xopen.__version__)"
2.0.0
$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC - | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC -
Processing single-end reads on 1 core ...

No reads processed!
0

Using /dev/stdin instead of - is also broken:

$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC /dev/stdin | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC /dev/stdin
Processing single-end reads on 1 core ...

No reads processed!
0

This might be related to #772, but the timing doesn't fit with xopen 2.0.0 being released yesterday.

rhpvorderman commented 3 months ago

Something is broken indeed:

(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ pip list | grep xopen
xopen              2.0.0
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ wc -l ~/test/5millionreads_R1.fastq
20000000 /home/rhpvorderman/test/5millionreads_R1.fastq
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("/dev/stdin", "rt"); print(f.read())' | wc -l
19999956
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("-", "rt"); print(f.read())' | wc -l
19999956
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ pip install xopen==1.9.0 >/dev/null
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("-", "rt"); print(f.read())' | wc -l
20000001
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("/dev/stdin", "rt"); print(f.read())' | wc -l
20000001
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ 

xopen 1.9.0 performs as it should (the extra newline is added by print).

xopen 2.0.0 misses some data? That is really weird, as all the xopen tests pass. I will see if I can fix this issue.
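
The comparison above can be turned into a small round-trip check that avoids the off-by-one from print by comparing raw bytes rather than line counts. This is only a sketch: the helper names (`roundtrip_via_stdin`, `make_fastq`, `STDLIB_READER`) and the synthetic records are made up for illustration, and the default child reader uses only the standard library, so the script runs without xopen installed. To exercise xopen itself, one would pass a reader snippet modelled on the one-liners above.

```python
import subprocess
import sys

# Child-side reader (hypothetical helper, stdlib only): echo stdin to stdout.
# To test xopen, replace this with a snippet that reads via xopen.xopen("-", "rb")
# and writes the result to sys.stdout.buffer (assumes xopen is installed).
STDLIB_READER = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def roundtrip_via_stdin(payload: bytes, reader_code: str = STDLIB_READER) -> bytes:
    """Pipe payload into a child Python process and return what it echoes back."""
    result = subprocess.run(
        [sys.executable, "-c", reader_code],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def make_fastq(n_records: int) -> bytes:
    """Build synthetic four-line FASTQ records (made-up data, not from this issue)."""
    record = "@read{}\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    return "".join(record.format(i) for i in range(n_records)).encode("ascii")


if __name__ == "__main__":
    payload = make_fastq(50_000)
    echoed = roundtrip_via_stdin(payload)
    # Any truncation shows up as a byte-count mismatch, with no print() newline involved.
    print("bytes in: ", len(payload))
    print("bytes out:", len(echoed))
    print("identical:", echoed == payload)
```

A payload of a few megabytes is used on purpose: the data loss reported above only showed up as a small shortfall on a large input, so a tiny test file might not reveal it.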

rhpvorderman commented 3 months ago

I will yank the 2.0.0 release. This is quite serious. Ping @marcelm

peterjc commented 3 months ago

Ah - I hadn't taken the next step of seeing if this was an xopen bug vs cutadapt needing a tweak for an xopen change.

Let's close this and focus on https://github.com/pycompression/xopen/issues/157

I've excluded v2.0.0 on my development branch so my CI works, but that is only a stopgap:

https://github.com/peterjc/thapbi-pict/commit/5b7466da9d3cb056177c627150b8a3ab4f42f109

Yanking the xopen 2.0.0 release seems prudent, thanks - this is more serious than just a cutadapt issue as I first assumed 👍
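
For anyone wanting a similar stopgap, here is a rough sketch of a runtime guard that flags a known-bad installed version. It is not what the linked commit does (that commit is not reproduced here); the `KNOWN_BAD` table and the function names are hypothetical. In a requirements file, the equivalent exclusion is usually written with the standard version specifier `xopen!=2.0.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical table of releases to refuse; only xopen 2.0.0 comes from this thread.
KNOWN_BAD = {"xopen": {"2.0.0"}}


def is_known_bad(package: str, installed: str) -> bool:
    """Return True if this exact version of the package is on the exclusion list."""
    return installed in KNOWN_BAD.get(package, set())


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment() -> list:
    """List 'name==version' strings for every excluded release that is installed."""
    problems = []
    for package in KNOWN_BAD:
        found = installed_version(package)
        if found is not None and is_known_bad(package, found):
            problems.append(f"{package}=={found}")
    return problems


if __name__ == "__main__":
    bad = check_environment()
    if bad:
        raise SystemExit("Known-bad releases installed: " + ", ".join(bad))
    print("No known-bad releases found.")
```

An exact-match exclusion is deliberate here: it blocks only the broken release and lets a fixed later version through without further changes.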