Closed: lihaicheng7003 closed this issue 2 months ago
gunzip -c 202404181017140_read.fq.gz|chopper -q 10 -l 500 | gzip >202404181017140_read.filter.fq.gz
This command works.
Additional information:
202404181017140_read.fq.gz: gzip compressed data, extra field, last modified: Sat Apr 20 13:45:16 2024, max compression
Do you mean that the counting of reads is wrong, or do you also get a different set of reads passing the filter for the second command?
Using
chopper -q 10 -l 500 --threads 8 -i 202404181017140_read.fq.gz | gzip > 202404181017140_read.filter.fq.gz
returns
Kept 3 reads out of 3 reads
There are only 3 reads in 202404181017140_read.filter.fq.gz, but I can confirm that there are many reads in 202404181017140_read.fq.gz.
I'm confused about what happened.
After this command, there are only three reads in 202404181017140_read.filter.fq.gz, which is obviously wrong: 202404181017140_read.fq.gz has about two million reads.
gunzip -c 202404181017140_read.fq.gz|chopper -q 10 -l 500 | gzip >202404181017140_read.filter.fq.gz
This command works.
Additional information:
202404181017140_read.fq.gz: gzip compressed data, extra field, last modified: Sat Apr 20 13:45:16 2024, max compression
I didn't change anything (same file, same environment); I only changed the command, and got 1.98 million reads in 202404181017140_read.filter.fq.gz, which should be the correct result.
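For what it's worth, one plausible mechanism for this kind of discrepancy (an assumption here, not confirmed from the file itself): a .fq.gz may consist of several concatenated gzip members, which is valid per the gzip format. `gunzip -c` decodes every member, but a reader that only decodes the first member emits only the first few reads. A minimal Python sketch of the difference:

```python
import gzip
import zlib

# Two gzip members concatenated into one blob -- valid per the gzip
# spec, and what some compressors (e.g. bgzip) effectively produce.
blob = gzip.compress(b"first.") + gzip.compress(b"second.")

# gunzip-style reading: decode members until EOF.
assert gzip.decompress(blob) == b"first.second."

# A single-member decoder stops at the end of the first gzip stream;
# the rest of the input is left untouched in `unused_data`.
d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
assert d.decompress(blob) == b"first."
assert d.unused_data != b""
```

If the file in this thread is multi-member, this would match the symptom: the `-i` path sees only the first member (a handful of reads), while the `gunzip -c` pipeline sees all of them.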
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_sysroot_linux-64_curr_repodata_hack 3 haa98f57_10 defaults
binutils 2.40 h4852527_0 conda-forge
binutils_impl_linux-64 2.40 ha885e6a_0 conda-forge
binutils_linux-64 2.40 hdade7a5_3 conda-forge
biopython 1.79 pypi_0 pypi
c-compiler 1.7.0 hd590300_1 conda-forge
ca-certificates 2024.3.11 h06a4308_0 defaults
certifi 2021.5.30 py36h06a4308_0 defaults
chopper 0.8.0 hdcf5f25_0 bioconda
clang 14.0.6 h06a4308_1 defaults
clang-14 14.0.6 default_hc6dbbc7_1 defaults
cxx-compiler 1.7.0 h00ab1b0_1 conda-forge
gcc 12.3.0 h915e2ae_7 conda-forge
gcc_impl_linux-64 12.3.0 h58ffeeb_7 conda-forge
gcc_linux-64 12.3.0 h6477408_3 conda-forge
gxx 12.3.0 h915e2ae_7 conda-forge
gxx_impl_linux-64 12.3.0 h2a574ab_7 conda-forge
gxx_linux-64 12.3.0 h4a1b8e8_3 conda-forge
kaleido 0.2.1 pypi_0 pypi
kernel-headers_linux-64 3.10.0 h57e8cba_10 defaults
ld_impl_linux-64 2.40 h55db66e_0 conda-forge
libclang-cpp14 14.0.6 default_hc6dbbc7_1 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-devel_linux-64 12.3.0 h0223996_107 conda-forge
libgcc-ng 13.2.0 h77fa898_7 conda-forge
libgomp 13.2.0 h77fa898_7 conda-forge
libllvm14 14.0.6 hef93074_0 defaults
libsanitizer 12.3.0 hb8811af_7 conda-forge
libstdcxx-devel_linux-64 12.3.0 h0223996_107 conda-forge
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
nanoget 1.19.1 pypi_0 pypi
nanomath 1.3.0 pypi_0 pypi
nanoplot 1.42.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
numpy 1.19.5 pypi_0 pypi
openssl 1.1.1w h7f8727e_0 defaults
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pip 21.2.2 py36h06a4308_0 defaults
plotly 5.18.0 pypi_0 pypi
pyarrow 6.0.1 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pysam 0.22.1 pypi_0 pypi
python 3.6.13 h12debd9_1 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
python-deprecated 1.1.0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
scipy 1.5.4 pypi_0 pypi
setuptools 58.0.4 py36h06a4308_0 defaults
six 1.16.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 defaults
sysroot_linux-64 2.17 h57e8cba_10 defaults
tenacity 8.2.2 pypi_0 pypi
tk 8.6.14 h39e8969_0 defaults
wheel 0.37.1 pyhd3eb1b0_0 defaults
xz 5.4.6 h5eee18b_1 defaults
zlib 1.2.13 hd590300_5 conda-forge
Hi @lihaicheng7003, sorry for this issue. Can you share part of your input fastq.gz file with my email zjmeng22@m.fudan.edu.cn, so I can help you figure out why this happens?
Hi, I tried to replicate your error using the built-in test.fastq in chopper/test-data/, but failed. Note that I used cargo build to build chopper, but I don't think that makes any difference. The following is my result:
$ ./chopper -q 10 -l 500 -i /test-data/test.fastq > a.fastq
Kept 205 reads out of 250 reads
$ ./chopper -q 10 -l 500 -i /test-data/test.fastq.gz > b.fastq
Kept 205 reads out of 250 reads
$ gunzip -c /test-data/test.fastq.gz | ./chopper -q 10 -l 500 > c.fastq
Kept 205 reads out of 250 reads
$ gunzip -c /test-data/test.fastq.gz | ./chopper -q 10 -l 500 |gzip > d.fastq.gz
Kept 205 reads out of 250 reads
I also checked the exact line count of each output fastq.
$ wc -l *.fastq
820 a.fastq
820 b.fastq
820 c.fastq
820 d.fastq
I am confused too; please do send your file if there are no privacy concerns.
I found the reason. The file I was using (202404181017140_read.fq.gz) may have been compressed with some special compression method, causing chopper to fail to parse it completely. gzip can decompress this file, so the second command works as expected.
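For readers who want to check what kind of gzip variant they have: the "extra field" that `file` reported earlier in this thread is a hint. One well-known producer of an extra field is bgzip (BGZF, block gzip), which writes a "BC" subfield into every member header. The helper below (`classify_gzip_header`, an illustrative name, not part of chopper) inspects the first gzip member header of a file's raw bytes:

```python
import struct

def classify_gzip_header(head):
    """Classify the first gzip member header in `head` (raw bytes).

    Checks the FEXTRA flag (RFC 1952), and whether the extra field
    carries the "BC" subfield that bgzip (BGZF) writes.
    """
    magic, method, flags = struct.unpack_from("<HBB", head, 0)
    if magic != 0x8B1F or method != 8:
        return "not gzip"
    if not flags & 0x04:                        # FEXTRA bit unset
        return "plain gzip, no extra field"
    xlen, = struct.unpack_from("<H", head, 10)  # XLEN after 10-byte fixed header
    extra = head[12:12 + xlen]
    if b"BC" in extra:
        return "BGZF (bgzip) extra field"
    return "gzip with extra field (not BGZF)"
```

Usage would be `classify_gzip_header(open("reads.fq.gz", "rb").read(64))`. This only identifies the header flavor; whichever variant it is, decompressing externally (`gunzip -c ... | chopper ...`) sidesteps the problem, as found above.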
Sorry, I can't provide the complete file as it's too large. Since I don't know what method or tool was used to compress it, I also can't compress a small portion of reads into a similar format. It may not be possible to provide a test file.
Thank you for your response. The situation I encountered isn't a problem with chopper; it's a problem with my file format.