wdecoster / chopper

MIT License

Add straightforward --input #29

JMencius closed 4 months ago

JMencius commented 4 months ago

Hi @wdecoster, I managed to modify the original code to implement the --input option mentioned in issue #10. Briefly, I added an --input (or -i) option that accepts an input filename, turns the file into a stream, and passes that to the filter function. I also tested it with:

cat {FILEPATH}/test.fastq | ./chopper -q 10 > testQ10_old.fastq;
Kept 207 reads out of 250 reads

./chopper -q 10 -i {FILEPATH}/test.fastq > testQ10_new.fastq;
Kept 207 reads out of 250 reads

Both runs give the same result. I hope you like my modifications.
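
Matching read counts are a good sign; a stricter check is to compare the two output files byte for byte. The following is a minimal sketch, assuming both commands above have already been run and `cmp` is available. The function name `same_output` is made up for illustration and is not part of chopper.

```shell
# Hypothetical helper: succeed only if two files are byte-identical.
same_output() {
    # cmp -s prints nothing and reports the result through its exit status
    cmp -s "$1" "$2"
}

# Usage with the filenames from the test above (skipped if the files are absent)
if [ -f testQ10_old.fastq ] && [ -f testQ10_new.fastq ]; then
    if same_output testQ10_old.fastq testQ10_new.fastq; then
        echo "outputs are identical"
    else
        echo "outputs differ"
    fi
fi
```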

wdecoster commented 4 months ago

Thank you! I had to add input: None to the Cli structs in the tests, but otherwise this looks good. Am I correct to think that this won't work for (gz) compressed input files?

JMencius commented 4 months ago

Yeah, it won't work for compressed input files. There are too many compression formats, such as .gz, .tar.gz or .zip. For compressed files, users can just decompress with the matching tool and pipe the result into chopper.
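
The pipe-based approach described above could be wrapped in a small shell function that picks a decompressor from the file extension. This is only a sketch: the function name `fastq_to_stdout` and the extension-to-tool mapping are assumptions for illustration, and it relies on `gzip`, `tar` and `unzip` being installed.

```shell
# Hypothetical helper: write a (possibly compressed) FASTQ file to stdout.
# The extension-to-tool mapping below is an assumption, not chopper behaviour.
fastq_to_stdout() {
    case "$1" in
        *.tar.gz) tar -xzOf "$1" ;;   # extract all archive members to stdout
        *.gz)     gzip -dc "$1" ;;    # plain gzip stream
        *.zip)    unzip -p "$1" ;;    # write all archive members to stdout
        *)        cat "$1" ;;         # assume uncompressed input
    esac
}

# Usage (commented out because it needs chopper on PATH):
# fastq_to_stdout test.fastq.gz | chopper -q 10 > test_q10.fastq
```

The `*.tar.gz` pattern has to come before `*.gz`, because `case` runs the first pattern that matches.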

JMencius commented 4 months ago

@wdecoster If you want something like chopper -i test.fastq.gz -q 10 > test_q10.fastq to work, I could continue and add some code.

wdecoster commented 4 months ago

That would be great, as I would hope most people keep their fastqs compressed, and gz is definitely the most frequent compression format for such files. But as you said, anyone could just use stdin for things like that :-)

JMencius commented 4 months ago

@wdecoster I have added support for .gz compressed files. I also used the fastq file in the /test-data directory to run some tests. Compressed and uncompressed input give the same results. Maybe you could update readme.md as well; when I was new to chopper, I was confused about having to use a pipe to run it, LOL. I hope these modifications make chopper more user-friendly.

wdecoster commented 4 months ago

Looks great, thanks so much!

JMencius commented 4 months ago

@wdecoster There is a typo in the EXAMPLE section I added to readme.md: chopper -q 10 -l 500 -i reads.fastq.gz | gzip > filtered_reads.fast f q.gz. Please do remember to fix it. I'm sorry.

incoherentian commented 2 weeks ago

Hi both! I only saw this today and was excited to remove a couple of pipes. I tried it to no avail on an 8-core allocation, while piping still worked:

[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ module load miniforge3
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ source activate
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ conda activate chopper080
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ chopper --threads 8 -q 17 --headcrop 20 -i /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL/FBA18768_pass_barcode11_Q10_all.fastq.gz > /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL_merge/FBA18768_pass_barcode11_Q17_all.fastq.gz
Kept 0 reads out of 1 reads
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ ls
FBA18768_pass_barcode11_Q17_all.fastq.gz  FBA18768_pass_barcode11_Q19_all.fastq.gz
FBA18768_pass_barcode11_Q18_all.fastq.gz  FBA18768_pass_barcode11_Q20_all.fastq.gz
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ rm FBA18768_pass_barcode11_Q17_all.fastq.gz
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ gunzip -c /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL/FBA18768_pass_barcode11_Q10_all.fastq.gz | chopper --threads 8 -q 17 --headcrop 20 | gzip > /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL_merge/FBA18768_pass_barcode11_Q17_all.fastq.gz
Kept 164021 reads out of 562377 reads

Making a silly mistake here?
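
When a run reports an unexpected total, such as "Kept 0 reads out of 1 reads" above, it can help to count the records in the input independently of chopper. This is a rough sketch, assuming standard four-line FASTQ records (no wrapped sequence lines). The function name `count_fastq_reads` is hypothetical.

```shell
# Hypothetical helper: print the number of records in a FASTQ file,
# assuming exactly four lines per record.
count_fastq_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # decompress gzip input to stdout
        *)    cat "$1" ;;        # assume plain text otherwise
    esac | awk 'END { print int(NR / 4) }'
}

# Usage (hypothetical filename):
# count_fastq_reads reads.fastq.gz
```

If this count matches the total from the piped run but not the total from the -i run, the difference most likely lies in how the file is being read, not in the filter settings.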

wdecoster commented 2 weeks ago

I wonder if that is fixed by a later PR that hasn't made it into a release yet. I will post a new binary here later today (I hope?) to debug this further...

wdecoster commented 2 weeks ago

Can you try with v0.9.0? https://github.com/wdecoster/chopper/releases/tag/v0.9.0

incoherentian commented 2 weeks ago

-i is working identically to the old pipes for me with 0.9.0. Thanks @wdecoster @JMencius and @sharkLoc too!

JMencius commented 2 weeks ago

Hi @wdecoster, would you mind changing this statement in readme.md, since the performance is now similar between -i and the Linux pipe |?

wdecoster commented 2 weeks ago

Done!