wdecoster / chopper

MIT License

generation of a log file? #7

Closed brittanycronk closed 1 year ago

brittanycronk commented 1 year ago

Is the -c command the same as the -r for read removal in Nanolyse? I used it to remove host reads and the size of the resulting fastq.gz file was smaller, but no verbose messages indicated that any reads had been removed, like in Nanolyse. Can it generate a log file of how many reads were removed due to low quality, length, and host contamination? How would I add that in? I am not familiar with how to do this when the output is piped immediately to gzip.

Thanks for the help and for the sequencing tools!

wdecoster commented 1 year ago

> Is the -c command the same as the -r for read removal in Nanolyse?

Yes that is correct.

> Can it generate a log file of how many reads were removed due to low quality, length, and host contamination?

I agree that would be useful. The easiest thing to implement would be the total number of reads that were removed; identifying the reason each read was removed would be less straightforward. I will put it on my to-do list.

Cheers, Wouter

brittanycronk commented 1 year ago

Hi Wouter,

Thank you for the information about -c and for working towards the read removal log file!

Thank you for sharing your programs.

Brittany


wdecoster commented 1 year ago

From version v0.4.0 the tool will write to stderr how many reads were kept from the total number of reads. You can redirect this to a file as below:

zcat reads.fastq.gz | chopper -q 12 2> chopper.log | gzip > filtered.fastq.gz
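Since the log holds only the kept and total counts, the number of removed reads has to be derived from them. A minimal sketch, assuming the log line has the form "Kept N reads out of M reads" as quoted later in this thread; the function name summarize_chopper_log is hypothetical and not part of chopper:

```shell
# Hypothetical helper, not part of chopper: summarize a chopper log.
# Assumes a line of the form "Kept N reads out of M reads".
summarize_chopper_log() {
    awk '/^Kept [0-9]+ reads out of [0-9]+ reads/ {
        kept = $2      # second field: reads kept
        total = $6     # sixth field: total reads seen
        printf "kept=%d total=%d removed=%d\n", kept, total, total - kept
    }' "$1"
}
```

Calling `summarize_chopper_log chopper.log` after the pipeline above would print one summary line per matching log line.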

brittanycronk commented 1 year ago

Hi Wouter,

Thank you for the update! I hope you don't mind me troubleshooting further with you.

I updated chopper from 0.2.0 to 0.4.0 by re-installing using conda install -c bioconda chopper, however I am not sure if it is working correctly. When I filtered for quality and removed host reads, I kept more reads than when I filtered for quality alone. This should be the other way around. Am I doing something wrong with my command? I see you wrote zcat in your previous email, but it seems to work the same regardless of whether I use gunzip or zcat.

gunzip -c ${samplename}.fastq.gz | chopper -q 10 -c /workdir/host_genomes/GCF_002263795.1_Bos_taurus_genomic.fna.gz 2> chopper.log | gzip > ${samplename}_filt.fastq.gz

The chopper.log file just read: Kept 214706 reads out of 214897 reads

vs.

zcat -c ${samplename}.fastq.gz | chopper -q 10 2> chopper4.log | gzip > ${samplename}_filt_test3.fastq.gz

The chopper4.log file reads similarly: Kept 186136 reads out of 214897 reads

I also ran it with the host again, after gunzip'ing the host file to see if that would make a difference, but the final result was still 214706 reads kept.

Thanks,

Brittany
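The inconsistency described above (a run with more filters keeping more reads) can be caught automatically by comparing the two logs. A minimal sketch, assuming each log contains a "Kept N reads out of M reads" line; kept_from_log and check_filter_logs are hypothetical helper names, not part of chopper:

```shell
# Hypothetical helpers, not part of chopper.

# Print the number of reads kept, taken from the first
# "Kept N reads out of M reads" line in the given log.
kept_from_log() {
    awk '/^Kept [0-9]+ reads out of [0-9]+ reads/ { print $2; exit }' "$1"
}

# Succeed only if the run with more filters kept no more reads than the
# run with fewer filters; otherwise print a warning to stderr and fail.
check_filter_logs() {
    stricter=$(kept_from_log "$1")   # e.g. quality + host filtering
    looser=$(kept_from_log "$2")     # e.g. quality filtering alone
    if [ "$stricter" -gt "$looser" ]; then
        echo "Unexpected: stricter run kept $stricter reads, looser run kept $looser" >&2
        return 1
    fi
}
```

With the two logs from this comment, `check_filter_logs chopper.log chopper4.log` would fail, because 214706 is greater than 186136.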



wdecoster commented 1 year ago

Oof, I identified a bug, and I will push a fix later tonight. Thanks for reporting!

wdecoster commented 1 year ago

I created v0.5.0, for which you can download a binary from the releases, but it will take a few hours before that makes its way into bioconda.
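Because the log reports only one kept/total pair per run, one possible workaround for per-reason counts is to chain several chopper invocations, each applying a single filter and writing its own log. This is a sketch, not a documented recommendation: it assumes v0.5.0 or later (which contains the bug fix mentioned above), and the function name filter_in_stages, the 500 bp length cutoff, and the log file names are illustrative. Each extra stage parses the stream again, so it will be slower than a single call.

```shell
# Sketch: one filter per chopper invocation, so each stage writes its own
# "Kept N reads out of M reads" line to a separate log.
# filter_in_stages, the cutoffs, and the file names are illustrative only.
filter_in_stages() {
    in_fastq_gz=$1   # gzipped input reads
    host_fasta=$2    # host/contaminant reference passed to -c
    out_prefix=$3    # prefix for the output and log files

    gunzip -c "$in_fastq_gz" \
        | chopper -q 10 2> "${out_prefix}.quality.log" \
        | chopper -l 500 2> "${out_prefix}.length.log" \
        | chopper -c "$host_fasta" 2> "${out_prefix}.host.log" \
        | gzip > "${out_prefix}_filt.fastq.gz"
}
```

The difference between total and kept in each log then gives the reads removed at that stage; a read that fails several filters is counted only at the first stage that rejects it.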