wdecoster / chopper

MIT License

Recommended defaults? #20

Closed: jolespin closed this issue 5 months ago

jolespin commented 9 months ago

In the example you have the following:

gunzip -c reads.fastq.gz | chopper -q 10 -l 500 | gzip > filtered_reads.fastq.gz

Is this what you would recommend as default settings? I'm adding chopper to my VEBA software suite (https://github.com/jolespin/veba), so I want to include sensible defaults that would work in most cases.

Thanks!

wdecoster commented 9 months ago

Hi,

I would say that suitable defaults are highly dependent on the application, e.g. assembly vs. reference alignment.

jolespin commented 9 months ago

What about assembly?

wdecoster commented 9 months ago

I don't know if I dare to suggest defaults that would work well in most scenarios. For example, duplex reads will have higher accuracy than simplex reads, and you could filter a little more stringently on length for ultra-long libraries... Could you let me know which data type to expect?

jolespin commented 9 months ago

The most practical use case would be genome assemblies for microeukaryotes like yeast. The second most common use case would be larger amplicons like full-length 16S. The last use case would be actual community-wide full metagenomes.

wdecoster commented 9 months ago

I can't propose defaults that would make sense in all those scenarios; this is difficult. I think the most correct approach would be to leave this to the end user, if possible.
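One way a wrapper such as VEBA could follow the suggestion to leave this to the end user is to pass chopper's -q (minimum average read quality) and -l (minimum read length) through from the user instead of hard-coding them. The following is a minimal sketch, assuming Bash. The function build_chopper_cmd is hypothetical and not part of chopper or VEBA. It only assembles the command string from the two flags used in the example command in this thread and refuses to run without explicit values.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of chopper or VEBA): assemble a chopper
# filtering command from thresholds that the caller must pass explicitly.
build_chopper_cmd() {
  local min_qual="${1-}"   # value for chopper -q (minimum average read quality)
  local min_len="${2-}"    # value for chopper -l (minimum read length)
  local int_re='^[0-9]+$'  # accept only non-negative integers

  # No built-in fallback: both thresholds are required, per the advice
  # to leave the choice to the end user.
  if [[ ! $min_qual =~ $int_re || ! $min_len =~ $int_re ]]; then
    echo "usage: build_chopper_cmd MIN_QUAL MIN_LEN" >&2
    return 1
  fi

  printf 'chopper -q %s -l %s\n' "$min_qual" "$min_len"
}
```

Called as build_chopper_cmd 10 500, it prints chopper -q 10 -l 500, so the README example values are used only when the user explicitly asks for them. In a pipeline such as gunzip -c reads.fastq.gz | $(build_chopper_cmd 10 500) | gzip > filtered_reads.fastq.gz, the unquoted substitution relies on word splitting, which is safe here because the validated values contain only digits.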