s4hts / HTStream

A high throughput sequence read toolset using a streaming approach facilitated by Linux pipes
https://s4hts.github.io/HTStream/
Apache License 2.0

FASTQC replacement stats #106

Open dstreett opened 6 years ago

dstreett commented 6 years ago

Not sure of everything that should be in here, but this is a placeholder for the feature request.

msettles commented 4 years ago

Partial solution implemented in #183, with 'bases by cycle' and 'quality by cycle' matrices for each read added to the JSON output. The last thing I'd like to add before closing is something for "over-represented sequences". The idea is to store all kmers found in the first N sequences, then count their occurrences in the whole dataset and print out anything that reaches a certain threshold of occurrence, say 0.1%. Parameters might be:

-k [ --kmer ] arg (=36)
-r [ --kmer-offset ] arg (=1)
-n [ --number_of_reads ] arg (=500000)   number of reads used to establish the kmer set
-o [ --occurence ] arg (=0.001)          occurrence a kmer must reach to be output
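
A rough sketch of the over-represented kmer idea described above, in Python for illustration only (it is not HTStream code). Several things here are assumptions: the function names `kmers` and `overrepresented_kmers` are hypothetical, `offset` is read as the step between kmer start positions when building the candidate set, and "occurrence" is read as the fraction of reads that contain the kmer at least once. Reads are held in a list for simplicity; a streaming tool would need a second pass over the input or some other way to revisit the data.

```python
from collections import Counter


def kmers(seq, k, step=1):
    """Yield the k-mers of seq that start at every `step`-th position."""
    for i in range(0, len(seq) - k + 1, step):
        yield seq[i:i + k]


def overrepresented_kmers(reads, k=36, offset=1, n_reads=500000, threshold=0.001):
    """Return {kmer: fraction of reads containing it} for kmers at or above threshold.

    Assumptions (not confirmed by the issue): `offset` is the step used when
    collecting candidate kmers, and occurrence is counted once per read.
    """
    # Step 1: collect the candidate kmer set from the first n_reads reads.
    candidates = set()
    for seq in reads[:n_reads]:
        candidates.update(kmers(seq, k, offset))

    # Step 2: for every read in the dataset, count each candidate kmer it
    # contains (at any position), at most once per read.
    counts = Counter()
    for seq in reads:
        counts.update(set(kmers(seq, k)) & candidates)

    # Step 3: keep only kmers whose fraction of reads reaches the threshold.
    total = len(reads)
    return {kmer: n / total for kmer, n in counts.items() if n / total >= threshold}


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "ACGTTT", "GGGGGG", "TTACGT"]
    # Candidates come from the first read only; ACGT is found in 3 of 4 reads.
    print(overrepresented_kmers(toy_reads, k=4, offset=1, n_reads=1, threshold=0.5))
```

Counting once per read keeps the reported value comparable to the 0.1% threshold as a share of reads; counting every occurrence instead would be an equally plausible reading of the proposal.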