harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Fastqc or other fastq checks? #14

Closed: tsackton closed this issue 7 months ago.

tsackton commented 2 years ago

At the moment, we don't run any pre-processing checks on fastq data, whether downloaded from SRA or provided locally, and the only preprocessing we do is adaptor trimming with fastp.

Conceivably we could add more fastq QC checks to the preprocessing steps, but it is not clear how necessary or useful they are. Most data issues (data from the wrong species, other contamination, or poor sequence quality) should show up readily as mapping problems, so it may be simpler and more robust to leave QC checks to that stage.
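For scale, a cheap fastq-level check of the "bad sequence quality" kind might look like the minimal sketch below. It is not part of snpArcher. It assumes uncompressed 4-line FASTQ records with Phred+33 quality encoding. The function names and the Q20 cutoff are hypothetical choices made for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header[1:].strip(), seq, qual

def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 encoding by default."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def fraction_low_quality(handle: TextIO, min_mean_q: float = 20.0) -> float:
    """Fraction of reads whose mean quality falls below min_mean_q (hypothetical cutoff)."""
    total = low = 0
    for _, _, qual in read_fastq(handle):
        total += 1
        if mean_phred(qual) < min_mean_q:
            low += 1
    return low / total if total else 0.0

# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
demo = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n"
print(fraction_low_quality(io.StringIO(demo)))  # 0.5
```

A summary like this adds little beyond what low mapping rates would already reveal. That fits the argument for leaving most QC to the mapping stage.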

On the other hand, something like FastQC and a quick k-mer coverage plot is easy to generate and could be a useful diagnostic when problems do come up.
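A k-mer coverage plot is usually drawn from the k-mer spectrum, which maps each multiplicity to the number of distinct k-mers seen that many times. The minimal sketch below computes that spectrum from read sequences using only the standard library. It assumes canonical k-mers (the lexicographic minimum of a k-mer and its reverse complement) and skips any k-mer containing a non-ACGT base. The function names and the default k=21 are illustrative assumptions, not snpArcher code. For real datasets, a dedicated k-mer counter such as Jellyfish would be the practical choice.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def kmer_spectrum(seqs: Iterable[str], k: int = 21) -> Dict[int, int]:
    """Map multiplicity -> number of distinct canonical k-mers with that multiplicity."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    # Histogram of the counts themselves, sorted by multiplicity for plotting.
    return dict(sorted(Counter(counts.values()).items()))

# Toy example with k=3: ACG and CGT collapse to the same canonical k-mer.
print(kmer_spectrum(["ACGTA", "ACGTA"], k=3))  # {2: 1, 4: 1}
```

Plotting the returned dictionary (multiplicity on the x-axis, number of k-mers on the y-axis) gives the usual spectrum. With sufficient coverage, its main peak sits near the per-sample k-mer coverage, and a large low-multiplicity tail points to sequencing errors.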

Thoughts?