khoahoc0508 opened 1 year ago
The QC pipeline just runs FastQC and samtools stats/plot-bamstats. You can run the equivalent clockwork commands yourself, but there is nothing special about them; they are just wrappers around those programs:
clockwork samtools_qc reference.fasta reads.1.fastq reads.2.fastq output_dir
clockwork fastqc outdir reads.1.fastq reads.2.fastq
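If you have many samples, the two commands above can be looped over from a script. This is a minimal Python sketch, not something shipped with clockwork. It assumes (hypothetically) that each pair is named <sample>.1.fastq / <sample>.2.fastq in one directory and that clockwork is on your PATH. The helper names build_qc_commands and run_qc_for_directory are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_qc_commands(reference, reads1, reads2, outdir):
    """Return the two clockwork QC command lines (as quoted above) for one read pair."""
    outdir = Path(outdir)
    return [
        ["clockwork", "samtools_qc", str(reference), str(reads1), str(reads2),
         str(outdir / "samtools_qc")],
        ["clockwork", "fastqc", str(outdir / "fastqc"), str(reads1), str(reads2)],
    ]


def run_qc_for_directory(reference, reads_dir, out_root):
    """Run both QC commands for every <sample>.1.fastq found in reads_dir (hypothetical layout)."""
    for reads1 in sorted(Path(reads_dir).glob("*.1.fastq")):
        sample = reads1.name[: -len(".1.fastq")]
        reads2 = reads1.with_name(f"{sample}.2.fastq")
        sample_out = Path(out_root) / sample
        # Assumption: we do not know whether clockwork creates missing parent
        # directories, so the per-sample folder is created up front as a precaution.
        sample_out.mkdir(parents=True, exist_ok=True)
        for cmd in build_qc_commands(reference, reads1, reads2, sample_out):
            subprocess.run(cmd, check=True)  # raise on the first failing command
```

For example, run_qc_for_directory("reference.fasta", "reads", "qc_out") would put each sample's results under qc_out/<sample>/.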
Thank you, @martinghunt; it works flawlessly. Now I can switch entirely to the new version. Could you also advise on minimum quality requirements for paired-end input files? I am still confused about this.
Sincerely, Trung
How you decide that a sample is bad and should be removed is up to you :) There's no set method for doing so, and it depends on what analysis you're doing.
You could remove samples up front, e.g. if (making up example numbers) less than 90% of the genome has coverage above 20X, or if a low percentage of reads map, or if the reads are low quality (e.g. a high error rate reported by samtools stats).
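As an illustration of the up-front checks, here is a minimal Python sketch that reads a samtools stats text report and applies thresholds. It relies on the standard "SN" summary lines (raw total sequences, reads mapped, error rate) and "COV" depth-histogram lines of samtools stats. Assumptions: you supply the genome length yourself, the coverage and genome-fraction cutoffs are the made-up numbers from the reply above, and the mapping-rate and error-rate cutoffs are purely hypothetical. The exact name of the stats file inside the samtools_qc output directory is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcThresholds:
    # Illustrative values only, like the made-up numbers in the reply.
    min_depth: int = 20                 # "coverage > 20X"
    min_genome_fraction: float = 0.90   # fail if < 90% of genome is above min_depth
    min_mapped_fraction: float = 0.80   # hypothetical cutoff
    max_error_rate: float = 0.01        # hypothetical cutoff


def parse_samtools_stats(text):
    """Collect SN summary numbers and the COV depth histogram from samtools stats output."""
    summary, coverage = {}, {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SN" and len(fields) >= 3:
            try:
                summary[fields[1].rstrip(":")] = float(fields[2])
            except ValueError:
                pass  # skip any non-numeric summary value
        elif fields[0] == "COV" and len(fields) >= 4:
            depth = int(fields[2])
            coverage[depth] = coverage.get(depth, 0) + int(fields[3])
    return summary, coverage


def failed_checks(stats_text, genome_length, t=QcThresholds()):
    """Return a list of reasons the sample fails (empty list means it passes)."""
    summary, coverage = parse_samtools_stats(stats_text)
    reasons = []
    deep_bases = sum(n for depth, n in coverage.items() if depth > t.min_depth)
    if deep_bases / genome_length < t.min_genome_fraction:
        reasons.append("low coverage")
    total = summary.get("raw total sequences", 0)
    mapped = summary.get("reads mapped", 0)
    if total == 0 or mapped / total < t.min_mapped_fraction:
        reasons.append("low mapping rate")
    if summary.get("error rate", 1.0) > t.max_error_rate:
        reasons.append("high error rate")
    return reasons
```

Keeping the thresholds in one small dataclass makes it easy to tune them per analysis, which matters because there is no single correct set of values.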
You could remove samples after variant calling, e.g. for TB, if a sample has more than 10k variants or a lot of "heterozygous" calls (both of these suggest contamination).
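The post-calling checks could be sketched like this in Python. It assumes a single-sample VCF (genotype in column 10, with a GT key in FORMAT) and counts every non-missing, non-reference genotype as a variant and every genotype with two different alleles as "heterozygous". It ignores the FILTER column, which you may want to take into account. The 10k limit comes from the reply above; the heterozygous-fraction cutoff and the function name vcf_contamination_flags are hypothetical.

```python
def genotype_alleles(gt):
    """Split a GT value such as '0/1' or '1|1' into its allele strings."""
    return gt.replace("|", "/").split("/")


def vcf_contamination_flags(vcf_lines, max_variants=10_000, max_het_fraction=0.05):
    """Count variant and heterozygous calls; return (n_variants, n_het, flags)."""
    n_variants = n_het = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        alleles = [a for a in genotype_alleles(sample.get("GT", ".")) if a != "."]
        if not alleles or all(a == "0" for a in alleles):
            continue  # missing or reference call: not a variant
        n_variants += 1
        if len(set(alleles)) > 1:
            n_het += 1  # e.g. 0/1 or 1/2
    flags = []
    if n_variants > max_variants:
        flags.append("too many variants")
    if n_variants and n_het / n_variants > max_het_fraction:
        flags.append("many heterozygous calls")
    return n_variants, n_het, flags
```

Usage: pass the lines of an open VCF file, e.g. vcf_contamination_flags(open("sample.vcf")), and review any sample that comes back with a non-empty flags list.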
Thanks very much, @martinghunt. These recommendations are helpful. I previously used clockwork when it was part of the sp3 platform developed by Oxford University, but that platform is now being shut down, so I am following its workflows step by step, and there are some parts I cannot handle.
Sincerely, Trung
Hello, I want to update to the new version, but I cannot see how to run QC when running only the script, without the tracking database. I want to use it like the older version (FastQC and samtools QC). Could you give me a guide so I can QC my data before analysis?
Sincerely, Trung