Closed LeeBergstrand closed 1 month ago
@LeeBergstrand Like I mentioned in #91 , this is a nice start toward making QC summary reports. Here are a few observations of mine after testing this code:
If you agree that this PR is a good way to address #91, then we can start to refine this PR including addressing the comments above. Thanks again!
@LeeBergstrand Like I mentioned in #91 , this is a nice start toward making QC summary reports. Here are a few observations of mine after testing this code:
- It would be ideal to also run FastQC on the raw reads (before input to the QC module)
If we want to run FastQC on intermediate QC steps like you have done in this draft code, we will need to modify the FastQC rules somehow so that each intermediate QC step is optional. Depending on how the user sets up their config file, some of the QC steps might be skipped, and I think this would produce an error in the current code. I can think of two different ways to make FastQC for intermediate steps optional:
- Keep the current fastqc rule but add more complex logic to each input variable
- Change the current fastqc rule so that it runs on each FastQ output file during QC, one file at a time, based on wildcards. In this case, the DAG would just figure out when to run the fastqc rule dynamically.
- Once ready, we should add another rule that runs MultiQC. Based on my quick testing, it would be ideal to generate separate MultiQC reports for short reads and long reads.
If you agree that this PR is a good way to address #91, then we can start to refine this PR including addressing the comments above. Thanks again!
Sounds good. I'll pursue this.
@jmtsuji This is essentially a preliminary pass of generating QC stats for Rotary.
Caveats
Does this look good enough to merge in for now?
@jmtsuji Do you have any comments or suggestions for improvements?
@jmtsuji Requesting merge of https://github.com/rotary-genomics/rotary/pull/173 (assembly stats using quast) into this branch.
@LeeBergstrand Thanks so much for these updates. Things are a bit crazy on my end at the moment, but I'll take a look at #161 and #173 by the end of this week. Let me know if a code review is urgent, and I can take a look sooner. Thanks!
@LeeBergstrand Thanks so much for these updates. Things are a bit crazy on my end at the moment, but I'll take a look at #161 and #173 by the end of this week. Let me know if a code review is urgent, and I can take a look sooner. Thanks!
@jmtsuji End of the week should be fine.
@jmtsuji Requesting merge of #173 (assembly stats using quast) into this branch.
@LeeBergstrand Approved -- feel free to merge this in. I can run an end-to-end test of the merged code in this branch once we're satisfied with the FastQC reports.
@jmtsuji Great!
My only concern is that
rule run_fastq_short
might error out if the user turns off one of the short read QC steps (e.g., adapter trimming) in the config file, becauseQC_SHORT_FILE_TYPES
is hard-coded to include'_reformat_', '_adapter_trim_', '_quality_trim_'
. Do you agree that this will likely be an issue? (I haven't tested the code to confirm, but I can do so if needed.)
This can be addressed by changing perform_adapter_trimming
into a variable flag and then using that to turn off _adaptertrim as an input. I'm considering creating a function to address creating inputs.
Add QC stats to Rotary.