neurogenomics / EpiCompare

Comparison, benchmarking & QC of epigenetic datasets
https://doi.org/doi:10.18129/B9.bioc.EpiCompare

Get Picard summary metrics from <batch>/02_alignment/bowtie2/target/sample_name.stats file #118

Closed teemuronkko closed 1 year ago

teemuronkko commented 1 year ago

It seems that newer versions (>= 2.0) of the CUT&RUN pipeline no longer produce the Picard summary file when a batch is processed with duplicates. In older versions of the pipeline, the duplication information was stored in the Picard summary file, but it is now saved in the /02_alignment/bowtie2/target/sample_name.stats file for each sample. Also, when the pipeline is run without duplicates, the stats file does not contain any information on the duplication rate. Would it be possible to add an option to retrieve the duplication rate from this new version of the CUT&RUN stats file as well?

teemuronkko commented 1 year ago

It seems these metrics are always stored in //04_reporting/multiqc/multiqc_data/, in the files multiqc_general_stats.txt and multiqc_picard_dups.txt (at least in CUT&RUN version 3.0), since the duplication rate is also included in the MultiQC HTML report.

bschilder commented 1 year ago

Looking into this now.

bschilder commented 1 year ago

@teemuronkko can you paste the command you used here? nvm

bschilder commented 1 year ago

Old file types

gather_files(type="picard") would previously search for files matching this pattern (recursively):

"*.target.markdup.MarkDuplicates.metrics.txt$"
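For illustration, a recursive search of this kind can be sketched in base R with `list.files()`. This is a minimal sketch only, not the actual body of `gather_files`: the helper name `find_picard_files`, the demo directory layout, and the sample name are all hypothetical, and the pattern is written as a stricter, escaped regular expression rather than the exact string above.

```r
# Hypothetical helper (NOT EpiCompare's actual implementation): recursively
# find Picard MarkDuplicates metrics files under a directory.
find_picard_files <- function(dir) {
  list.files(
    path = dir,
    # Escaped dots so that "." only matches a literal dot.
    pattern = "\\.target\\.markdup\\.MarkDuplicates\\.metrics\\.txt$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Demo on a throwaway directory tree with made-up names.
demo_dir <- file.path(tempdir(), "picard_demo")
nested <- file.path(demo_dir, "batch1", "picard_metrics")
dir.create(nested, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(
  file.path(nested, "sampleA.target.markdup.MarkDuplicates.metrics.txt"),
  file.path(nested, "sampleA.stats")
))

# Only the metrics file should match; the .stats file is ignored.
picard_files <- find_picard_files(demo_dir)
print(basename(picard_files))
```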

Example

[Screenshot: 2022-10-17 at 16.34.54]

New file types

We now have several files that contain all or some of this information:

"multiqc_picard_dups.txt", "multiqc_general_stats.txt", "mqc_picard_deduplication_1.txt"

Examples

multiqc_general_stats.txt

multiqc_picard_dups.txt

mqc_picard_deduplication_1.txt

Conclusion

Of these new file types, "multiqc_picard_dups.txt" is the closest to the original file type. So I'm going to edit gather_files to search for this pattern instead.
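As a rough sketch of what searching for this file and pulling out the duplication rate could look like in base R: the helper names `find_multiqc_dups` and `read_dup_rates` are hypothetical, the column names `Sample` and `PERCENT_DUPLICATION` are assumed from the usual Picard/MultiQC table layout, and the demo values are made up. This is not the change made to `gather_files` itself.

```r
# Hypothetical helpers (NOT the actual gather_files code).

# Recursively locate MultiQC's Picard duplication table.
find_multiqc_dups <- function(dir) {
  list.files(
    path = dir,
    pattern = "^multiqc_picard_dups\\.txt$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Read the tab-delimited table and keep the per-sample duplication rate.
# Assumes columns named "Sample" and "PERCENT_DUPLICATION" exist.
read_dup_rates <- function(path) {
  tab <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
  tab[, c("Sample", "PERCENT_DUPLICATION")]
}

# Demo with a made-up file in a throwaway directory.
demo_dir <- file.path(tempdir(), "multiqc_demo")
data_dir <- file.path(demo_dir, "04_reporting", "multiqc", "multiqc_data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c(
    "Sample\tREAD_PAIRS_EXAMINED\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION",
    "sampleA\t1000\t100\t0.1",
    "sampleB\t2000\t500\t0.25"
  ),
  file.path(data_dir, "multiqc_picard_dups.txt")
)

dup_files <- find_multiqc_dups(demo_dir)
dup_rates <- read_dup_rates(dup_files[1])
print(dup_rates)
```

Matching on the exact file name keeps the search from picking up the other MultiQC tables, which carry only part of the same information.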

bschilder commented 1 year ago

Ex