We have some DNase Hi-C data being produced that carries UMI information for identifying PCR duplicates. My intention is to deduplicate the libraries at the FastQ level and then submit the duplicate-free FastQ files to distiller.
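For illustration, here is a minimal sketch of what FastQ-level UMI deduplication of read pairs could look like. It assumes the UMI has already been moved into the read name after the last `_` (the convention used by `umi_tools extract`), and it treats two pairs as duplicates when they share the UMI and the first `prefix_len` bases of both mates. That duplicate definition and the helper names are hypothetical simplifications, not what distiller or pairtools do. A dedicated tool is likely more robust (for example, it can tolerate sequencing errors in the UMI).

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FastQ records from an open text handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def umi_from_header(header: str) -> str:
    # ASSUMPTION: the UMI sits after the last "_" in the read name,
    # e.g. "@READ1_ACGTACGT 1:N:0", as written by umi_tools extract.
    read_name = header.split()[0]
    return read_name.rsplit("_", 1)[-1]


def dedup_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                prefix_len: int = 50) -> Tuple[int, int]:
    """Keep the first pair for each (UMI, R1 prefix, R2 prefix) key.

    Returns (kept, total). All keys are held in memory, so very large
    libraries may need a sort-based approach instead.
    """
    seen = set()
    kept = total = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        total += 1
        key = (umi_from_header(rec1[0]), rec1[1][:prefix_len], rec2[1][:prefix_len])
        if key in seen:
            continue  # same UMI and same read starts: treated as a PCR duplicate
        seen.add(key)
        kept += 1
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
    return kept, total


if __name__ == "__main__":
    r1 = io.StringIO("@a_AAA\nACGT\n+\nIIII\n@b_AAA\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@a_AAA\nTTTT\n+\nIIII\n@b_AAA\nTTTT\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(dedup_pairs(r1, r2, out1, out2))
```

Exact-prefix matching is the simplest possible key. In real data you may want to combine it with mapping-position information, which is what the pairs-level deduplication in distiller relies on.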
Is there a method you would suggest for preprocessing these data with distiller? There doesn't seem to be a straightforward option. However, after looking at the DSL1 Nextflow script, I thought that duplicating the `merge_split` process might work, with the copy skipping the deduplication step and creating empty files for the expected duplicate-related outputs. The choice of process could then be controlled by `--params.skip_dedup` in a `when` directive.
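One detail worth checking with the "empty files" approach: if a placeholder is named `*.gz`, a zero-byte file is not a valid gzip stream, and some readers may reject it. The sketch below, written under that assumption, creates placeholders that are valid but empty gzip files. The file names are hypothetical; use whatever names your distiller version expects from `merge_split`. If a downstream step parses the content (for example, expecting a `.pairs` header or stats fields), an empty file may still not be enough.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterable, List


def make_placeholders(outdir: Path, names: Iterable[str]) -> List[Path]:
    """Create empty placeholder outputs; *.gz names get a valid empty gzip stream."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = outdir / name
        if name.endswith(".gz"):
            # gzip.open writes a gzip header and trailer even with no payload,
            # so decompressing yields an empty stream instead of an error.
            with gzip.open(path, "wt"):
                pass
        else:
            path.touch()  # plain zero-byte file for uncompressed outputs
        paths.append(path)
    return paths


if __name__ == "__main__":
    # HYPOTHETICAL names, for illustration only.
    with tempfile.TemporaryDirectory() as d:
        for p in make_placeholders(Path(d), ["lib1.dups.pairs.gz", "lib1.dedup.stats"]):
            print(p.name, p.stat().st_size)
```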
I gave it a go and it seemed to work, but I am worried that I may have missed something.