open2c / distiller-nf

A modular Hi-C mapping pipeline
MIT License

Skip deduplication for libraries with UMI #188

Open ChristopherBarrington opened 8 months ago

ChristopherBarrington commented 8 months ago

We have some DNase Hi-C data being produced that has UMI information to identify PCR duplicates. My intention is to deduplicate the libraries at the FastQ level and then submit the duplicate-free FastQ files to distiller.

Is there a method that you would suggest for handling these data with distiller? There doesn't appear to be a straightforward option, but I had a look at the DSL1 Nextflow script and thought the following might work: duplicate the merge_split process, drop the deduplication step from the copy, and have it create empty files in place of the expected duplicate-related outputs. The choice of process could then be controlled by params.skip_dedup (set with --skip_dedup on the command line) in a when directive.
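Roughly, the idea looks like the sketch below. It is illustrative DSL1 only: the process names, channel names, file patterns and output names are all hypothetical and are not taken from the distiller script, and the real deduplication commands are left as a placeholder.

```nextflow
// Illustrative DSL1 sketch; every name here is hypothetical, not distiller's own.
params.skip_dedup = false   // override with --skip_dedup true on the command line

// Hypothetical input channel, split so that each process gets its own copy.
Channel
    .fromPath('pairs/*.pairs.gz')
    .into { PAIRS_FOR_DEDUP; PAIRS_FOR_SKIP }

// Runs only when deduplication is wanted.
process merge_with_dedup {
    input:
    file pairs from PAIRS_FOR_DEDUP

    output:
    file "${pairs.simpleName}.nodups.pairs.gz" into NODUPS_FROM_DEDUP
    file "${pairs.simpleName}.dups.pairs.gz" into DUPS_FROM_DEDUP

    when:
    !params.skip_dedup

    script:
    """
    # the existing merge and deduplication commands would go here
    """
}

// Runs only when the library was already deduplicated upstream (e.g. by UMI).
process merge_without_dedup {
    input:
    file pairs from PAIRS_FOR_SKIP

    output:
    file "${pairs.simpleName}.nodups.pairs.gz" into NODUPS_FROM_SKIP
    file "${pairs.simpleName}.dups.pairs.gz" into DUPS_FROM_SKIP

    when:
    params.skip_dedup

    script:
    """
    # pass every pair through as "no duplicates"
    cp ${pairs} ${pairs.simpleName}.nodups.pairs.gz
    # empty placeholder for the duplicates file that later steps expect
    touch ${pairs.simpleName}.dups.pairs.gz
    """
}

// Only one of the two processes emits anything, so mixing restores a single stream.
NODUPS_FROM_DEDUP.mix(NODUPS_FROM_SKIP).set { NODUPS }
DUPS_FROM_DEDUP.mix(DUPS_FROM_SKIP).set { DUPS }
```

Because the two when conditions are mutually exclusive, downstream processes can keep reading from a single channel regardless of which branch ran. One thing this sketch does not address is whether later steps accept a zero-byte placeholder or need a valid (empty) gzip file and matching statistics output.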

I gave it a go and it seemed to work, but I am worried that I may have missed something.