haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License

[Feature Request] report number of duplicated fragments in bulk #145

Closed: dawe closed this issue 9 months ago

dawe commented 9 months ago

Hello, and thanks once more for chromap. I know that for scATAC-seq data the output BED file contains the duplicate_count in the 5th column. I wonder whether it would be possible to report this information for bulk analysis as well, possibly using the 4th column (currently unassigned and set to N). I'm asking because I would like to use this information for QC. A duplicate count for each fragment could be used to derive some ENCODE-like measures, and also to streamline decoratio analysis. Currently I do both by running chromap without any duplicate removal and counting the duplicates afterwards.
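
For illustration, the count-afterwards workaround could look roughly like the sketch below. It is not part of chromap. It assumes a tab-separated BED whose first three columns are chromosome, start, and end, and it treats fragments with identical coordinates as duplicates (strand is ignored here, which may not suit single-end data). The function name `count_duplicates` is hypothetical.

```python
from collections import Counter


def count_duplicates(lines):
    """Count how often each (chrom, start, end) fragment occurs.

    Assumption: `lines` come from a BED file written WITHOUT duplicate
    removal, tab-separated, with chrom/start/end in the first 3 columns.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end = line.split("\t")[:3]
        counts[(chrom, int(start), int(end))] += 1
    return counts


# Made-up example records, not real chromap output.
example = [
    "chr1\t100\t200\tN\t60\t+",
    "chr1\t100\t200\tN\t60\t+",
    "chr1\t300\t450\tN\t60\t-",
]
dup_counts = count_duplicates(example)
```

Each key in `dup_counts` is one distinct fragment and each value is the number of times it was observed, so values above 1 mark duplicated fragments.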

mourisl commented 9 months ago

Thank you for the suggestion. Indeed, the duplication number should be included in the output for the bulk data as well. We will implement this in the next few days.

mourisl commented 9 months ago

We have added the duplicate number to the last column of the BED file for bulk analysis in the li_dev5 branch. Could you please check out that branch and give it a try to see whether it works on your data? Thank you.
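
As a rough sketch of how the new column could feed the QC mentioned above, the Python below derives ENCODE-style library complexity measures (NRF, PBC1, PBC2) from the per-fragment duplicate count. It assumes that each line is one retained fragment, that the count is the last tab-separated column, and that the count includes the retained fragment itself (so the minimum is 1). Those assumptions should be checked against the actual output. The function name `complexity_metrics` is hypothetical.

```python
from collections import Counter


def complexity_metrics(lines):
    """Compute NRF, PBC1 and PBC2 from per-fragment duplicate counts.

    Assumption: the last column of each line is an integer >= 1 giving
    the number of fragments observed at that position.
    """
    total = 0              # all fragments, duplicates included
    distinct = 0           # distinct positions (one per line)
    histogram = Counter()  # duplicate count -> number of positions
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        count = int(line.split("\t")[-1])
        total += count
        distinct += 1
        histogram[count] += 1
    m1 = histogram[1]  # positions seen exactly once
    m2 = histogram[2]  # positions seen exactly twice
    return {
        "total": total,
        "distinct": distinct,
        "NRF": distinct / total if total else None,
        "PBC1": m1 / distinct if distinct else None,
        "PBC2": m1 / m2 if m2 else None,
    }


# Made-up example records, not real chromap output.
example = [
    "chr1\t100\t200\tN\t60\t+\t1",
    "chr1\t300\t450\tN\t60\t-\t1",
    "chr2\t500\t640\tN\t60\t+\t2",
    "chr2\t900\t990\tN\t60\t-\t4",
]
metrics = complexity_metrics(example)
```

Here NRF is distinct fragments over total fragments, PBC1 is the fraction of distinct positions seen exactly once, and PBC2 is the ratio of positions seen once to positions seen twice.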

dawe commented 9 months ago

It works, thank you! Can I start using this version or should I wait for the next official release?

mourisl commented 9 months ago

This branch will become the next official release if no other significant bug is found. I'm currently waiting for #143. Once it is resolved, we will draft a new release based on li_dev5.

I think you can use this version for now if it is time-sensitive.