haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License

multi-mapped reads #154

Open lindsayhrlee opened 8 months ago

lindsayhrlee commented 8 months ago

Hi,

When mapping Hi-C data, is there a way to output the multi-mapped reads (the last count in the log below) separately? Or is there a way to extract only the multi-mapped reads from the .pairs file? Thank you!

Number of reads: 191422464. Number of mapped reads: 179308834. Number of uniquely mapped reads: 53537168. Number of reads have multi-mappings: 125771666.

Best, Lindsay

mourisl commented 8 months ago

Multi-mapped reads have MAPQ 0, so by default they are filtered out of Chromap's output. You can use -q 0 to include alignments with MAPQ 0, but the default pairs format does not contain a MAPQ field, so it may still be hard to tell which of the reported pairs are multi-mapped.
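One possible workaround, sketched here under stated assumptions rather than as a Chromap feature: run the mapping twice, once with the default threshold and once with -q 0, and keep the pairs whose read ID appears only in the -q 0 output. The helper names below are hypothetical. The sketch assumes only that header lines in a .pairs file start with `#` and that the first tab-separated column is the read ID.

```python
from typing import Iterable, Iterator, Set


def _is_record(line: str) -> bool:
    """True for data lines; skips blank lines and '#' header lines."""
    return bool(line.strip()) and not line.startswith("#")


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the read IDs (first tab-separated column) of all data lines."""
    return {line.split("\t", 1)[0] for line in lines if _is_record(line)}


def only_in_permissive(
    default_lines: Iterable[str], permissive_lines: Iterable[str]
) -> Iterator[str]:
    """Yield data lines from the -q 0 run whose read ID is absent from the default run.

    Assumption: both runs used the same reads and differ only in the MAPQ
    threshold, so the extra pairs are the ones the default threshold removed.
    """
    kept = read_ids(default_lines)
    for line in permissive_lines:
        if _is_record(line) and line.split("\t", 1)[0] not in kept:
            yield line


if __name__ == "__main__":
    # Tiny in-memory example with made-up records.
    default_run = [
        "## pairs format v1.0\n",
        "read1\tchr1\t100\tchr1\t900\t+\t-\n",
    ]
    q0_run = [
        "## pairs format v1.0\n",
        "read1\tchr1\t100\tchr1\t900\t+\t-\n",
        "read2\tchr2\t50\tchr3\t70\t+\t+\n",
    ]
    for record in only_in_permissive(default_run, q0_run):
        print(record, end="")
```

For real files, pass open file handles instead of lists. Note that the set of read IDs from the default run is held in memory, which can be large for tens of millions of pairs.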

lindsayhrlee commented 8 months ago

Thank you so much for your response. A couple more questions.

1) Is there a way to speed up Chromap? Can I run it in parallel? If so, how much memory and how many nodes should I request?

2) Is there a way to get the mapping quality for both ends of the Hi-C data?

mourisl commented 8 months ago
  1. You can use multithreading (-t num_of_threads). It will not increase memory consumption, and the job runs on a single node.
  2. The mapping quality is computed for the fragment as a whole, so a separate value for each read end is not available.
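As an illustration of the single-node, multithreaded run described above, here is a minimal sketch that builds a Chromap command line from Python. The -t and -q options come from this thread; the remaining options (`--preset hic`, `-x`, `-r`, `-1`, `-2`, `-o`) and their order are assumptions about Chromap's usual command line and should be checked against `chromap -h`. All file names are placeholders, and the function names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def chromap_hic_command(
    index: str,
    reference: str,
    read1: str,
    read2: str,
    out_pairs: str,
    threads: int = 8,
    min_mapq: Optional[int] = None,
) -> List[str]:
    """Build an argument list for a Hi-C mapping run (options are assumptions)."""
    cmd = [
        "chromap", "--preset", "hic",
        "-x", index,
        "-r", reference,
        "-1", read1,
        "-2", read2,
        "-o", out_pairs,
        "-t", str(threads),  # threads on one node
    ]
    if min_mapq is not None:
        # e.g. 0 to keep alignments with MAPQ 0
        cmd += ["-q", str(min_mapq)]
    return cmd


def run_chromap(cmd: List[str]) -> None:
    """Run the command and raise if Chromap exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; print the command instead of running it.
    cmd = chromap_hic_command(
        "ref.index", "ref.fa", "reads_R1.fq.gz", "reads_R2.fq.gz",
        "out.pairs", threads=16, min_mapq=0,
    )
    print(shlex.join(cmd))
```

On a cluster, this corresponds to requesting one node with as many cores as the value passed to -t.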