Closed jiangshan529 closed 1 year ago
hi, @jiangshan529
Firstly, Hi-C is a paired-end sequencing method; alignments in a pair can originate from the opposite sides of your DNA molecule. See explanations in our docs on bam parsing.
Next, the orientation of the alignments is recoded as SAM tags in the second field of your sam/bam file, and you can decode them, e.g., here: https://broadinstitute.github.io/picard/explain-flags.html
In your case:
So two alignments you report are already mapped to the opposite strands of DNA. IGV reports something else, but this might be some IGV convention of reporting for paired-end reads, which is unrelated to conventions of Hi-C data interpretation.
Finally, let's dig into what standard pairtools parse
is doing (and I assume you did not change reporting orientation settings, right?)
It reports two alignments as a pair, and it will be something like: SRR710074.10273713 chr1 1054648 chr1 1054204 - + UU
However, the left alignment has a larger coordinate than the right one, and pairtools
by default performs the flipping procedure so that the alignment with smaller coordinate is always first. So the pair will look like this:
SRR710074.10273713 chr1 1054204 chr1 1054648 + - UU
Let me know if you have further questions. This is also a discussion rather than an issue. I will transfer it there for you, but I appreciate it if this kind of help request would go into discussion directly next time. Thanks.
Hello, I am trying to understand the details about how pairtools works. In my pair.gz file, I choose a read pair to do the test, the read pair looks like this:
Then I searched the line number in .bam file and found:
I use the blat tool in IGV genome browser and found actually the two reads are both on + strand, not as shown in the pair.gz file that one is on + strand and another on - strand. So how should I understand this?
I also searched 'UU' in the pair.gz file and do blat for some 'UU' pairs, and some of them can be blasted to several sites on the hg38 genome. And it is not what defined by 'UU'. So how should I interpret this? Thanks for your help!