open2c / pairtools

Extract 3D contacts (.pairs) from sequencing alignments
MIT License

All pairs are corrupt ("XX") #221

Open lu-r-lu opened 8 months ago

lu-r-lu commented 8 months ago

Hello,

A colleague and I are stuck trying to figure out why we are getting the dreaded XX quality indicator for all our pairs when we run: samtools view -h file1.hicup.bam | pairtools parse -c hg38.simple-chrom.sizes -o parsed_file1.pairsam.gz

I have run this exact command on another bam file (from a different experiment) and it worked fine, so we were wondering whether something in this bam file might be corrupt or otherwise wrong. It is not sorted (we checked), and the chromosome names are formatted consistently. Quality-wise, we would say the fastq file is okay: not fantastic, but fine, and it aligned okay. What else could it be?

P.S. The pairs all look like the following few lines: image

All suggestions are sincerely appreciated!

golobor commented 8 months ago

It seems that you're missing either sam1 or sam2. This means that your .sam entries for R1 and R2 got unpaired from one another for whatever reason. Could you check the contents of file1.hicup.bam: do you see pairs of alignments for each read there?

lu-r-lu commented 8 months ago

@golobor Thank you so much for the reply. I think I have both SAM1 and SAM2. We have tested it this way (hopefully the right way):

$ samtools view -F 0x4 file1.hicup.bam | awk '{ if(and($2, 64)) count1++; else count2++ } END { print "SAM1 count:", count1; print "SAM2 count:", count2 }'
SAM1 count: 54353243
SAM2 count: 54353243
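A side note on the check above: equal counts show that both mates are present somewhere in the file, but not that the two mates of each read sit next to each other. Also, `and()` is a gawk extension, so the one-liner may fail under other awk implementations. Below is a minimal sketch of the same count using plain arithmetic on the FLAG column; the function name `count_mates` is made up for illustration, and it filters unmapped reads itself instead of relying on `-F 0x4`.

```shell
# Hypothetical helper: reads SAM records on stdin and counts mapped
# first-in-pair (0x40) and second-in-pair (0x80) alignments.
count_mates() {
  awk -F '\t' '
    /^@/ { next }                                # skip SAM header lines, if any
    {
      flag = $2
      if (int(flag / 4) % 2) next                # bit 0x4 set: read unmapped
      if (int(flag / 64) % 2) first++            # bit 0x40 set: first in pair
      else if (int(flag / 128) % 2) second++     # bit 0x80 set: second in pair
      else other++                               # neither mate bit set
    }
    END { printf "first=%d second=%d other=%d\n", first, second, other }
  '
}
```

Possible usage, assuming samtools is on the PATH: `samtools view file1.hicup.bam | count_mates`.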

Any thoughts?

golobor commented 8 months ago

Could you show the first few alignments?

lu-r-lu commented 8 months ago

Let me know if this is helpful and if I am missing something, of course! TY

image

golobor commented 8 months ago

For some reason, you seem to have one alignment per readID, whereas pairtools parse expects at least two alignments per readID to identify contacts. I do not have enough info to tell you why this happened. One simple option to try is to sort the .bam file by readID: maybe the alignments are there but got desynchronized for whatever reason?
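One rough way to test for this situation is to count how often a readID shows up alone versus in a run of two or more consecutive records. The sketch below does this in a single streaming pass over the SAM text; the function name `count_read_id_runs` is hypothetical, and the sketch assumes the readID is the first tab-separated column, as in standard SAM output.

```shell
# Hypothetical helper: reads SAM records on stdin and reports how many
# runs of consecutive identical readIDs have length 1 ("single") versus
# length 2 or more ("grouped").
count_read_id_runs() {
  awk -F '\t' '
    function tally(n) { if (n == 1) single++; else grouped++ }
    /^@/ { next }                        # skip SAM header lines, if any
    $1 == prev { run++; next }           # same readID as the previous record
    {
      if (started) tally(run)            # close the previous run
      prev = $1; run = 1; started = 1    # start a new run
    }
    END {
      if (started) tally(run)            # close the final run
      printf "single=%d grouped=%d\n", single, grouped
    }
  '
}
```

Possible usage, assuming samtools is on the PATH: `samtools view file1.hicup.bam | count_read_id_runs`. A large "single" count would suggest that mates are not adjacent in the file, in which case sorting by readID is worth trying.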

lu-r-lu commented 8 months ago

You were right about the odd sorting; as far as I was told, this is how the files came out of hicup.

I ran the following, and then it worked fine: samtools sort -n file1.hicup.bam -o sortedID_file1.hicup.bam

Thank you, really appreciate it!