natsuhiko / rasqual

Robust Allele Specific Quantification and quality controL

Direction of fragment count and allelic imbalance #31

Closed kwcurrin closed 5 years ago

kwcurrin commented 5 years ago

Hello,

Does RASQUAL expect the haplotype with the higher between-sample fragment count to also be the haplotype with more reads in the allelic imbalance test?

Thanks!

Kevin

natsuhiko commented 5 years ago

Hi Kevin,

Yes, we expect the high-expression haplotype to carry more reads at each feature SNP.

Best regards, Natsuhiko

kwcurrin commented 5 years ago

Thanks!

I wonder if TF footprints can violate the assumption that the fSNP allelic imbalance shows the same direction as the rSNP fragment count. The original DNase QTL paper from Degner et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3501342/) shows how the genotype with increased fragment count can actually have a dip in accessibility right at the SNP because of footprinting:

"One intuitive mechanism for dsQTLs is that these may be caused by variants that strengthen or weaken individual transcription factor binding sites, thereby changing transcription factor affinity and local nucleosome occupancy (20-22) and hence DNaseI cut rates. Consistent with this model, an aggregated plot of DNaseI sensitivity at dsQTLs shows a distinct drop in chromatin accessibility around putatively causal SNPs that is reminiscent of transcription factor binding footprints, especially in the genotypes associated with high sensitivity (15-17)."

Does RASQUAL penalize an association if the rSNP and fSNP show different directions?

kwcurrin commented 5 years ago

This seems like a unique feature of chromatin accessibility data. For gene expression and ChIP-seq data, it would make sense for the direction of between-sample fragment count and fSNP allelic imbalance to be the same.

natsuhiko commented 5 years ago

Hi Kevin,

I think you are mixing up the DNaseI "cut site" and the actual sequenced read. Although DNaseI cut frequency is depleted at the footprint, the footprint is usually very short. When you use 50-75 bp sequenced reads, you actually sequence through the footprint and the dsQTL SNP location. Therefore you still see the allelic imbalance at the dsQTL location. This is why we can use RASQUAL to map chromatin accessibility QTLs using ATAC-seq in the paper.

Best regards, Natsuhiko

kwcurrin commented 5 years ago

Hi Natsuhiko,

That is a good point. However, I do worry about particularly strong footprints dampening the allelic imbalance, or possibly reversing its direction in extreme cases. I could imagine a case where the footprint is 15 or 20 nt long and the dip in signal is large compared to the other allele. If the read length is 50 nt, this 15-20 nt footprint is a substantial fraction of the read coverage.

However, I do admit that this is probably not common. Most TFs don't even show footprints: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530758/

I also worry about cases where read coverage may not be very uniform outside of the motif, so that the footprint is not very well offset by the increase in flanking accessibility.
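The read-length versus footprint-length trade-off above can be put in rough numbers. The following is only a back-of-the-envelope sketch, not anything RASQUAL computes: the function name `snp_coverage_ratio` and the `depletion` parameter are hypothetical, and it assumes single-end reads whose 5' end is the cut site, a SNP centred in the footprint, and a uniform cut rate outside the footprint (exactly the assumption that non-uniform coverage would break).

```python
def snp_coverage_ratio(read_len: int, footprint_len: int, depletion: float) -> float:
    """Read coverage at a SNP centred in a footprint, relative to no footprint.

    Hypothetical toy model: `depletion` is the fraction of cuts lost inside
    the footprint (0.0 = no footprint, 1.0 = no cuts at all inside it).
    """
    half = footprint_len // 2
    # A forward-strand read covers the SNP when its 5' cut site lies at
    # offsets -(read_len - 1) .. 0 relative to the SNP (reverse strand is symmetric).
    offsets = range(-(read_len - 1), 1)
    # Cut sites within `half` bases of the SNP fall inside the footprint.
    weights = [(1.0 - depletion) if o >= -half else 1.0 for o in offsets]
    return sum(weights) / read_len


# 50 nt reads, 20 nt footprint, cuts completely absent inside the footprint.
ratio = snp_coverage_ratio(read_len=50, footprint_len=20, depletion=1.0)
print(ratio)
```

Under these assumptions a 20 nt footprint with complete protection removes 11 of the 50 possible cut-site offsets, so coverage at the SNP drops to 0.78 of the unprotected level. If the footprinted allele is, say, twice as accessible in the flanks, its coverage at the SNP is still about 1.56 times that of the other allele, so the direction is dampened but not reversed; a reversal in this toy model would need the ratio to fall below 0.5.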

Kevin

kwcurrin commented 5 years ago

Hi Natsuhiko,

I spoke with a collaborator who extensively studies footprinting and he agrees with you on this. So I will close the issue.

Thanks!

Kevin