Closed tarak77 closed 6 years ago
Regarding the future, there are a few ways in which the chemistry might be improved. We and other labs are working on some of them.
On the computational front, I believe there's a lot of room for improvement:
In the Varoquaux model for generating Hi-C data from a 3D structure, the ij-th interaction count is linked to the ij-th pairwise distance via the probability model: C_ij ~ Poisson(b ||x_i - x_j||^a)
Here we take a=-3 and b>0. The a value comes from the literature:
The contact count is inversely proportional to the genomic distance (c ~ s^−1), whereas the volume scales linearly with the subchain length (d^3 ~ s), from which we deduce a relationship between d and c of the form (d ~ c^(−1/3)).
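A minimal sketch of the model quoted above, assuming NumPy. This is not the pastis implementation: the function name `simulate_counts`, the value of `b`, and the random-walk toy structure are all hypothetical and only illustrate how C_ij ~ Poisson(b ||x_i - x_j||^a) could be sampled.

```python
import numpy as np

def simulate_counts(coords, a=-3.0, b=1.0, seed=0):
    """Draw a symmetric count matrix with C_ij ~ Poisson(b * ||x_i - x_j||^a)."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all beads.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(coords)
    iu = np.triu_indices(n, k=1)      # upper triangle, diagonal excluded
    rate = b * dist[iu] ** a          # Poisson intensity for each bead pair
    counts = np.zeros((n, n), dtype=np.int64)
    counts[iu] = rng.poisson(rate)
    return counts + counts.T          # mirror so the matrix is symmetric

# Toy structure (hypothetical data): a 3D random walk with 50 beads.
toy_rng = np.random.default_rng(1)
coords = np.cumsum(toy_rng.normal(size=(50, 3)), axis=0)
counts = simulate_counts(coords, a=-3.0, b=100.0)
```

Changing `a` is then a one-argument change, which makes it easy to compare simulated matrices under different assumed exponents.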
I was wondering whether we should use the same a for interchromosomal Hi-C data generation, or instead consider:
The contact count is inversely proportional to the square of the genomic distance (c ~ s^−2), whereas the volume scales linearly with the subchain length (d^3 ~ s), from which we deduce a relationship between d and c of the form (d ~ c^(−1/6)).
Okay
Yes, it makes sense now. Thanks!
An algorithm for imputation without SNP data? How will one go about that? Any reference I could use would be awesome!
Again, how does one define ||x_i - x_j|| for interchromosomal contacts, for example between chr1:10Mb and chr5:20Mb?
I mentioned this paper to you in a different thread. The authors didn't use SNP data but managed to get structures for one chromosome. I imagine a similar algorithm might work for the whole genome.
I could be wrong, but in general, let's say we have genome-wide 3D coordinates from a haploid cell. We could compute the distance matrix using Euclidean distances and then cut the matrix to get the interchromosomal distances between chr1 and chr5. To reconstruct the contacts from this cut matrix, could we use the Varoquaux model with the exponent a set to -6 instead of -3?
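The procedure described here can be sketched as follows, assuming NumPy. The helper `interchromosomal_block`, the chromosome labels, the value of `b`, and the toy coordinates are hypothetical, and the exponent -6 is only the hypothesis raised in this comment, not an established value.

```python
import numpy as np

def interchromosomal_block(coords, chrom_labels, chrom_a, chrom_b):
    """Euclidean distances between every bead of chrom_a and every bead of chrom_b."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(chrom_labels)
    xa = coords[labels == chrom_a]
    xb = coords[labels == chrom_b]
    diff = xa[:, None, :] - xb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy genome (hypothetical data): 10 beads on chr1 and 20 beads on chr5.
rng = np.random.default_rng(0)
coords = rng.normal(size=(30, 3)) * 5.0
labels = np.array(["chr1"] * 10 + ["chr5"] * 20)

dist_1_5 = interchromosomal_block(coords, labels, "chr1", "chr5")

a_inter = -6.0   # hypothesised interchromosomal exponent (assumption)
b = 1000.0       # arbitrary scale factor (assumption)
counts_1_5 = rng.poisson(b * dist_1_5 ** a_inter)
```

The block is rectangular (chr1 beads by chr5 beads), so it is sampled directly rather than mirrored the way a symmetric intrachromosomal matrix would be.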
You're talking about euclidean distances (in nanometers) while those models are using genomic distances (in basepairs).
Sorry about the confusion. In the Varoquaux model (https://github.com/hiclib/pastis/blob/master/examples/plot_generate_data.py), Euclidean distances are raised to the power -3 to reconstruct the genome-wide contact matrix. I was wondering whether the exponent should be different for interchromosomal matrices, for example -6 (loosely based on the logic above)?
Neither hickit nor nuc_dynamics uses a Poisson model. If you like a Poisson model, try another tool. There are plenty of them.
I see what you meant. You want to infer 3D distance from the number of contacts (from either bulk or single-cell Hi-C) between two particles. This power law is assumed to be -1/3 in this line of code of nuc_dynamics, regardless of intra or inter.
I don't have a strong opinion on this matter, or power laws in general. There have been several experimental studies on the relationship between 3D distance and the number of intrachromosomal contacts in bulk Hi-C: Wang et al. 2016 and Fudenberg & Imakaev 2017, to name a few. The relationship between 3D distance and the number of interchromosomal contacts in bulk Hi-C (which would be a good research project), or the number of any contacts in single-cell Hi-C (which of course would require imaging and doing Hi-C on the same cell), remains an open problem.
Right, I understand now.
Aiden et al. (2009), showing the existence of chromosome territories (2A) and the interchromosomal contacts (2B), and the corresponding supplementary text:
Presence of Chromosome Territories. The total number of possible interactions at a given genomic distance was computed explicitly for each chromosome and compared to the actual number of interactions at that distance. (The possible number of pairs of genomic positions separated by d on a given chromosome is Lc-d, where Lc is the length of the chromosome.) To obtain the interchromosomal averages, the number of observed interactions between loci on a pair of chromosomes was divided by the number of possible interactions between the two chromosomes (the product of the number of loci on each chromosome). When multiple chromosome pairings were being averaged, such as in the computation of In(s), the numerators and denominators were summed independently. The genome wide average, I(s), is therefore the result of dividing the total number of interactions at a distance s by the number of possible interactions at distance s summed over all chromosomes.
Proximity of Chromosome Territories. The expected number of interchromosomal interactions for each chromosome pair i,j was computed by multiplying the fraction of interchromosomal reads containing i with the fraction of interchromosomal reads containing j and multiplying by the total number of interchromosomal reads. The enrichment was computed by taking the actual number of interactions observed between i and j and dividing it by the expected value.
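One literal reading of the "Proximity of Chromosome Territories" paragraph can be sketched like this, assuming NumPy. The function name `interchromosomal_enrichment` and the 3-chromosome toy counts are hypothetical; this is an interpretation of the quoted text, not the authors' code.

```python
import numpy as np

def interchromosomal_enrichment(pair_counts):
    """Observed/expected enrichment for each chromosome pair.

    pair_counts: symmetric matrix whose (i, j) entry is the number of
    interchromosomal reads linking chromosomes i and j (diagonal ignored).
    """
    m = np.asarray(pair_counts, dtype=float).copy()
    np.fill_diagonal(m, 0.0)
    total = np.triu(m, k=1).sum()        # each read counted once
    frac = m.sum(axis=1) / total         # fraction of reads containing chromosome i
    expected = np.outer(frac, frac) * total
    with np.errstate(divide="ignore", invalid="ignore"):
        enrichment = np.where(expected > 0, m / expected, np.nan)
    np.fill_diagonal(enrichment, np.nan)  # same-chromosome pairs are not defined here
    return expected, enrichment

# Toy read counts between three chromosomes (hypothetical data).
toy = np.array([[0, 10, 20],
                [10, 0, 30],
                [20, 30, 0]])
expected, enrichment = interchromosomal_enrichment(toy)
# For the pair (0, 1): expected = 0.5 * (40/60) * 60 = 20, enrichment = 10 / 20 = 0.5
```

Values above 1 mark chromosome pairs that contact each other more often than their overall interchromosomal read share would suggest.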
I don't quite follow the interchromosomal contact probability computation.
shows that the single chromosome model shows a more complicated packing than the fractal globule. I wonder how they generated that plot?
Thanks again!
Glad to hear.
Your two additional questions may be best answered by the authors of the two papers, respectively.
Yes I will do that, thanks!
Sorry for going back and forth between theories but I had some general questions:
The statistical property explained for interchromosomal and long-range intrachromosomal contacts is definitely convincing, but can we show the same via the plot of contact probability vs. genomic distance? I know that for intrachromosomal contacts we should expect a slope of -1 (the fractal globule case). For interchromosomal contacts, should we expect a slope of -2? How do we find the contact probability in this case?
Coming back to https://github.com/tanlongzhi/dip-c/issues/4#issuecomment-420307166 , you remove short-range contacts with unknown haplotypes on both legs. Wouldn't the "either or both" strategy help in producing better 3D structures?
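For the contact probability vs. genomic distance question above, a minimal sketch of how the intrachromosomal slope could be estimated, assuming NumPy. The function names and the toy matrix are hypothetical. Interchromosomal pairs have no genomic separation s, so this sketch covers only the intrachromosomal case.

```python
import numpy as np

def contact_probability_by_distance(counts):
    """Average contact count at each separation s (in bins) for one chromosome."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    s = np.arange(1, n)
    # Mean of the s-th diagonal = observed contacts / (n - s) possible pairs.
    p = np.array([np.diagonal(counts, offset=k).mean() for k in s])
    return s, p

def loglog_slope(s, p):
    """Least-squares slope of log(p) against log(s), ignoring empty distances."""
    keep = p > 0
    slope, _intercept = np.polyfit(np.log(s[keep]), np.log(p[keep]), 1)
    return slope

# Toy matrix (hypothetical data) built so that counts decay exactly as 1/s.
n = 200
idx = np.arange(n)
sep = np.abs(idx[:, None] - idx[None, :])
toy = np.zeros((n, n))
toy[sep > 0] = 1.0 / sep[sep > 0]

s, p = contact_probability_by_distance(toy)
slope = loglog_slope(s, p)   # close to -1 for this toy matrix
```

On real data the fit is usually restricted to a range of separations, since the curve is rarely a single power law across all distances.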
In Dip-C, imputation is done in 2D and then in 3D. In the supplementary material you mentioned
I am a bit confused by the second-shortest 3D distance. Could you explain the 3D imputation in general?
Thinking along the same lines, are there any limitations in the current methods that you think might be improved on? In the deep learning age, can deep learning help current 3D/4D genome modeling and analysis even further?
Thanks again, I really appreciate it!