Generating pile up of inter chromosomal contacts

tarak77 commented 5 years ago

Hi @tanlongzhi, I previously asked how you obtained the pile-up of inter-chromosomal contacts that shows the ellipsoidal shape, which you then use for imputation. I know you told me to use the ard.py script for that. Would it be possible to explain what the script does in general? How do you define a reference contact, and how do you pile them up?

Thanks again!

tanlongzhi commented 5 years ago

Hi @tarak77, For pile-up, one would cut out small regions (e.g. 20 Mb x 20 Mb) from the whole (genome-wide) contact map, and overlay ("pile up") them to obtain a single, small (20 Mb x 20 Mb) aggregate pile-up map.
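To make the cut-and-overlay idea concrete, here is a minimal NumPy sketch. It is not the code in dip-c ard; the function name `pile_up`, the array layout, and the brute-force comparison of every contact against every reference are assumptions chosen for clarity, and it only handles contacts on a single pair of chromosomes.

```python
import numpy as np

def pile_up(contacts, references, half_width):
    """Collect contacts that fall inside a square window around each reference.

    contacts, references: arrays of shape (n, 2) and (m, 2) holding
    (position_1, position_2) in base pairs on the same chromosome pair.
    Returns an array of shape (k, 2) with positions relative to the
    window centers, so all windows are overlaid around (0, 0).
    """
    contacts = np.asarray(contacts, dtype=float)
    references = np.asarray(references, dtype=float)
    # offsets[i, j] is contact j expressed relative to reference i
    offsets = contacts[None, :, :] - references[:, None, :]
    # keep only contacts within half_width of the reference on both axes
    inside = np.all(np.abs(offsets) <= half_width, axis=2)
    return offsets[inside]

# Hypothetical toy data: three contacts and one reference point
example_contacts = [[10e6, 50e6], [12e6, 55e6], [80e6, 90e6]]
example_references = [[11e6, 52e6]]
example_offsets = pile_up(example_contacts, example_references, half_width=10e6)
# the first two contacts fall inside the window; the third is too far away
```

This version needs memory proportional to the number of contacts times the number of references, so it is only suitable for small illustrations.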

In dip-c ard, the (center) locations of these small regions are specified by -c ("reference"), and the (half-width) size by -d. For example, you can pile up around all bulk Hi-C CTCF loops with -c bulk_loops.con, which would be a typical analysis in bulk Hi-C papers.

In the Dip-C paper, however, we mainly piled up around all single-cell inter-chromosomal contacts, which would not be possible for bulk Hi-C data. This is the default behavior of dip-c ard when -c is not used.

The default output of dip-c ard is a list of contacts from all the small regions, with each small region centered at (0, 0). Alternatively, to save you the time of calculating a 2D histogram yourself, -h outputs a precomputed histogram at the bin size you specify (e.g. 200 kb).
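If you start from the list of centered contacts, the histogram step could look roughly like the sketch below. The function name `offset_histogram` and its arguments are hypothetical, and it assumes the window width is a whole multiple of the bin size; it is not necessarily how -h computes its output.

```python
import numpy as np

def offset_histogram(offsets, half_width, bin_size):
    """Bin centered contacts (relative positions) into a square 2D histogram."""
    offsets = np.asarray(offsets, dtype=float)
    # assumption: 2 * half_width is a whole multiple of bin_size
    n_bins = int(round(2 * half_width / bin_size))
    edges = np.linspace(-half_width, half_width, n_bins + 1)
    hist, _, _ = np.histogram2d(offsets[:, 0], offsets[:, 1], bins=[edges, edges])
    return hist, edges

# Hypothetical centered contacts, a 20 Mb x 20 Mb window, and 200 kb bins
toy_offsets = np.array([[-1e6, -2e6], [1e6, 3e6], [0.0, 0.0]])
toy_hist, toy_edges = offset_histogram(toy_offsets, half_width=10e6, bin_size=200e3)
```

With a half-width of 10 Mb and 200 kb bins this gives a 100 x 100 array, with the reference point sitting at the corner shared by the four central bins.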

Does the above explanation make sense? Let me know if you have any questions.

tarak77 commented 5 years ago

So, for every inter-chromosomal contact point, the algorithm:

1. cuts out a box around it,
2. piles up the boxes, each centered at its inter-chromosomal contact point,
3. finally calculates a 2D histogram.

Am I right?

Also, what happens if the cut-out box falls outside the contact map, i.e. for inter-chromosomal contact points near the edges?

Do you think the final result of the analysis might change when, say, instead of typical round cells, we have cells that are spread out, i.e. elliptical or flat?

tanlongzhi commented 5 years ago

Yes, that's all correct. It takes a few minutes to run, because there are ~200k inter-chromosomal contacts per cell (and therefore ~200k boxes to cut).

Note that the default histogram calculation symmetrizes the histogram 8-fold (or 2-fold for intra-chromosomal), because there's no distinction between up and down, left and right, and the two chromosomes are interchangeable. This can be turned off by -S.
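As an illustration of the 8-fold case, the sketch below averages a square histogram over its four flips and their transposes. The function name `symmetrize_8fold` is hypothetical, and whether dip-c ard averages or sums the eight copies is an assumption here. The 2-fold intra-chromosomal case is left out.

```python
import numpy as np

def symmetrize_8fold(hist):
    """Average a square histogram over up/down flips, left/right flips, and axis swap."""
    hist = np.asarray(hist, dtype=float)
    if hist.ndim != 2 or hist.shape[0] != hist.shape[1]:
        raise ValueError("expected a square 2D histogram")
    total = np.zeros_like(hist)
    # hist.T swaps the two chromosomes; the slices flip each axis
    for h in (hist, hist.T):
        total += h + h[::-1, :] + h[:, ::-1] + h[::-1, ::-1]
    # assumption: average (rather than sum) so the total count is preserved
    return total / 8.0

# Hypothetical 4 x 4 histogram with all counts in a single off-center bin
raw = np.zeros((4, 4))
raw[0, 1] = 8
sym = symmetrize_8fold(raw)
```

After symmetrization the single occupied bin is spread evenly over its eight mirror images, and the result is unchanged by any further flip or transpose.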

Right now we don't intentionally avoid boxes that partly fall out of bounds. Such boxes (partially empty) will still be centered at the reference point and included in the calculation. The same goes for intra-chromosomal references with boxes partially falling below the diagonal (dip-c stores contact maps as upper triangular, so there is nothing below the diagonal).
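If you want to gauge how much edge truncation affects your own data, one option is to compute what fraction of each box lies inside the map. The helper below is a hypothetical sketch for the inter-chromosomal case only (it ignores the diagonal issue for intra-chromosomal references) and is not part of dip-c; `len_1` and `len_2` stand for the two chromosome lengths.

```python
def window_coverage(ref, half_width, len_1, len_2):
    """Fraction of the square window around ref = (pos_1, pos_2) that lies
    inside the inter-chromosomal map [0, len_1] x [0, len_2]."""
    fraction = 1.0
    for pos, length in zip(ref, (len_1, len_2)):
        low = max(pos - half_width, 0)
        high = min(pos + half_width, length)
        # covered length along this axis, relative to the full window width
        fraction *= max(high - low, 0) / (2 * half_width)
    return fraction

# Hypothetical example: a reference 5 Mb from the start of the first chromosome
edge_fraction = window_coverage((5_000_000, 50_000_000), 10_000_000,
                                100_000_000, 100_000_000)
inner_fraction = window_coverage((50_000_000, 50_000_000), 10_000_000,
                                 100_000_000, 100_000_000)
```

In this example the box near the edge keeps three quarters of its area, while the interior box is complete.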

I don't think this analysis will change for flat cells. We have unpublished data on very flat cells. Contacts seem normal at first glance; just the 3D models are very flat.

tarak77 commented 5 years ago

Oh OK, I was thinking that when we say a cell is "flat", maybe there are far fewer inter-chromosomal contacts than intra-chromosomal ones, i.e. all chromosome territories can be seen on a plane? If possible, could you share an image of a flat cell model? I'm just wondering how "flatness" is defined.

tanlongzhi commented 5 years ago

I don't have an image at hand, but it basically has a thickness around 1/10 of its diameter. So it basically looks like a pancake, both under the microscope and as a 3D model.
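One way to put a number on such a thickness-to-diameter ratio for a 3D model is to compare the extent of the particle coordinates along their smallest and largest principal axes. This is only a rough sketch of one possible definition, not how flatness was measured for these cells; the function name `flatness_ratio` and the synthetic point clouds are made up for illustration.

```python
import numpy as np

def flatness_ratio(coords):
    """Ratio of the extent along the smallest principal axis to the extent
    along the largest principal axis of an (n, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    # rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt.T
    extents = projected.max(axis=0) - projected.min(axis=0)
    return extents[-1] / extents[0]

# Synthetic point clouds: a pancake-like slab and a roughly isotropic cube
rng = np.random.default_rng(0)
pancake = rng.uniform(-1, 1, size=(2000, 3)) * np.array([10.0, 10.0, 1.0])
cube = rng.uniform(-1, 1, size=(2000, 3))
pancake_ratio = flatness_ratio(pancake)
cube_ratio = flatness_ratio(cube)
```

A pancake-like model gives a ratio near 0.1 under this definition, while a roughly round one gives a ratio much closer to 1.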

Those cells clearly have many inter-chromosomal contacts; but this might just be a cell type thing rather than a flat thing. To study flatness properly, you should probably compare the same cell type but round versus flat. Perhaps such flattening can be achieved by applying physical force, or by surface adhesion.

tarak77 commented 5 years ago

OK, I see. Thanks!

tanlongzhi / dip-c, issue #33