Closed tarak77 closed 5 years ago
Hi @tarak77, For pile-up, one would cut out small regions (e.g. 20 Mb x 20 Mb) from the whole (genome-wide) contact map, and overlay ("pile up") them to obtain a single, small (20 Mb x 20 Mb) aggregate pile-up map.
In `dip-c ard`, the (center) locations of these small regions are specified by `-c` ("reference"), and the (half-width) size by `-d`. For example, you can pile up around all bulk Hi-C CTCF loops with `-c bulk_loops.con`, which would be a typical analysis in bulk Hi-C papers.
In the Dip-C paper, however, we mainly piled up around all single-cell inter-chromosomal contacts, which would not be possible for bulk Hi-C data. This is the default behavior of `dip-c ard` when `-c` is not used.
The default output of `dip-c ard` is a list of contacts from all the small regions, with each small region re-centered at (0, 0). Alternatively, to save you the time of calculating a 2D histogram yourself, `-h` will output a calculated histogram for you with your specified bin size (e.g. 200 kb).
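To make the pile-up idea concrete, here is a minimal Python sketch. It is not the `dip-c ard` implementation: the function names `pile_up` and `histogram_2d`, the plain `(x, y)` tuple representation of contacts, and the choice to keep each reference's own (0, 0) offset are all assumptions made for illustration. It only shows the two steps described above: collect every contact that falls within the half-width of a reference point, expressed relative to that reference, and then bin the resulting offsets.

```python
from collections import Counter


def pile_up(contacts, references, half_width):
    """Return contact offsets (dx, dy) relative to each reference point.

    contacts, references: lists of (x, y) positions in bp on one
    chromosome pair (hypothetical representation, not dip-c's format).
    Only offsets inside the box |dx|, |dy| <= half_width are kept.
    """
    offsets = []
    for rx, ry in references:
        for cx, cy in contacts:
            dx, dy = cx - rx, cy - ry
            if abs(dx) <= half_width and abs(dy) <= half_width:
                offsets.append((dx, dy))
    return offsets


def histogram_2d(offsets, bin_size):
    """Count offsets per bin; bin k covers [k * bin_size, (k + 1) * bin_size)."""
    counts = Counter()
    for dx, dy in offsets:
        counts[(dx // bin_size, dy // bin_size)] += 1
    return counts


# Made-up contacts; using them as their own references mimics piling up
# around every contact, as described for the case without -c.
contacts = [(1_000_000, 5_000_000), (1_300_000, 5_100_000), (30_000_000, 40_000_000)]
offsets = pile_up(contacts, contacts, half_width=10_000_000)
hist = histogram_2d(offsets, bin_size=200_000)
```

The nested loop is quadratic and only suitable for toy inputs; with ~200k references per cell a sorted or indexed lookup would be needed.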
Does the above explanation make sense? Let me know if you have any questions.
So for every inter chromosomal contact point,
Am I right?
Also, what happens if the cut-out box falls outside the contact map, i.e. for inter-chromosomal contact points near the edges?
Do you think the final result of the analysis might change when, instead of typical round cells, we have cells that are spread out, i.e. elliptical or flat?
Yes, that's all correct. It takes a few minutes to run, because there're ~200k inter-chromosomal contacts per cell (therefore ~200k boxes to cut).
Note that the default histogram calculation symmetrizes the histogram 8-fold (or 2-fold for intra-chromosomal), because there's no distinction between up and down, left and right, and the two chromosomes are interchangeable. This can be turned off by `-S`.
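As an illustration of what 8-fold symmetrization means for the inter-chromosomal case, here is a small Python sketch. It is an assumption about one way to express the symmetry, not the code `dip-c` runs: the hypothetical function `symmetrize_8fold` works on raw (dx, dy) offsets and emits every combination of sign flips (up/down, left/right) and axis swap (the two chromosomes being interchangeable). The 2-fold intra-chromosomal case is not covered here.

```python
def symmetrize_8fold(offsets):
    """Return all 8 mirror images of each (dx, dy) offset.

    Images come from swapping the two axes and flipping the sign of
    either coordinate. Each input contributes 8 outputs, so counts in a
    histogram built from the result are 8 times larger unless rescaled.
    """
    out = []
    for dx, dy in offsets:
        for a, b in ((dx, dy), (dy, dx)):      # axis swap
            for sx in (1, -1):                 # left/right flip
                for sy in (1, -1):             # up/down flip
                    out.append((sx * a, sy * b))
    return out


images = symmetrize_8fold([(3, 1)])
```

Offsets lying on a symmetry axis (for example dx = 0 or dx = dy) produce repeated images, which is expected.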
Right now we don't intentionally avoid part of the box falling out of bounds. Such boxes (partially empty) will still be centered at the reference point and included in the calculation. The same goes for intra-chromosomal references with boxes partially falling below the diagonal (dip-c stores contact maps as upper triangular; so nothing below the diagonal).
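If you want to see how much of a given box is actually inside the map, a rough sketch like the following could help. The function `box_coverage` and the chromosome lengths are hypothetical and not part of `dip-c`; it covers only the inter-chromosomal case (a rectangular map) and ignores the upper-triangular issue for intra-chromosomal references.

```python
def box_coverage(center, half_width, len_x, len_y):
    """Fraction of a box's area that lies inside a len_x by len_y map.

    center: (x, y) reference position in bp; the box spans
    center +/- half_width along each axis.
    """
    cx, cy = center

    def overlap(c, length):
        # Length of the box's extent along one axis that falls in [0, length].
        lo = max(c - half_width, 0)
        hi = min(c + half_width, length)
        return max(hi - lo, 0)

    full_area = (2 * half_width) ** 2
    return overlap(cx, len_x) * overlap(cy, len_y) / full_area


# Two made-up 100 Mb chromosomes, 20 Mb x 20 Mb boxes.
interior = box_coverage((50_000_000, 50_000_000), 10_000_000, 100_000_000, 100_000_000)
edge = box_coverage((0, 50_000_000), 10_000_000, 100_000_000, 100_000_000)
corner = box_coverage((0, 0), 10_000_000, 100_000_000, 100_000_000)
```

A box centered exactly on an edge is half empty and one on a corner is three-quarters empty, so references near chromosome ends contribute fewer contacts to the outer part of the pile-up.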
I don't think this analysis will change for flat cells. We have unpublished data on very flat cells. Contacts seem normal at first glance; just the 3D models are very flat.
Oh ok, I was thinking that when we say a cell is "flat", maybe the inter-chromosomal contacts are far fewer than the intra-chromosomal ones? i.e. all chromosome territories can be seen on a plane? If possible, could you share an image of a flat cell model? Just wondering how "flatness" is defined.
I don't have an image at hand, but it basically has a thickness around 1/10 of its diameter. So it basically looks like a pancake both under the microscope and as a 3D model.
Those cells clearly have many inter-chromosomal contacts; but this might just be a cell type thing rather than a flat thing. To study flatness properly, you should probably compare the same cell type but round versus flat. Perhaps such flattening can be achieved by applying physical force, or by surface adhesion.
Ok I see. thanks!
Hi @tanlongzhi, previously I asked you how you obtained the pile-up of inter-chromosomal contacts to see the ellipsoidal shape, which you then use for imputation. I know you told me to use the ard.py script for doing so. Would it be possible to explain what the script is doing in general? How are you defining a reference contact, and how are you piling them up?
Thanks again!