tanlongzhi / dip-c

Tools to analyze Dip-C (or other 3C/Hi-C) data

convert .3dg to pdb format? #34

Closed lidaof closed 5 years ago

lidaof commented 5 years ago

Hi there,

Nice tools. I am wondering whether it is possible to convert .3dg files to PDB format. Thanks.

tanlongzhi commented 5 years ago

Thanks.

Yes, please use dip-c vis to convert .3dg files into the PDBx/mmCIF format (.cif). See this section of the README. This functionality requires the PDBx parser provided by the wwPDB website.

Note that this repo does not support conversion into the already retired, legacy PDB format (.pdb), because that format has a 99,999-atom limit and thus cannot represent high-resolution 3D genomes.
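As a rough illustration of why the atom limit matters, the arithmetic can be sketched in Python. The genome size (about 3.1 Gb per haploid human genome) and the 20 kb resolution are assumptions chosen for illustration, and `points_needed` is a hypothetical helper, not part of dip-c.

```python
import math

# Largest atom count the legacy PDB format can hold (limit quoted above).
PDB_MAX_ATOMS = 99_999

def points_needed(genome_bp: int, resolution_bp: int) -> int:
    """Number of 3D points (atoms) needed to tile a genome at a given resolution."""
    return math.ceil(genome_bp / resolution_bp)

# Assumed values: a diploid human genome (2 x ~3.1 Gb) at 20 kb resolution.
diploid_human_bp = 2 * 3_100_000_000
n_points = points_needed(diploid_human_bp, 20_000)

print(n_points, n_points > PDB_MAX_ATOMS)
```

Under these assumptions the structure needs roughly three times as many atoms as the legacy format allows.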

lidaof commented 5 years ago

Hi @tanlongzhi, thank you so much for the reply. Yes, I found that out later after reading the README again (sorry for that).

May I ask how the connections are determined from the .3dg file? I was thinking that each line represents an atom in the PDB format?

tanlongzhi commented 5 years ago

Each atom is a point in the genome (e.g. chr1: 0kb, 20kb, 40kb, 60kb, 100kb, 120kb, ...). Note that points might not be consecutive (such as skipping chr1: 80kb in the above example), because points with too few contacts are removed during dip-c clean3.

The default behavior of dip-c vis is to make connections ("covalent bonds" in the PDBx/mmCIF format) only between consecutive points, not across a skipped point. In the above example, connections will be chr1: 0kb--20kb--40kb--60kb, 100kb--120kb--....

You can also force dip-c vis to connect over skipped points with -a, which will lead to chr1: 0kb--20kb--40kb--60kb--100kb--120kb--.... in the above example.
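The two connection behaviors described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual dip-c vis code: the function `connect_points`, its `connect_skipped` parameter (standing in for -a), and the in-memory point list are all hypothetical, and the points are assumed to be sorted by chromosome and position.

```python
from typing import List, Tuple

Point = Tuple[str, int]  # (chromosome name, position in bp)

def connect_points(points: List[Point], step: int,
                   connect_skipped: bool = False) -> List[Tuple[Point, Point]]:
    """Return bonds between neighboring points on the same chromosome.

    By default a bond is made only when two neighbors are exactly `step`
    apart; with connect_skipped=True, gaps left by removed points are bridged.
    """
    bonds = []
    for prev, curr in zip(points, points[1:]):
        same_chrom = prev[0] == curr[0]
        adjacent = curr[1] - prev[1] == step
        if same_chrom and (adjacent or connect_skipped):
            bonds.append((prev, curr))
    return bonds

# The example from this thread: chr1 at 20 kb resolution, 80kb removed.
points = [("chr1", kb * 1000) for kb in (0, 20, 40, 60, 100, 120)]
default_bonds = connect_points(points, step=20_000)
forced_bonds = connect_points(points, step=20_000, connect_skipped=True)
```

Here `default_bonds` leaves 60kb and 100kb unconnected, while `forced_bonds` adds a bond between them. Bonds are never made between different chromosomes in either mode.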

lidaof commented 5 years ago

I see, very clear explanation! Thank you @tanlongzhi. Another question: is the 3D modeling in the dip-c algorithm based on whole-genome data, or is each chromosome modeled separately? It seems that all chromosomes together form a sphere, so I am wondering how the x, y, z coordinates of each chromosome were calculated.

tanlongzhi commented 5 years ago

No problem.

3D modeling is performed for the whole genome (all 46 chromosomes for human). We adopted a previous algorithm, nuc_dynamics by Stevens et al. (2017), developed for 3D modeling of a haploid mouse genome (20 chromosomes).

The algorithm creates attractive forces between all pairs of points that have a chromatin contact, and short-range repulsive forces between all pairs of points (to model the excluded volume). Through simulated annealing, the algorithm produces genome-wide x, y, z coordinates that best satisfy the forces.
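To make the idea concrete, here is a toy sketch of the force model on a handful of points. It is not nuc_dynamics or the dip-c implementation: it swaps in a simple Metropolis-style simulated annealing over an energy function, and every name and parameter (`energy`, `anneal`, `contact_dist`, `repel_dist`, the temperature schedule) is an assumption chosen for illustration.

```python
import math
import random

def energy(coords, contacts, contact_dist=1.0, repel_dist=0.5):
    """Penalty for contact pairs that are too far apart (attraction)
    plus a penalty for any pair that is too close (repulsion)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < repel_dist:
                total += (repel_dist - d) ** 2
            if (i, j) in contacts and d > contact_dist:
                total += (d - contact_dist) ** 2
    return total

def anneal(n_points, contacts, steps=5000, t_start=1.0, t_end=0.001, seed=0):
    """Move one random point at a time while the temperature decays
    geometrically; return the lowest-energy coordinates seen."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n_points)]
    current = energy(coords, contacts)
    best_coords, best_energy = [p[:] for p in coords], current
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(n_points)
        old = coords[i][:]
        coords[i] = [c + rng.gauss(0, 0.3) for c in old]
        proposed = energy(coords, contacts)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if proposed <= current or rng.random() < math.exp((current - proposed) / temp):
            current = proposed
            if current < best_energy:
                best_coords, best_energy = [p[:] for p in coords], current
        else:
            coords[i] = old  # reject the move
    return best_coords, best_energy

# Contacts are given as index pairs (i, j) with i < j.
contacts = {(0, 1), (1, 2), (2, 3)}
coords, final_energy = anneal(4, contacts)
```

In this toy, all points share one coordinate system and are optimized together, which mirrors the whole-genome (rather than per-chromosome) modeling described above. A real implementation would need a far more efficient scheme than recomputing every pairwise distance at each step.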

lidaof commented 5 years ago

Thank you @tanlongzhi