tanlongzhi / dip-c

Tools to analyze Dip-C (or other 3C/Hi-C) data

Highlighting centromeres in the model? #20

Closed: tarak77 closed this issue 5 years ago

tarak77 commented 5 years ago

Hey @tanlongzhi, is there a way to highlight the centromere positions in the movie of the 3D models for human/mouse? I ask because it may help us determine the cell cycle phase.

tanlongzhi commented 5 years ago

Hey @tarak77,

Yes. Below I wrote some rough code for you (using human as an example); not tested though.

  1. Extract centromere coordinates from the 3rd column of the provided file color/hg19.chr.cen, and convert it into a .leg file with two lines per chromosome (for the two haplotypes):

    awk 'BEGIN{FS="\t"}{print $1","$3",0";print $1","$3",1"}' color/hg19.chr.cen > hg19.cen.hom.leg
  2. Find the 3D positions of all centromeres from a .3dg file with dip-c pos -l:

    dip-c pos -l hg19.cen.hom.leg cell.3dg > cell.cen.pos
  3. Give each centromere a name (string) and a color (real value):

    awk 'BEGIN{FS="\t"}{print $1"(pat)";print $1"(mat)"}' color/hg19.chr.cen > hg19.cen.hom.name
    awk 'BEGIN{FS="\t"}{print $1;print $1}' color/hg19.chr.cen | sed 's/X/23/g; s/Y/24/g' > hg19.cen.hom.color
  4. Generate an mmCIF file with the provided script scripts/name_color_x_y_z_to_cif.py:

    paste hg19.cen.hom.name hg19.cen.hom.color cell.cen.pos | python scripts/name_color_x_y_z_to_cif.py /dev/stdin > cell.cen.n.cif

The final .cif file can be overlaid in PyMol onto your existing .cif file of the cell. You can then format the appearance of the centromere .cif file to highlight it.
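For readers who prefer Python to awk/sed, steps 1 and 3 above can be sketched roughly as follows. This is an untested-against-real-data illustration, not part of dip-c: the function name `cen_to_leg_name_color` is hypothetical, the only file-format assumptions are the ones stated above (tab-separated, chromosome in column 1, centromere coordinate in column 3), and stripping a leading `chr` before computing the color is an added convenience.

```python
def cen_to_leg_name_color(cen_lines, sex_colors=None):
    """Build .leg, name, and color entries (two per chromosome) from cen-file lines.

    Hypothetical helper mirroring the awk/sed one-liners in steps 1 and 3.
    """
    if sex_colors is None:
        sex_colors = {"X": 23.0, "Y": 24.0}  # human values used in step 3
    leg, names, colors = [], [], []
    for line in cen_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        chrom, cen = fields[0], fields[2]  # column 1: chromosome, column 3: centromere
        # Assumption: drop a "chr" prefix so the color can be parsed as a number.
        bare = chrom[3:] if chrom.startswith("chr") else chrom
        color = float(sex_colors.get(bare, bare))
        # Haplotype 0 is written first and named "(pat)", haplotype 1 "(mat)",
        # in the same order as the awk commands.
        for hap, parent in ((0, "pat"), (1, "mat")):
            leg.append("%s,%s,%d" % (chrom, cen, hap))
            names.append("%s(%s)" % (chrom, parent))
            colors.append(color)
    return leg, names, colors


# Toy input with made-up coordinates (not real hg19 values).
toy_cen = ["1\t1000\t400", "X\t800\t300"]
leg, names, colors = cen_to_leg_name_color(toy_cen)
print(leg)     # ['1,400,0', '1,400,1', 'X,300,0', 'X,300,1']
print(names)   # ['1(pat)', '1(mat)', 'X(pat)', 'X(mat)']
print(colors)  # [1.0, 1.0, 23.0, 23.0]
```

Writing each list to a file, one entry per line, should give the same three files that the awk/sed commands produce; the `dip-c pos -l` and `paste` steps stay unchanged.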

tarak77 commented 5 years ago

Awesome! Quick question:

  1. How did you obtain the centromere coordinates? I have been searching for mm10 centromere coordinates on the UCSC browser but no luck yet! Any help?

tanlongzhi commented 5 years ago

I think I used UCSC Genome Browser's Table Browser, and selected something like "assembly gaps". In the gap file, some lines correspond to centromeres, with start and end coordinates. I simply picked the midpoints.

If that doesn't work for you, alternatively, for mouse you can probably just use zero as all centromere coordinates.
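The midpoint approach described above could look something like this in Python. It is only a sketch: the function name `centromere_midpoints` is hypothetical, and it assumes the gap table was exported with all fields in the usual UCSC column order (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge). If a chromosome has several centromere rows, this sketch takes the midpoint of the span covering all of them, which is a choice made here and not something stated in the thread.

```python
def centromere_midpoints(gap_lines):
    """Return {chromosome: midpoint} for rows whose gap type is 'centromere'.

    Assumes tab-separated UCSC gap rows:
    bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge.
    """
    spans = {}
    for line in gap_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, gap_type = fields[1], int(fields[2]), int(fields[3]), fields[7]
        if gap_type != "centromere":
            continue
        lo, hi = spans.get(chrom, (start, end))
        spans[chrom] = (min(lo, start), max(hi, end))
    return {chrom: (lo + hi) // 2 for chrom, (lo, hi) in spans.items()}


# Toy rows with made-up coordinates, only to show the expected layout.
toy_gap = [
    "#bin\tchrom\tchromStart\tchromEnd\tix\tn\tsize\ttype\tbridge",
    "\t".join(["9", "chr1", "0", "10", "1", "N", "10", "telomere", "no"]),
    "\t".join(["9", "chr1", "100", "300", "2", "N", "200", "centromere", "no"]),
]
print(centromere_midpoints(toy_gap))  # {'chr1': 200}
```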

tanlongzhi commented 5 years ago

Oh, I didn't realize that I forgot to upload my mm10 centromere files, which were used in the publication. I just updated this repo (cb19e6ad1c4e7a18c3ace28ba0cbf107452d67c2) to include them.

tarak77 commented 5 years ago

Great! But in the mm10 gap file, Galaxy12-[UCSC_Main_on_Mouse_gap(genome)].txt, the centromere coordinates are the same for all chromosomes. Why is that?

Also, I don't get why using zero would work.

tanlongzhi commented 5 years ago

According to my understanding, all mouse centromeres are at the very beginning of chromosomes (called "telocentric"), and therefore should have similar start and end coordinates. I guess mm10 set them to be the same as an approximation. You can probably ask an expert about this.

Since each telomere-centromere region is completely unmappable, it doesn't really matter where one sets the midpoint coordinate.

tarak77 commented 5 years ago

Ah I see, thanks!

tarak77 commented 5 years ago

@tanlongzhi, I was trying to run the script but got an error:

    (py27) wg-dhcp174d191d007:dip-c tarakshisode$ paste mm10.cen.hom.name mm10.cen.hom.color ./test_cells/4CSE/M_4CSE-1/cell.cen.pos | python ./scripts/name_color_x_y_z_to_cif.py /dev/stdin > ./test_cells/4CSE/M_4CSE-1/cell.cen.n.cif
    Traceback (most recent call last):
      File "./scripts/name_color_x_y_z_to_cif.py", line 42, in <module>
        b_factor = float(cen_tel_file_line_data[1]) # column 2: color -> b_factor
    ValueError: could not convert string to float: chr1

tanlongzhi commented 5 years ago

The color values must be real numbers, not strings.

For mouse, you can remove the string prefix chr and replace the sex chromosomes X and Y with 20 and 21 by running:

    awk 'BEGIN{FS="\t"}{print $1;print $1}' color/mm10.chr.cen | sed 's/chr//g; s/X/20/g; s/Y/21/g' > mm10.cen.hom.color

You can find examples of such human-mouse differences (hg19 versus mm10) in data handling by comparing the provided scripts scripts/con_to_matlab_human.sh and scripts/con_to_matlab_mouse.sh.
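A quick way to catch this kind of problem before running the script is to check that every line of the color file parses as a number. The following is a minimal sketch; `find_non_numeric` is a hypothetical helper, not something provided by dip-c.

```python
def find_non_numeric(color_lines):
    """Return (line_number, text) for every line that float() cannot parse."""
    bad = []
    for number, line in enumerate(color_lines, start=1):
        text = line.strip()
        try:
            float(text)
        except ValueError:
            bad.append((number, text))
    return bad


# Example: the first value still carries the "chr" prefix.
print(find_non_numeric(["chr1\n", "1\n", "20\n"]))  # [(1, 'chr1')]
```

Running it on the lines of mm10.cen.hom.color (for example, `find_non_numeric(open("mm10.cen.hom.color"))`) should return an empty list once the `chr` prefix and the X/Y names have been replaced.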

tarak77 commented 5 years ago

Right, I missed the chr part. Also, is there a way to make the centromere spheres bigger in the .cif file?

tanlongzhi commented 5 years ago

According to my understanding of PyMol, the sphere size is not controlled by the mmCIF file.

I typically format it by typing the following in PyMol, assuming the centromere file is called cen.cif:

    set sphere_scale, 3, cen

tarak77 commented 5 years ago

Awesome!