Closed tarak77 closed 5 years ago
Hey @tarak77,
Yes. Below I wrote some rough code for you (using human as an example); not tested though.
Extract centromere coordinates from the 3rd column of the provided file color/hg19.chr.cen, and convert them into a .leg file with two lines per chromosome (for the two haplotypes):
awk 'BEGIN{FS="\t"}{print $1","$3",0";print $1","$3",1"}' color/hg19.chr.cen > hg19.cen.hom.leg
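If awk is hard to read, the same step can be sketched in Python. This is only an illustration of what the one-liner does, assuming a tab-separated input with the chromosome name in column 1 and the centromere coordinate in column 3. The function name cen_to_leg_lines and the coordinates in the example are made up.

```python
def cen_to_leg_lines(cen_lines):
    """Turn tab-separated centromere lines into .leg lines.

    Each input line yields two output lines, "chrom,position,0" and
    "chrom,position,1", one per haplotype, mirroring the awk command.
    """
    leg_lines = []
    for line in cen_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        chrom, position = fields[0], fields[2]  # column 1 and column 3
        for haplotype in (0, 1):
            leg_lines.append("{},{},{}".format(chrom, position, haplotype))
    return leg_lines


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = ["1\t1000\t2000\n", "X\t3000\t4000\n"]
    print("\n".join(cen_to_leg_lines(example)))
```

To use it on the real file, pass open("color/hg19.chr.cen") as cen_lines and write the returned lines to hg19.cen.hom.leg.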
Find the 3D positions of all centromeres from a .3dg file with dip-c pos -l:
dip-c pos -l hg19.cen.hom.leg cell.3dg > cell.cen.pos
Give each centromere a name (string) and a color (real value):
awk 'BEGIN{FS="\t"}{print $1"(pat)";print $1"(mat)"}' color/hg19.chr.cen > hg19.cen.hom.name
awk 'BEGIN{FS="\t"}{print $1;print $1}' color/hg19.chr.cen | sed 's/X/23/g; s/Y/24/g' > hg19.cen.hom.color
Generate an mmCIF file with the provided script scripts/name_color_x_y_z_to_cif.py:
paste hg19.cen.hom.name hg19.cen.hom.color cell.cen.pos | python scripts/name_color_x_y_z_to_cif.py /dev/stdin > cell.cen.n.cif
The final .cif file can be overlaid in PyMol onto your existing .cif file of the cell. You can then format the appearance of the centromere .cif file to highlight it.
Awesome! Quick question:
I think I used UCSC Genome Browser's Table Browser, and selected something like "assembly gaps". In the gap file, some lines correspond to centromeres, with start and end coordinates. I simply picked the midpoints.
If that doesn't work for you, alternatively, for mouse you can probably just use zero as all centromere coordinates.
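For anyone rebuilding the centromere file from a gap table, the midpoint step might look like the sketch below. It assumes the usual tab-separated layout of the UCSC gap table (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge), so please check the header of your own download first. The function name centromere_midpoints is made up, and merging several centromere rows of one chromosome into a single span is my own choice, not necessarily how the original files were made.

```python
def centromere_midpoints(gap_lines):
    """Return {chrom: midpoint} for rows whose gap type is "centromere".

    Assumed columns (0-based): 1 = chrom, 2 = chromStart, 3 = chromEnd,
    7 = type. Header lines starting with "#" are skipped.
    """
    spans = {}
    for line in gap_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[1], int(fields[2]), int(fields[3])
        if fields[7] != "centromere":
            continue
        # If a chromosome has several centromere rows, keep the overall span.
        lo, hi = spans.get(chrom, (start, end))
        spans[chrom] = (min(lo, start), max(hi, end))
    return {chrom: (lo + hi) // 2 for chrom, (lo, hi) in spans.items()}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        "#bin\tchrom\tchromStart\tchromEnd\tix\tn\tsize\ttype\tbridge\n",
        "1\tchr1\t100\t300\t1\tN\t200\tcentromere\tno\n",
        "1\tchr1\t500\t600\t2\tN\t100\tcontig\tno\n",
    ]
    print(centromere_midpoints(rows))
```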
Oh, I didn't realize that I forgot to upload my mm10 centromere files, which were used in the publication. I just updated this repo (cb19e6ad1c4e7a18c3ace28ba0cbf107452d67c2) to include them.
Great! But in the mm10 gap file, Galaxy12-[UCSC_Main_on_Mouse_gap(genome)].txt, the centromere coordinates are the same for all chromosomes. Why is that?
Interesting. I don't get why using zero would work?
According to my understanding, all mouse centromeres are at the very beginning of chromosomes (called "telocentric"), and therefore should have similar start and end coordinates. I guess mm10 set them to be the same as an approximation. You can probably ask an expert about this.
Since each telomere-centromere is completely unmappable, it doesn't really matter where one sets the midpoint coordinate.
Ah I see, thanks!
@tanlongzhi, I was trying to run the script, but I get an error:
(py27) wg-dhcp174d191d007:dip-c tarakshisode$ paste mm10.cen.hom.name mm10.cen.hom.color ./test_cells/4CSE/M_4CSE-1/cell.cen.pos | python ./scripts/name_color_x_y_z_to_cif.py /dev/stdin > ./test_cells/4CSE/M_4CSE-1/cell.cen.n.cif
Traceback (most recent call last):
File "./scripts/name_color_x_y_z_to_cif.py", line 42, in <module>
b_factor = float(cen_tel_file_line_data[1]) # column 2: color -> b_factor
ValueError: could not convert string to float: chr1
The color values must be real numbers, not strings.
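A quick way to catch this before running the CIF script is to check that every line of the color file parses as a number. The sketch below is only a hypothetical helper (find_non_numeric_colors is not part of dip-c). It mirrors the float() call that fails in the traceback.

```python
def find_non_numeric_colors(color_lines):
    """Return (line_number, value) pairs for lines that float() rejects."""
    bad = []
    for line_number, line in enumerate(color_lines, start=1):
        value = line.strip()
        try:
            float(value)  # same conversion the CIF script applies to column 2
        except ValueError:
            bad.append((line_number, value))
    return bad


if __name__ == "__main__":
    # "chr1" is reported because it is not a real number.
    print(find_non_numeric_colors(["1\n", "chr1\n", "23\n"]))
```

Running it on open("mm10.cen.hom.color") should return an empty list once the file is fixed.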
For mouse, you can remove the string prefix chr and replace the sex chromosomes X and Y with 20 and 21 by running:
awk 'BEGIN{FS="\t"}{print $1;print $1}' color/mm10.chr.cen | sed 's/chr//g; s/X/20/g; s/Y/21/g' > mm10.cen.hom.color
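The same mapping can be written as a small Python function if you prefer. This is a rough sketch, not something shipped with dip-c. The names chrom_to_color, HUMAN_SEX_COLORS and MOUSE_SEX_COLORS are made up, and the numbers simply follow the sed commands in this thread (X/Y become 23/24 for human and 20/21 for mouse).

```python
HUMAN_SEX_COLORS = {"X": 23, "Y": 24}  # values from the hg19 sed command
MOUSE_SEX_COLORS = {"X": 20, "Y": 21}  # values from the mm10 sed command


def chrom_to_color(chrom, sex_colors):
    """Map a chromosome name such as "chr7", "7" or "chrX" to a real-valued color."""
    name = chrom[3:] if chrom.startswith("chr") else chrom  # drop the "chr" prefix
    if name in sex_colors:
        return float(sex_colors[name])
    return float(name)  # raises ValueError for anything that is not a number


if __name__ == "__main__":
    for chrom in ("chr1", "chr19", "chrX", "chrY"):
        print(chrom, chrom_to_color(chrom, MOUSE_SEX_COLORS))
```

Unlike the sed substitution, which rewrites every X or Y in the text, this only maps names that are exactly X or Y after removing the prefix, so an unexpected name fails loudly instead of producing an odd color.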
You can find examples of such human-mouse differences (hg19 versus mm10) in data treatment in the provided scripts scripts/con_to_matlab_human.sh versus scripts/con_to_matlab_mouse.sh.
Right, I missed the chr part. Also, is there a way to make the centromere spheres bigger in the .cif file?
According to my understanding of PyMol, the sphere size is not controlled by the mmCIF file.
I typically format it by typing the following in PyMol, assuming the centromere file is called cen.cif:
set sphere_scale, 3, cen
Awesome!
Hey @tanlongzhi, Is there a way to highlight the centromere positions in the movie of the 3D models for human/mouse? I ask that because it may help us determine the cell cycle phase.