
wmalab / ASHIC

Allele-specific modeling of diploid Hi-C data

MIT License


The results of example command #4

Closed: JFF1594032292 closed this issue 3 years ago.

JFF1594032292 commented 3 years ago

Hi, thanks for developing this powerful tool! I ran the example command ashic -i examples/sample_data/GSM2863686_chrX_1000000.pickle -o output --model ASHIC-PM and it produced the results folder as described. However, structure.txt has no header, and I don't know what its three columns mean. I think t_mm.txt, t_mp.txt, and t_pp.txt are contact matrices holding the normalized contact counts between pairs of segments. But the number of lines in these three files differs from the number of lines in structure.txt, which does not agree with the description (shape: n-by-n and shape: n-by-3). I am really confused by them.

Thanks,

Jiang

tye42 commented 3 years ago

Hi Jiang,

Each row in structure.txt corresponds to the xyz coordinates of one bin: the first column is the x coordinate, the second column is the y coordinate, and the third column is the z coordinate. The structure.txt file should actually have shape (2n)-by-3, where the first n-by-3 half is the maternal structure and the second half is the paternal structure. The contact matrices are not normalized yet; you can normalize them using ICE.
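To illustrate the layout described above, here is a minimal Python sketch (NumPy assumed) that splits a (2n)-by-3 coordinate array into its maternal and paternal halves. It also includes a bare-bones ICE-style iterative correction for a symmetric n-by-n matrix such as t_mm.txt or t_pp.txt. The function names `split_structure` and `ice_normalize` are hypothetical and not part of ASHIC. The ICE routine is a simplified illustration, not the implementation ASHIC or any particular package uses, so an established ICE implementation (for example the `iced` Python package) is preferable for real analyses.

```python
import numpy as np


def split_structure(coords):
    """Split a (2n, 3) coordinate array into maternal and paternal halves.

    Follows the layout described in this thread: rows 0..n-1 are the maternal
    structure and rows n..2n-1 are the paternal structure.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3 or coords.shape[0] % 2:
        raise ValueError(f"expected a (2n, 3) array, got {coords.shape}")
    n = coords.shape[0] // 2
    return coords[:n], coords[n:]


def ice_normalize(matrix, max_iter=200, tol=1e-6):
    """Simplified ICE-style iterative correction of a symmetric contact matrix.

    Repeatedly divides entry (i, j) by coverage[i] * coverage[j], where coverage
    is each row sum scaled to mean 1 over non-empty rows, until all non-empty
    rows have (nearly) the same sum. Empty rows are left untouched.
    Returns the corrected matrix and the accumulated per-bin bias vector.
    """
    m = np.array(matrix, dtype=float)  # copy so the caller's data is unchanged
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError(f"expected a square matrix, got {m.shape}")
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        coverage = m.sum(axis=1)
        nonzero = coverage > 0
        if not nonzero.any():
            break  # nothing to balance in an all-zero matrix
        coverage /= coverage[nonzero].mean()
        coverage[~nonzero] = 1.0  # do not rescale empty bins
        m /= np.outer(coverage, coverage)
        bias *= coverage
        if np.abs(coverage[nonzero] - 1.0).max() < tol:
            break
    return m, bias
```

Assuming the output files are whitespace-delimited text that `np.loadtxt` can read, and that they sit in the `output` folder from the example command (both assumptions), `maternal, paternal = split_structure(np.loadtxt("output/structure.txt"))` would give two n-by-3 arrays whose row count should match the side length of t_mm.txt.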

JFF1594032292 commented 3 years ago

Thanks for your reply! Now I understand the file format. If I understand correctly, the "xyz coordinates of one bin" are the spatial coordinates shown in structure_3d.html, right? However, these results don't include the genomic positions of the bins. Are the bins in structure.txt arranged consecutively, in genomic order, at the corresponding resolution?

Thanks a lot, Jiang

tye42 commented 3 years ago

Yes, you're right: they're the spatial coordinates, and they're in the order of their genomic positions at the corresponding resolution. For example, if the structure is chrX at 1 Mb resolution, the first row would be chrX(maternal):1-1,000,000, the second row would be chrX(maternal):1,000,001-2,000,000, and so on.
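The row-to-bin mapping described above can be written as a small helper. This is a minimal Python sketch assuming 1-based inclusive coordinates as in the example, n maternal rows followed by n paternal rows, and bins of equal width; `bin_interval` is a hypothetical name, not an ASHIC function. The last bin's end may run past the actual chromosome length, which this sketch does not clip.

```python
def bin_interval(row, n_bins, resolution, chrom="chrX"):
    """Map a structure.txt row index to (chrom, haplotype, start, end).

    Assumes rows 0..n_bins-1 are maternal bins and rows n_bins..2*n_bins-1 are
    paternal bins, each in genomic order, with 1-based inclusive coordinates.
    """
    if not 0 <= row < 2 * n_bins:
        raise IndexError(f"row {row} outside 0..{2 * n_bins - 1}")
    haplotype = "maternal" if row < n_bins else "paternal"
    i = row % n_bins  # position of the bin within its haplotype
    start = i * resolution + 1
    end = (i + 1) * resolution
    return chrom, haplotype, start, end
```

With a hypothetical n_bins of 10 at 1 Mb resolution, `bin_interval(1, 10, 1_000_000)` gives `("chrX", "maternal", 1000001, 2000000)`, and row 10 is the first paternal bin, `("chrX", "paternal", 1, 1000000)`.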

JFF1594032292 commented 3 years ago

Thanks! It helps me a lot!

ZHIDIHUAYUAN commented 3 years ago

Hi, I have a question about the ICE normalization. Can ICE be applied to a single region, such as the chrX_1000000 data, rather than genome-wide? Does that affect the results? Thanks