Closed JFF1594032292 closed 3 years ago
Hi Jiang,
Each row in structure.txt corresponds to the xyz coordinates of one bin: the first column is the x coordinate, the second is the y coordinate, and the third is the z coordinate. The shape of structure.txt should actually be (2n)-by-3, where the first n rows are the maternal structure and the last n rows are the paternal structure. The contact matrices are not normalized yet; you can normalize them using ICE.
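To illustrate the layout described above, here is a minimal sketch that reads the rows and splits them into the maternal and paternal halves. It assumes the columns are whitespace-separated, which the thread does not state; `parse_structure` and `split_alleles` are hypothetical helper names, not part of ASHIC.

```python
def parse_structure(text):
    """Parse 'x y z' rows into a list of (x, y, z) float tuples.

    Assumption: columns are separated by whitespace. Blank lines are skipped.
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, got {len(fields)}")
        rows.append(tuple(float(value) for value in fields))
    return rows


def split_alleles(rows):
    """Split a (2n)-by-3 list of rows into (maternal, paternal), each n-by-3.

    The first n rows are taken as maternal and the last n as paternal,
    following the description in the reply above.
    """
    if len(rows) % 2 != 0:
        raise ValueError(f"expected an even number of rows, got {len(rows)}")
    n = len(rows) // 2
    return rows[:n], rows[n:]


if __name__ == "__main__":
    # Toy data with n = 2 bins per allele, not real ASHIC output.
    example = "0.0 0.1 0.2\n1.0 1.1 1.2\n5.0 5.1 5.2\n6.0 6.1 6.2\n"
    maternal, paternal = split_alleles(parse_structure(example))
    print(len(maternal), len(paternal))
```

With a real result folder, the text would come from something like `open("output/structure.txt").read()`; the exact path depends on the `-o` argument used.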
Thanks for your reply! Now I understand the file format. If I understand correctly, the "xyz coordinates of one bin" are the spatial coordinates shown in structure_3d.html, am I right? But these results don't contain the genomic positions of the bins. Are the bins in structure.txt listed consecutively in genomic order at the corresponding resolution?
Thanks a lot, Jiang
Yes, you're right: they are the spatial coordinates, and they are ordered by genomic position at the corresponding resolution. For example, if the structure is chrX at 1 Mb resolution, the first row is chrX(maternal):1-1,000,000, the second row is chrX(maternal):1,000,001-2,000,000, and so on.
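The row-to-position mapping in this reply can be sketched as a small helper. It combines the ordering given here with the earlier statement that the second half of the rows is the paternal structure. `row_to_interval` is a hypothetical name, and the sketch assumes every bin spans a full resolution step (in practice the last bin would presumably stop at the chromosome end).

```python
def row_to_interval(row_index, n_bins, resolution, chrom):
    """Return the genomic interval label for a 0-based row of structure.txt.

    Rows 0..n_bins-1 are treated as maternal and rows n_bins..2*n_bins-1 as
    paternal. Coordinates are 1-based and inclusive, as in the example above.
    """
    if not 0 <= row_index < 2 * n_bins:
        raise IndexError(f"row_index {row_index} outside 0..{2 * n_bins - 1}")
    allele = "maternal" if row_index < n_bins else "paternal"
    bin_index = row_index % n_bins
    start = bin_index * resolution + 1
    end = (bin_index + 1) * resolution
    return f"{chrom}({allele}):{start:,}-{end:,}"


if __name__ == "__main__":
    # Toy example with 3 bins per allele at 1 Mb resolution.
    for row in range(6):
        print(row, row_to_interval(row, n_bins=3, resolution=1_000_000, chrom="chrX"))
```

Here `n_bins` is half the number of rows in structure.txt.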
Thanks! It helps me a lot!
Hi, I have a question about ICE normalization. Can ICE be applied to a single region such as chrX_1000000 rather than genome-wide? Does that affect the result? Thanks
Hi, Thanks for developing this powerful tool! I ran the example command
ashic -i examples/sample_data/GSM2863686_chrX_1000000.pickle -o output --model ASHIC-PM
and it produced the results folder as described. However, structure.txt has no header, and I don't know what the three columns mean. t_mm.txt, t_mp.txt, and t_pp.txt are contact matrices; I think they hold the normalized contact counts between each pair of segments. But the number of lines in these three files is not the same as in structure.txt, which does not match the description (shape: n × n and shape: n × 3). I am really confused by them. Thanks,
Jiang