matrix conversion - Githubissues

nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing


matrix conversion #410

Closed BenxiaHu closed 3 years ago

BenxiaHu commented 3 years ago

Hi, I have run HiC-Pro to obtain an ICED normalized matrix. Now I want to convert the matrix generated by HiC-Pro to the following format:

Do you have any suggestions? Best,

nservant commented 3 years ago

Hi, it doesn't seem to be very different from the .matrix file format generated by HiC-Pro. A simple awk/Python script should do the job. Best

BenxiaHu commented 3 years ago

Here is the result from the 10 kb normalized .matrix file:

So the first 2 columns are multiplied by 10 kb, right? However, how do I determine which rows are from which chromosomes? Best,

nservant commented 3 years ago

Ah sorry, I just understood. No. This file is in a triplet sparse format, with i, j, k: i and j are the indices in the matrix, and k is the count.
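To illustrate the triplet layout described above, here is a minimal Python sketch. It assumes whitespace-separated lines of the form `i j k`; the function name `read_triplets` and the sample values are made up for illustration and do not come from HiC-Pro itself. Only bin pairs that have a value are listed, so the number of lines reflects the stored pairs, not the number of bins.

```python
import io


def read_triplets(handle):
    """Parse 'i j k' lines into a dict mapping (i, j) to the count k.

    Assumption: fields are whitespace-separated and i, j are integer
    bin indices. Lines that do not have exactly three fields are skipped.
    """
    counts = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue
        i, j, k = int(fields[0]), int(fields[1]), float(fields[2])
        counts[(i, j)] = k
    return counts


# Made-up example data standing in for a .matrix file.
example = io.StringIO("1\t1\t12.5\n1\t3\t4.0\n2\t3\t7.25\n")
triplets = read_triplets(example)

print(len(triplets))        # number of stored (i, j) pairs
print(triplets[(1, 3)])     # value for bins 1 and 3
print((1, 2) in triplets)   # a pair with no entry is simply absent
```

With a real file, `open("sample_10000_iced.matrix")` (a hypothetical file name) would replace the `io.StringIO` object.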

To make the correspondence between i, j and the genome coordinates, you have a BED file with the matrix. Usually the BED file is stored only with the raw data; the normalized and raw data share the same coordinates.

BenxiaHu commented 3 years ago

Thanks. Here is the BED file: and here is the screenshot of the matrix file:

The number of rows in the BED file is different from that in the matrix file. I am a little confused about how to define the genome coordinates for the matrix file based on the BED file generated by HiC-Pro.

nservant commented 3 years ago

In the BED file you have: chr / start / end / BIN_ID

In the matrix file you have: BIN_ID / BIN_ID / counts

Is that better?
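The two layouts above can be joined on BIN_ID to attach genome coordinates to each matrix entry. The following Python sketch shows one possible way, not an official HiC-Pro tool. The helper names `load_bins` and `annotate`, the sample data, and the output layout (chr1, start1, end1, chr2, start2, end2, count) are all assumptions, since the target format requested in the thread is not shown.

```python
import io


def load_bins(bed_handle):
    """Read 'chr start end BIN_ID' lines into {BIN_ID: (chr, start, end)}."""
    bins = {}
    for line in bed_handle:
        fields = line.split()
        if len(fields) < 4:
            continue
        bins[int(fields[3])] = (fields[0], int(fields[1]), int(fields[2]))
    return bins


def annotate(matrix_handle, bins):
    """Yield (chr1, start1, end1, chr2, start2, end2, count) per matrix line."""
    for line in matrix_handle:
        fields = line.split()
        if len(fields) != 3:
            continue
        i, j = int(fields[0]), int(fields[1])
        # Look up both bin IDs in the BED-derived table.
        yield bins[i] + bins[j] + (float(fields[2]),)


# Made-up example data standing in for the BED and .matrix files.
bed = io.StringIO(
    "chr1\t0\t10000\t1\n"
    "chr1\t10000\t20000\t2\n"
    "chr2\t0\t10000\t3\n"
)
matrix = io.StringIO("1\t2\t5.5\n2\t3\t1.25\n")

rows = list(annotate(matrix, load_bins(bed)))
for row in rows:
    print("\t".join(str(value) for value in row))
```

Because the BED file has one line per bin and the matrix file has one line per stored bin pair, the two files are not expected to have the same number of rows.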

BenxiaHu commented 3 years ago

Got it, thanks.