tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

Error with cooler files from 4DN #21

Closed: WangJiuming closed this issue 1 year ago

WangJiuming commented 1 year ago

Hi, when I train Peakachu on the .mcool data (source: 4DNFIXP4QG5B), the error shown at the end of this comment occurs.

The error seems to be specific to .mcool/.cool files, since Peakachu works properly with the .hic file from the same experiment (source: 4DNFI1UEG1HD). Do you know how to fix this? Thanks in advance.

collecting from chr1
Traceback (most recent call last):
  File "/home/anaconda3/envs/peakachu/bin/peakachu", line 91, in <module>
    run()
  File "/home/anaconda3/envs/peakachu/bin/peakachu", line 87, in run
    args.func(args)
  File "/home/anaconda3/envs/peakachu/lib/python3.10/site-packages/peakachu/train_models.py", line 66, in main
    if X[b1, b2] > maxv:
  File "/home/anaconda3/envs/peakachu/lib/python3.10/site-packages/scipy/sparse/_index.py", line 47, in __getitem__
    row, col = self._validate_indices(key)
  File "/home/anaconda3/envs/peakachu/lib/python3.10/site-packages/scipy/sparse/_index.py", line 164, in _validate_indices
    raise IndexError('column index (%d) out of range' % col)
IndexError: column index (24910) out of range

XiaoTaoWang commented 1 year ago

Hi, sorry for the late response. I think the error comes from a mismatch in genomic coordinates between the contact matrix and your training data: the coordinates in your bedpe file appear to be in hg19, while your mcool file is in hg38.
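For anyone hitting the same traceback, here is a minimal sketch of how the mismatch could be checked before training. It assumes a 10 kb resolution (not stated in this thread) and uses two hypothetical helpers, `n_bins` and `out_of_range`, that are not part of Peakachu. At 10 kb, bin 24910 lies inside chr1 in hg19 but past the end of chr1 in hg38, which would be consistent with the IndexError above. The chromosome sizes are passed in as a plain dict; with the cooler package they could probably be read from the `chromsizes` attribute of a `cooler.Cooler` object.

```python
# chr1 lengths in base pairs for the two assemblies.
CHR1_LEN = {"hg19": 249_250_621, "hg38": 248_956_422}


def n_bins(chrom_len, resolution):
    """Number of fixed-size bins needed to cover a chromosome."""
    return -(-chrom_len // resolution)  # ceiling division


def out_of_range(bedpe_rows, chromsizes):
    """Return the BEDPE rows whose anchors end past the chromosome end.

    bedpe_rows: iterable of sequences (chrom1, start1, end1, chrom2, start2, end2, ...)
    chromsizes: dict mapping chromosome name -> length in bp for the
                assembly of the contact matrix (assumed input).
    """
    bad = []
    for row in bedpe_rows:
        c1, _s1, e1, c2, _s2, e2 = row[:6]
        # A chromosome missing from chromsizes is treated as length 0,
        # so rows with unknown chromosome names are flagged as well.
        if int(e1) > chromsizes.get(c1, 0) or int(e2) > chromsizes.get(c2, 0):
            bad.append(row)
    return bad


if __name__ == "__main__":
    resolution = 10_000  # assumed; use the resolution passed to Peakachu
    for assembly, length in CHR1_LEN.items():
        print(assembly, "chr1 bins:", n_bins(length, resolution))

    # Illustrative rows only, not taken from the real training file.
    rows = [
        ("chr1", 1_000_000, 1_010_000, "chr1", 1_200_000, 1_210_000),
        ("chr1", 249_000_000, 249_010_000, "chr1", 249_100_000, 249_110_000),
    ]
    print(out_of_range(rows, {"chr1": CHR1_LEN["hg38"]}))
```

A check like this only catches anchors that fall beyond a chromosome end. Coordinates from the wrong assembly that happen to stay in range would pass silently, so confirming the assembly of both files (or lifting the bedpe over) is still the real fix.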

WangJiuming commented 1 year ago

Thanks for your help! Using a contact matrix built on the same reference genome as the training data fixed the problem. I will close this issue now.