paulsengroup / hictk

Blazing fast toolkit to work with .hic and .cool files
MIT License

Error creating All:All matrix #178

Closed: Phlya closed this issue 3 months ago

Phlya commented 3 months ago

Hi, I am trying out hictk to convert .mcool to .hic, and I am encountering an issue. After writing all by-chromosome pixels, it tries to create the All:All matrix and fails like this:

> hictk convert --threads 8 --tmpdir results/coolers_library_group results/coolers_library_group/all.sacCer3.mapq_30.1000.mcool results/coolers_library_group/all.sacCer3.mapq_30.1000.hic

...
[2024-05-22 13:55:03.595] [info]: writing pixels for All:All matrix...
FAILURE! hictk convert encountered the following error: an error occurred while writing file "results/coolers_library_group/all.sacCer3.mapq_30.1000.hic": an error occurred while writing the All:All matrix to file "results/coolers_library_group/all.sacCer3.mapq_30.1000.hic": position is greater than chromosome size: 4140417920 >= 1531933

This is a tiny .mcool file with some yeast data that we use for testing in distiller. The file is attached (I changed the extension to .txt so GitHub doesn't complain): all.sacCer3.mapq_30.1000.txt

Am I doing something wrong here?

robomics commented 3 months ago

Thanks for reporting this issue and providing an easy way to reproduce the bug!

Off the top of my head, I can't think of a reason why that operation should fail. I'll try to look into this by the end of the week.

robomics commented 3 months ago

Thanks again for reporting this issue.

The problem turned out to be bin tables in which every chromosome has exactly one bin. The code hictk uses to estimate the cache size when opening .hic files incorrectly assumed that the longest chromosome in a genome has at least two bins. Violating this assumption led to an integer overflow, which in turn produced the "position is greater than chromosome size" error.
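The thread does not show the offending hictk code, so the following is only a minimal sketch of this failure mode, under stated assumptions. A chromosome needs ceil(size / resolution) bins, so every chromosome collapses to a single bin once the resolution is at least as large as the longest chromosome. `naive_estimate` and `guarded_estimate` are hypothetical stand-ins for an unsigned computation that assumes at least two bins; they are not hictk functions. `ctypes.c_uint32` is used only to emulate C++ `uint32_t` wrap-around, because Python integers never overflow.

```python
import ctypes


def num_bins(chrom_size: int, resolution: int) -> int:
    """Bins needed to tile a chromosome: ceil(chrom_size / resolution)."""
    return (chrom_size + resolution - 1) // resolution


def as_uint32(value: int) -> int:
    """Emulate C/C++ uint32_t wrap-around (ctypes does no overflow checking)."""
    return ctypes.c_uint32(value).value


def naive_estimate(max_bins: int) -> int:
    # HYPOTHETICAL: an unsigned computation that silently assumes the longest
    # chromosome has >= 2 bins. With max_bins == 1 it wraps to 2**32 - 1.
    return as_uint32(max_bins - 2)


def guarded_estimate(max_bins: int) -> int:
    # Same computation, but clamps at zero instead of wrapping when max_bins < 2.
    return as_uint32(max(max_bins, 2) - 2)


# Illustrative chromosome sizes in bp; the second matches the chromosome size
# reported in the error message above.
chrom_sizes = [230_218, 1_531_933]

for resolution in (1_000, 2_000_000):
    max_bins = max(num_bins(size, resolution) for size in chrom_sizes)
    print(resolution, max_bins, naive_estimate(max_bins), guarded_estimate(max_bins))
```

At a 1 kb resolution both estimates agree, but at 2 Mb every chromosome fits in one bin and the unguarded version wraps to a huge value instead of failing loudly. Checking the assumption explicitly (or clamping, as above) keeps the estimate meaningful for tiny genomes or very coarse resolutions.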

Phlya commented 3 months ago

Thank you for looking into it so quickly!