zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
https://www.nature.com/articles/s41477-024-01755-3
BSD 3-Clause "New" or "Revised" License

OverflowError: signed integer is greater than maximum #81

Open lstxmu opened 2 weeks ago

lstxmu commented 2 weeks ago

1cluster_run.log Hi, I am assembling an insect genome of about 11 Gb and ran into this error. Can you help me fix it? Thank you.

zengxiaofei commented 2 weeks ago

A similar issue: https://github.com/zengxiaofei/HapHiC/issues/73

lstxmu commented 2 weeks ago

What can I do if I don't want to split the contigs above 1 Gb? Is there any other way to fix this?

zengxiaofei commented 2 weeks ago

If you choose to rejoin the split contigs after scaffolding, don't worry, it will not impact the contiguity of your contigs. However, if you just find the process of splitting and joining contigs boring or don't know how to modify the Python code of HapHiC, the answer is currently no. I'm busy these days and unable to make these modifications or conduct tests. Sorry!
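For readers unsure what the splitting step looks like, here is a minimal sketch in Python. It is not part of HapHiC: the function names, the `_partN` suffix, and the 1 Gb default are assumptions made for illustration, and the sequence ID is taken as the header text up to the first space. Rejoining after scaffolding is not shown.

```python
def split_contig(name, seq, max_len):
    """Split one contig into consecutive pieces of at most max_len bases."""
    if len(seq) <= max_len:
        return [(name, seq)]  # short contigs keep their original name
    pieces = []
    for i, start in enumerate(range(0, len(seq), max_len), start=1):
        # Hypothetical naming scheme: <original ID>_part<N>
        pieces.append((f"{name}_part{i}", seq[start:start + max_len]))
    return pieces


def read_fasta(path):
    """Yield (ID, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].strip().split(" ")[0], []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_fasta(in_path, out_path, max_len=1_000_000_000):
    """Write a copy of in_path in which no contig exceeds max_len bases."""
    with open(out_path, "w") as out:
        for name, seq in read_fasta(in_path):
            for piece_name, piece_seq in split_contig(name, seq, max_len):
                out.write(f">{piece_name}\n{piece_seq}\n")
```

Because the pieces are consecutive and non-overlapping, concatenating them in order gives back the original contig, which is why rejoining them later does not reduce contiguity.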

lstxmu commented 2 weeks ago

OK. Another question: if I split the contigs above 1 Gb, should the Hi-C data mapping workflow be rerun?

zengxiaofei commented 2 weeks ago

Yes.

zengxiaofei commented 2 weeks ago

I have added the enhancement label. Although a contig longer than 1.07 Gb is not common, I will try to fix this problem when I have the time.
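To check whether an assembly contains such contigs before running the pipeline, something like the sketch below may help. It reads a `.fai` index as produced by `samtools faidx` (tab-separated, with the sequence name in column 1 and its length in column 2). The function name is hypothetical, and the default limit of 2^30 = 1,073,741,824 bp is an assumption: it matches the "1.07 Gb" figure, but the thread does not state the exact threshold or whether it is inclusive.

```python
def contigs_over_limit(fai_path, limit=2**30):
    """Return (name, length) for every contig longer than limit bases."""
    long_contigs = []
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            name, length = fields[0], int(fields[1])
            if length > limit:  # assumed threshold, see note above
                long_contigs.append((name, length))
    return long_contigs
```

An empty result would suggest that no contig needs splitting under this assumed limit.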

zengxiaofei commented 2 weeks ago

Additionally, considering that you are scaffolding a large genome, I would recommend upgrading HapHiC to the latest version (1.0.6), as this version has better compatibility with Juicebox for large genomes.