Closed lpzoaa closed 1 month ago
Yes, there is a limitation of contig length.
def update_clm_dict(clm_dict, ctg_name_pair, len_i, len_j, coord_i_0, coord_j_0):
    clm_dict[ctg_name_pair].extend((
        len_i - coord_i_0 + coord_j_0,
        len_i - coord_i_0 + len_j - coord_j_0,
        coord_i_0 + coord_j_0,
        coord_i_0 + len_j - coord_j_0))
clm_dict here is a Python array with a signed integer data type, which ranges from -2,147,483,648 to 2,147,483,647. When calculating the four integers above, the maximum absolute value can be twice the length of the longest contig, so an overflow requires a contig longer than half of 2,147,483,647 bp. Therefore, I suspect that at least one contig in your assembly is longer than 1.07 Gb, rather than just 500 Mb.
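To make the arithmetic concrete, here is a minimal sketch that recomputes the four values for two made-up contig pairs and tries to store them in an `array('i')`. The helper names `clm_values` and `fits_in_signed_int`, the contig lengths, and the zero coordinates are illustrative assumptions, not part of the tool. It also assumes the usual 4-byte C int.

```python
from array import array

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 4-byte signed int


def clm_values(len_i, len_j, coord_i_0, coord_j_0):
    """Return the four integers computed in update_clm_dict (hypothetical helper)."""
    return (
        len_i - coord_i_0 + coord_j_0,
        len_i - coord_i_0 + len_j - coord_j_0,
        coord_i_0 + coord_j_0,
        coord_i_0 + len_j - coord_j_0,
    )


def fits_in_signed_int(values):
    """True if every value can be stored in a 4-byte signed int."""
    return all(INT_MIN <= v <= INT_MAX for v in values)


# Two 500 Mb contigs: the largest value is 1,000,000,000, which still fits.
small = clm_values(500_000_000, 500_000_000, 0, 0)

# A 1.2 Gb contig paired with a 1.0 Gb contig: the second value is
# 2,200,000,000, which exceeds 2,147,483,647.
large = clm_values(1_200_000_000, 1_000_000_000, 0, 0)

arr = array('i')
arr.extend(small)  # works

overflowed = False
try:
    arr.extend(large)
except OverflowError:
    # The second value of `large` cannot be stored in a signed int.
    overflowed = True
```

With these inputs, `fits_in_signed_int(small)` is true and `fits_in_signed_int(large)` is false, which matches the 1.07 Gb threshold rather than 500 Mb.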
Although such a long contig is uncommon, if my suspicion is correct, I could develop a feature to dynamically set the data type of clm_dict based on the maximum contig length. However, I am uncertain whether downstream tools like ALLHiC will encounter issues, so I need some time to test this. Alternatively, you could break these long contigs and record the breakpoints, then rejoin them after completing the scaffolding process.
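As a rough illustration of the "dynamic data type" idea, the sketch below picks the narrowest signed `array` typecode that can hold twice the maximum contig length. The function name `pick_typecode` is hypothetical and this is not the tool's actual implementation. Whether downstream tools such as ALLHiC accept the larger values is a separate question that this sketch does not address.

```python
from array import array


def pick_typecode(max_ctg_len):
    """Return the narrowest signed typecode that can store 2 * max_ctg_len.

    Hypothetical helper: item sizes are queried at runtime because
    'l' is 4 bytes on some platforms and 8 bytes on others.
    """
    max_value = 2 * max_ctg_len
    for typecode in ('i', 'l', 'q'):
        bits = array(typecode).itemsize * 8
        if max_value <= 2 ** (bits - 1) - 1:
            return typecode
    raise ValueError('contig is too long for any signed array typecode')


# Example with a made-up 1.2 Gb maximum contig length.
typecode = pick_typecode(1_200_000_000)
values = array(typecode, [2_400_000_000])  # no OverflowError with the wider type
```

The trade-off is memory: an 8-byte typecode doubles the size of each stored value compared with `'i'`, which is presumably why the narrower type would be kept whenever the contigs are short enough.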
Thank you for your reply. As you suspected, the longest contig reaches 1.2 Gb. I will follow your suggestion to split the contigs that exceed 1 Gb. Once again, thank you for developing such an efficient tool.
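For readers trying the same workaround, the bookkeeping could look roughly like the sketch below. It splits a sequence into pieces no longer than a chosen limit and records the breakpoints so the original can be restored. The function names, the `_part` naming scheme, and the tuple layout are assumptions made for illustration. Rejoining pieces inside an actual scaffolding result also depends on where and in which orientation each piece was placed, which is not covered here.

```python
def split_contig(name, seq, max_len):
    """Split seq into pieces of at most max_len bases.

    Returns (pieces, breakpoints), where pieces is a list of
    (piece_name, piece_seq) and breakpoints is a list of
    (piece_name, original_name, start, end) with 0-based, end-exclusive
    coordinates on the original contig.
    """
    pieces, breakpoints = [], []
    for k, start in enumerate(range(0, len(seq), max_len), 1):
        end = min(start + max_len, len(seq))
        piece_name = f'{name}_part{k}'
        pieces.append((piece_name, seq[start:end]))
        breakpoints.append((piece_name, name, start, end))
    return pieces, breakpoints


def rejoin(pieces, breakpoints):
    """Rebuild the original sequence from its pieces, ordered by start coordinate."""
    by_name = dict(pieces)
    ordered = sorted(breakpoints, key=lambda b: b[2])
    return ''.join(by_name[piece_name] for piece_name, _, _, _ in ordered)


# Toy example: a 10-base "contig" split with a 4-base limit.
pieces, breakpoints = split_contig('ctg1', 'ACGTACGTAC', 4)
restored = rejoin(pieces, breakpoints)
```

In practice `max_len` would be set to something safely below the 1.07 Gb threshold, for example 1,000,000,000 as mentioned above.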
Thank you very much for developing such an efficient tool! However, I encountered an issue while using it. When the length of an original contig exceeds 500 Mb (2^29), the following error occurs:
OverflowError: signed integer is greater than maximum
Could you please confirm if this is a limitation of the software itself?