tlwg / libthai

GNU Lesser General Public License v2.1

th_brk / th_brk_find_breaks are limited to ≤ 2Gi characters #19

Open marcmutz opened 2 years ago

marcmutz commented 2 years ago

The th_brk() and th_brk_find_breaks() functions take their size argument as size_t, but return the break positions in an int array. This effectively limits the results to the first INT_MAX characters. Users of these functions must therefore do one of two things. They can ensure that the input never exceeds 2Gi characters, or they can call the functions in a loop over chunks of at most INT_MAX characters. It is far from obvious whether such chunking is possible at all, or how it should be done. That is at least true for this developer, who doesn't know libthai but needs to fix 64-bit issues.
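One possible chunking approach, sketched in C under stated assumptions, works like this. The input is cut only after an ASCII space, which is *assumed* (not confirmed against libthai) to be a point that no Thai word can straddle. The breaker is called on a NUL-terminated copy of each chunk of at most INT_MAX bytes. Each returned int offset is widened to size_t before the chunk's starting offset is added. To keep the sketch testable without libthai, `toy_brk()` is a hypothetical stand-in shaped like th_brk() that simply breaks after spaces. `brk_fn`, `toy_brk()` and `brk_chunked()` are illustrative names, not libthai API. The sketch also assumes that the input has no embedded NUL bytes and that each cut position should itself be reported as a break.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type shaped like th_brk(): NUL-terminated input,
 * int position array with pos_sz entries, returns the count stored. */
typedef int (*brk_fn)(const unsigned char *s, int pos[], size_t pos_sz);

/* Hypothetical toy breaker: reports a break after every ASCII space,
 * never at the very end of the string. Stands in for libthai here. */
static int toy_brk(const unsigned char *s, int pos[], size_t pos_sz)
{
    int n = 0;
    for (size_t i = 0; s[i] != '\0' && (size_t)n < pos_sz; ++i)
        if (s[i] == ' ' && s[i + 1] != '\0')
            pos[n++] = (int)(i + 1);
    return n;
}

/* Finds breaks in s[0..len) by calling brk on chunks of at most max_chunk
 * (<= INT_MAX) bytes and widening each int offset to size_t. Chunks are
 * cut only after an ASCII space (an assumption), and each cut is reported
 * as a break. Returns the number of positions written to out (at most
 * out_sz), or (size_t)-1 if a window holds no space or malloc fails. */
static size_t brk_chunked(const unsigned char *s, size_t len, size_t max_chunk,
                          size_t out[], size_t out_sz, brk_fn brk)
{
    if (max_chunk == 0 || max_chunk > INT_MAX)
        return (size_t)-1;
    if (len == 0)
        return 0;
    size_t cap = max_chunk < len ? max_chunk : len;
    unsigned char *buf = malloc(cap + 1);  /* NUL-terminated chunk copy */
    int *pos = malloc(cap * sizeof *pos);  /* fewer breaks than bytes */
    size_t n = 0, base = 0;
    if (buf == NULL || pos == NULL) {
        free(buf);
        free(pos);
        return (size_t)-1;
    }
    while (base < len && n < out_sz) {
        size_t chunk = len - base;
        if (chunk > cap) {
            /* Walk back from the window end to the last space. */
            size_t cut = cap;
            while (cut > 0 && s[base + cut - 1] != ' ')
                --cut;
            if (cut == 0) {
                n = (size_t)-1;  /* no safe cut point in this window */
                break;
            }
            chunk = cut;
        }
        memcpy(buf, s + base, chunk);
        buf[chunk] = '\0';
        int k = brk(buf, pos, cap);
        for (int i = 0; i < k && n < out_sz; ++i) {
            assert(pos[i] >= 0 && (size_t)pos[i] <= chunk);
            out[n++] = base + (size_t)pos[i];  /* widen, then rebase */
        }
        base += chunk;
        if (base < len && n < out_sz)
            out[n++] = base;  /* the cut after a space is itself a break */
    }
    free(buf);
    free(pos);
    return n;
}
```

With real libthai, the stand-in would be replaced by a small wrapper that forwards to th_brk_find_breaks(). Its exact parameters should be checked in thbrk.h. A caller would also likely choose a max_chunk far below INT_MAX, such as a few MiB, to bound the temporary buffers. When a window contains no space, the sketch gives up rather than cutting blindly. Long runs of unspaced Thai text are exactly where a wrong cut could change the result.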

Suggestion: widen the result array to size_t, and document how to loop over the functions in chunks with existing versions.

marcmutz commented 1 year ago

Ping? Any suggestion for a safe chunking algorithm?