moeyensj / thor

Tracklet-less Heliocentric Orbit Recovery
BSD 3-Clause "New" or "Revised" License

Uses LargeString to help avoid offset overflow errors #147

Closed akoumjian closed 10 months ago

akoumjian commented 10 months ago

PyArrow's string array stores its value offsets as 32-bit integers. This limits not just individual strings but the combined string data of a whole array to 2 GB. Since we frequently use datasets that can exceed that limit, we move to LargeStringArrays, which use 64-bit offsets. The additional memory overhead in a table is modest: roughly 4 extra bytes per row for each string column (number of string columns × 4 bytes × number of rows).