PyArrow's default string array (StringArray) addresses its character data with 32-bit offsets. This limits not just each individual string but the total string data in a single array to 2 GiB. Since we frequently use datasets that can exceed that limit, we use LargeStringArray (the large_string type) instead, which stores 64-bit offsets. The additional memory overhead is modest: roughly 4 extra bytes per row for each string column (number of string columns × 4 bytes × number of rows).