The overflow was happening because Python's CSV implementation uses a C `long` to store the maximum field size limit (https://github.com/python/cpython/blob/c88239f864a27f673c0f0a9e62d2488563f9d081/Modules/_csv.c#L21).

Using `sys.maxsize` as the limit, as the previous implementation did, has a drawback on 64-bit Windows: its LLP64 data model keeps `long` as a 32-bit integer to preserve type compatibility, even though pointers are 64 bits wide (see https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models). Since `sys.maxsize` follows the pointer width, using it as the CSV limit overflows the 32-bit `long` on that platform.
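A quick way to see the mismatch (a minimal sketch; the commented values assume a 64-bit interpreter):

```python
import ctypes
import sys

# Under LLP64 (64-bit Windows), pointers are 8 bytes but C `long` stays 4 bytes;
# under LP64 (64-bit Linux/macOS) both are 8 bytes.
print(ctypes.sizeof(ctypes.c_void_p))  # 8 on any 64-bit platform
print(ctypes.sizeof(ctypes.c_long))    # 4 under LLP64, 8 under LP64
print(sys.maxsize)                     # 2**63 - 1 with 64-bit pointers,
                                       # too large for a 32-bit C long
```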
So, to get a reliable maximum `long` value on the target system, you can use `ctypes.c_ulong(-1)`: the `-1` wraps around to the maximum `unsigned long` value, and dividing that by 2 yields the maximum signed `long` on the target platform.
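A minimal sketch of the technique (the `limit` name is just for illustration):

```python
import csv
import ctypes

# c_ulong(-1) wraps around to ULONG_MAX for the platform's `unsigned long`;
# floor-dividing by 2 yields LONG_MAX, the largest value the C `long` used
# by the csv module can hold (2**31 - 1 under LLP64, 2**63 - 1 under LP64).
limit = ctypes.c_ulong(-1).value // 2

csv.field_size_limit(limit)  # no OverflowError, even on 64-bit Windows
```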
Regarding memory impact, the `long` value for the limit is already allocated to store the maximum and is used only to reject field sizes above it; the CSV data itself is allocated on demand.
This pull request touches issue #306 and fixes it on all platforms without falling back to a hardcoded value.
I ran the profiler on a simple load of a big file from brasil.io (https://brasil.io/dataset/genero-nomes/nomes/) and got the same values after the csv import operation:
[profiler output: previous implementation versus this change]
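For anyone who wants to reproduce a comparable measurement, here is a minimal sketch; the exact profiler used above isn't shown, so this uses the standard-library `tracemalloc` and assumes the dataset has been downloaded to a local `nomes.csv`:

```python
import csv
import tracemalloc

# Measure memory while loading the CSV; run once on each branch and compare.
tracemalloc.start()
with open("nomes.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
current, peak = tracemalloc.get_traced_memory()
print(f"rows={len(rows)} current={current / 2**20:.1f} MiB peak={peak / 2**20:.1f} MiB")
```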