Pickler's put_long method currently falls back on the text-based INT encoding if the long value is too large to be represented as a 4-byte signed integer.
Instead, I'm wondering whether it should use the LONG1 encoding and write the value as an 8-byte signed integer. Since this method's parameter is a long, I think all of its values should fit in a LONG1. My understanding is that LONG1 should be more time- and space-efficient for these values. Pyrolite already uses LONG1 encoding when writing BigIntegers.
If I use Pyrolite to do pickler.dumps(9223372036854775807L) (which is Long.MAX_VALUE), pickletools disassembles the result as:
0: \x80 PROTO 2
2: I INT 9223372036854775807
23: . STOP
highest protocol among opcodes = 2
This matches Python 2.7's behavior.
In contrast, Python 3.7 pickles this value using LONG1, which takes nearly half the space (13 bytes instead of 24):
>>> pickletools.dis(pickle.dumps(9223372036854775807, protocol=2))
0: \x80 PROTO 2
2: \x8a LONG1 9223372036854775807
12: . STOP
highest protocol among opcodes = 2
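To make the proposal concrete, here is a minimal Python sketch of the byte layout I have in mind. The helper name encode_long1_fixed8 is hypothetical and is not part of Pyrolite or the pickle module. It also assumes the payload is always written as a full 8 bytes, whereas Python itself trims the payload to the shortest two's-complement form, so a real put_long could go either way.

```python
import pickle
import struct


def encode_long1_fixed8(value: int) -> bytes:
    """Hypothetical helper: pickle a signed 64-bit integer with LONG1.

    Layout: PROTO 2, LONG1 opcode, 1-byte payload length,
    little-endian two's-complement payload, STOP.
    """
    if not -(2**63) <= value < 2**63:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Assumption: always emit 8 payload bytes rather than the minimal length.
    payload = struct.pack("<q", value)
    return b"\x80\x02" + b"\x8a" + bytes([len(payload)]) + payload + b"."


data = encode_long1_fixed8(9223372036854775807)
print(len(data))           # 13 bytes, versus 24 for the text-based INT form
print(pickle.loads(data))  # round-trips through the standard unpickler
```

For Long.MAX_VALUE the minimal payload is already 8 bytes, so this should produce the same bytes as the Python 3 output above. For smaller values the fixed 8-byte payload is longer than what Python writes, but the unpickler still accepts it.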