Open · opened 8 years ago by 01e27b45-90f2-4c74-9e5e-7e7e54c3d78e
It would be nice if `int.to_bytes` could automatically choose the number of bytes to serialize. Then I could write serialization code like this:
```python
def serialize(value: int, signed=True) -> bytes:
    # Proposed behaviour: a length of -1 would mean "use the minimal number
    # of bytes". Current Python rejects a negative length with ValueError.
    x = value.to_bytes(-1, 'big', signed=signed)
    l = len(x).to_bytes(4, 'big', signed=False)  # 4-byte length prefix
    return l + x

assert len(serialize(0)) == 4 + 0  # see bpo-27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
```
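Today's `int.to_bytes` rejects a negative length, but the same length-prefixed format can already be produced by computing the byte count from `bit_length()`. The sketch below is illustrative, not part of any library: `byte_count`, `serialize_today` and `deserialize_today` are hypothetical names, the 4-byte big-endian prefix simply mirrors the proposal, and zero is deliberately encoded as no bytes to match the `serialize(0)` expectation.

```python
def byte_count(value: int, signed: bool) -> int:
    """Smallest byte count that can hold value; zero is encoded as no bytes."""
    if value == 0:
        return 0  # to_bytes(0, ...) already returns b'' for zero
    if signed:
        # Two's complement needs one spare bit for the sign.
        return (~value if value < 0 else value).bit_length() // 8 + 1
    return (value.bit_length() + 7) // 8

def serialize_today(value: int, signed: bool = True) -> bytes:
    x = value.to_bytes(byte_count(value, signed), 'big', signed=signed)
    prefix = len(x).to_bytes(4, 'big', signed=False)  # 4-byte length prefix
    return prefix + x

def deserialize_today(data: bytes, signed: bool = True) -> int:
    length = int.from_bytes(data[:4], 'big')
    return int.from_bytes(data[4:4 + length], 'big', signed=signed)

# Round trip over a few boundary values.
for n in (0, 1, -1, 127, 128, -128, -129, 2 ** 64):
    assert deserialize_today(serialize_today(n)) == n
```

In unsigned mode a negative value still raises `OverflowError` from `to_bytes`, just as it would with an explicit length.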
I don’t like special values. A length of minus one makes no sense, so it should trigger an exception, not some unexpected behaviour. A different data type like None would be a bit better.
But I’m not sure this would be widely used. If you really need it, you could calculate the number of bytes needed via value.bit_length().
This is rarely needed, mainly in general-purpose serializers like pickle. The code for determining the minimal number of bytes is not trivial, but it depends on the serializer. If you always serialize unsigned values and save the sign separately, or use a one's complement representation, or if the serializer supports only a fixed set of integer sizes, the code is completely different.
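As an illustration of how the representation changes the sizing logic, the sketch below stores the sign in a separate leading byte and the magnitude as an unsigned integer. The format and the names `encode_sign_magnitude` / `decode_sign_magnitude` are hypothetical, not taken from pickle or any other real serializer, and it assumes the buffer holds exactly one encoded value. The byte count here depends only on the magnitude, so the two's-complement sizing rule no longer applies.

```python
def encode_sign_magnitude(value: int) -> bytes:
    """Hypothetical format: one sign byte, then the unsigned magnitude."""
    sign = b'\x01' if value < 0 else b'\x00'
    magnitude = abs(value)
    # Unsigned minimal length; zero gets no magnitude bytes at all.
    size = (magnitude.bit_length() + 7) // 8
    return sign + magnitude.to_bytes(size, 'big')

def decode_sign_magnitude(data: bytes) -> int:
    magnitude = int.from_bytes(data[1:], 'big')
    return -magnitude if data[0] == 1 else magnitude
```

For example, -128 takes two bytes here (sign plus `0x80`) but only one byte in signed two's complement, which is the kind of divergence described above.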
I don't think that we need this feature in the stdlib.
[Martin]
> I don’t like special values.
Agreed. If we wanted to add this, the obvious API would be to simply make the size optional (which would force callers either to pass the endianness by name or to pass the default value of None for the size explicitly, but that doesn't seem like a big deal to me).
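The optional-size API suggested above can be approximated today with a small wrapper. This is a sketch only: `to_bytes_auto` is a hypothetical helper, not part of the standard library, and it sizes values with the usual `bit_length()`-based minimal-length formulas. Because `length` comes first and defaults to `None`, `byteorder` has to be passed by name or preceded by an explicit `None`, which is exactly the trade-off mentioned.

```python
from typing import Optional

def to_bytes_auto(x: int, length: Optional[int] = None,
                  byteorder: Optional[str] = None, *,
                  signed: bool = False) -> bytes:
    """Like int.to_bytes, but length=None picks the minimal byte count.

    Hypothetical helper: byteorder has no real default here, so callers
    write to_bytes_auto(x, byteorder='big') or to_bytes_auto(x, None, 'big').
    """
    if byteorder is None:
        raise TypeError("byteorder must be given")
    if length is None:
        if signed:
            # One spare bit is needed for the two's-complement sign.
            length = (~x if x < 0 else x).bit_length() // 8 + 1
        else:
            length = (x.bit_length() + 7) // 8
    return x.to_bytes(length, byteorder, signed=signed)
```

With these formulas, zero encodes as `b''` in unsigned mode but as `b'\x00'` in signed mode; an explicit `length` behaves exactly like `int.to_bytes`.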
I'm -0 on the feature itself. On the plus side, the fact that it's not completely trivial to compute the size without errors is an argument for including that calculation within Python itself. I'd suggest the formulas
`(x.bit_length() + 7) // 8`
for the unsigned case, and
`(~x if x < 0 else x).bit_length() // 8 + 1`
for the signed case. These give the minimal number of bytes necessary to encode x in each case.
I would like to express my support for making length=None automatically use the minimal possible length. It's true that this will rarely be needed in production-grade serialization code, but this functionality is worth its weight in gold for quickly written proof-of-concept code or when using Python as a "pocket calculator" in an interactive shell.
I'm sure I've personally typed the expression `(n.bit_length() + 7) // 8` approximately a million times while quickly trying something. It'd be nice if Python could just do this simple computation for me instead. The code changes required are minimal, and there shouldn't be any performance impact.
In fact, in my opinion this should even be the default behaviour, but 3.11 just made length=1 the default (see bpo-45155) and changing this now would cause an (albeit very mild) API incompatibility.
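To make that incompatibility concrete, assume a minimal-length default would size unsigned values with `(x.bit_length() + 7) // 8`. Then zero would encode differently, and values above 255 would stop raising `OverflowError`. The snippet writes out 3.11's defaults (`length=1`, `byteorder='big'`) explicitly so that it also runs on older versions; `minimal_unsigned` is a hypothetical helper.

```python
def minimal_unsigned(x: int) -> bytes:
    """Hypothetical minimal-length encoding of a non-negative int."""
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

# x.to_bytes() in Python 3.11+ is equivalent to x.to_bytes(1, 'big').
print((0).to_bytes(1, 'big'))   # b'\x00'
print(minimal_unsigned(0))      # b''

try:
    (256).to_bytes(1, 'big')    # one byte cannot hold 256
except OverflowError:
    print('OverflowError with length=1')
print(minimal_unsigned(256))    # b'\x01\x00'
```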
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'library']
title = 'int.to_bytes(-1, ...) should automatically choose required count of bytes'
updated_at =
user = 'https://github.com/socketpair'
```
bugs.python.org fields:
```python
activity =
actor = 'lorenz_'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'socketpair'
dependencies = []
files = []
hgrepos = []
issue_num = 27637
keywords = []
message_count = 7.0
messages = ['271488', '271489', '271498', '271502', '271507', '271542', '403194']
nosy_count = 5.0
nosy_names = ['mark.dickinson', 'socketpair', 'martin.panter', 'serhiy.storchaka', 'lorenz_']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue27637'
versions = ['Python 3.6']
```