python / cpython

The Python programming language
https://www.python.org

int.to_bytes(-1, ...) should automatically choose required count of bytes #71824

Open 01e27b45-90f2-4c74-9e5e-7e7e54c3d78e opened 8 years ago

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 8 years ago
BPO 27637
Nosy @mdickinson, @socketpair, @vadmium, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'library']
title = 'int.to_bytes(-1, ...) should automatically choose required count of bytes'
updated_at =
user = 'https://github.com/socketpair'
```

bugs.python.org fields:

```python
activity =
actor = 'lorenz_'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'socketpair'
dependencies = []
files = []
hgrepos = []
issue_num = 27637
keywords = []
message_count = 7.0
messages = ['271488', '271489', '271498', '271502', '271507', '271542', '403194']
nosy_count = 5.0
nosy_names = ['mark.dickinson', 'socketpair', 'martin.panter', 'serhiy.storchaka', 'lorenz_']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue27637'
versions = ['Python 3.6']
```

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 8 years ago

It would be nice if int.to_bytes could automatically choose the number of bytes needed to serialize a value. Then I could write serialisation code like this:

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = value.to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0  # see bpo-27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 8 years ago

Oops.

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = len(x).to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0  # see bpo-27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1

vadmium commented 8 years ago

I don’t like special values. A length of minus one makes no sense, so should trigger an exception, not some unexpected behaviour. A different data type like None would be a bit better.

But I’m not sure this would be widely used. If you really need it you could calculate the number of bytes needed via value.bit_length().

serhiy-storchaka commented 8 years ago

This is rarely needed, mainly in general serializers like pickle. The code for determining the minimal number of bytes is not trivial, but it depends on the serializer. If you always serialize unsigned values and save the sign separately, or use a one's complement representation, or if the serializer supports only a fixed set of integer sizes, the code is completely different.
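
To illustrate how the length calculation changes with the wire format, here is a minimal sketch of a sign-and-magnitude encoding, one of the alternatives mentioned above. The function names and the one-byte sign prefix are assumptions made for illustration; they are not taken from pickle or any other existing serializer.

```python
def encode_sign_magnitude(value: int) -> bytes:
    """Hypothetical format: one sign byte (0 or 1), then the big-endian magnitude."""
    magnitude = abs(value)
    # The magnitude is unsigned, so no bit has to be reserved for the sign.
    length = (magnitude.bit_length() + 7) // 8
    sign = b"\x01" if value < 0 else b"\x00"
    return sign + magnitude.to_bytes(length, "big")


def decode_sign_magnitude(data: bytes) -> int:
    """Inverse of encode_sign_magnitude (same hypothetical format)."""
    magnitude = int.from_bytes(data[1:], "big")
    return -magnitude if data[0] else magnitude


print(encode_sign_magnitude(130))   # b'\x00\x82'
print(encode_sign_magnitude(-130))  # b'\x01\x82'
```

In this format the magnitude of 130 fits in a single byte, whereas a signed two's complement encoding of 130 needs two, so a built-in "minimal length" would not give this serializer the number it needs.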

I don't think that we need this feature in the stdlib.

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 8 years ago

https://github.com/pyca/cryptography/issues/3064

mdickinson commented 8 years ago

[Martin]

I don’t like special values.

Agreed. If we wanted to add this, the obvious API would be to simply make the size optional (which would force passing the endianness by name or explicitly passing a default value of None, but that doesn't seem like a big deal to me).

I'm -0 on the feature itself. On the plus side, the fact that it's not completely trivial to compute the size without errors is an argument for including that calculation within the Python code. I'd suggest formulas of:

(x.bit_length() + 7) // 8

for the unsigned case, and

(~x if x < 0 else x).bit_length() // 8 + 1

for the signed case, these giving the minimal number of bytes necessary for encoding x in each case.
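
The two formulas above can be wrapped in a small helper that runs on current Python versions. This is only a sketch: `minimal_length` and `to_bytes_auto` are hypothetical names, and treating `length=None` as "pick the minimal size" is the API idea discussed in this thread, not an existing `int.to_bytes` feature.

```python
from typing import Optional


def minimal_length(x: int, signed: bool = False) -> int:
    """Minimal byte count for x, using the formulas suggested in this thread."""
    if signed:
        return (~x if x < 0 else x).bit_length() // 8 + 1
    if x < 0:
        raise OverflowError("can't convert negative int to unsigned")
    return (x.bit_length() + 7) // 8


def to_bytes_auto(x: int, length: Optional[int] = None,
                  byteorder: str = "big", *, signed: bool = False) -> bytes:
    """Hypothetical wrapper: length=None means 'use the minimal length'."""
    if length is None:
        length = minimal_length(x, signed)
    return x.to_bytes(length, byteorder, signed=signed)


def serialize(value: int, signed: bool = True) -> bytes:
    """Length-prefixed encoding, following the serialize() example in this issue."""
    payload = to_bytes_auto(value, signed=signed)
    return len(payload).to_bytes(4, "big", signed=False) + payload


assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
```

Note that the signed formula gives one byte for 0, so `len(serialize(0))` is `4 + 1` here rather than the `4 + 0` assumed in the original snippet.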

7d18a990-213b-49e9-a8ad-eff7ffda87c5 commented 3 years ago

I would like to express my support for making length=None to automatically use the minimal possible length. It's true that this will rarely be needed in production-grade serialization code, but this functionality is worth its weight in gold for quickly written proof-of-concept code or when using Python as a "pocket calculator" in an interactive shell.

I'm sure I've personally typed the expression (n.bit_length()+7)//8 approximately a million times while quickly trying something. It'd be nice if Python could just do this simple computation for me instead. The code changes required are minimal and there shouldn't be any performance impact.

In fact, in my opinion this should even be the default behaviour, but 3.11 just made length=1 the default (see bpo-45155) and changing this now would cause an (albeit very mild) API incompatibility.
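
As a rough illustration of the difference, the sketch below compares a fixed one-byte length with the minimal-length computation. The helper names are hypothetical, and the 3.11 defaults (`length=1`, `byteorder="big"`) are spelled out explicitly so that the code also runs on older versions.

```python
def to_bytes_fixed_default(n: int) -> bytes:
    # Same as n.to_bytes() on Python 3.11+, with the defaults written out.
    return n.to_bytes(1, "big")


def to_bytes_minimal(n: int) -> bytes:
    # Unsigned minimal length, i.e. the expression quoted above.
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


print(to_bytes_fixed_default(255))  # b'\xff'
try:
    to_bytes_fixed_default(256)     # does not fit in one byte
except OverflowError as exc:
    print("OverflowError:", exc)
print(to_bytes_minimal(256))        # b'\x01\x00'
```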