Open h-vetinari opened 2 years ago
This seems like a practical way for code to be explicit about exactly where it wants large conversions to be allowed.
I suggest a name of max_digits= for the new keyword-only arg rather than limit=, which is less obvious about what it is for. Even max_digits= could be confusing if someone were using it on an int(thing, base=16, max_digits=300) call. But nobody should ever write code attempting that; we'd document it in the help text and docs as only applying the given limit to non-power-of-2 base conversions.
I agree this is probably a good API to add. While I understand the motivation for the mitigation, I agree that global state is difficult to balance the various needs against.
Even max_digits= could be confusing if someone were using it on an int(thing, base=16, max_digits=300) call.
Could/should we raise a ValueError if max_digits is given when base is not a factor of 2?
Thanks for the response!
But nobody should ever write code attempting that, we'd document it in the help text and docs as only applying the given limit to non-power-of-2 base conversions.
TBH, I don't see why the keyword shouldn't also work for other bases. At least I can imagine someone wanting to consume a hexadecimal string into an int, but not permit unexpectedly long inputs.
Why deny them the obvious choice that works for non-base-2? At least, I don't see the harm in exposing that API, and I think consistency would actually be more valuable.
PS. No strong feelings about the name of the kwarg, as long as the functionality exists.
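As a rough illustration of the hexadecimal use case above: power-of-2 bases are currently exempt from the digit limit, so a caller who wants to reject over-long hex input has to check the length by hand. The helper below is a hypothetical sketch (the name parse_hex_bounded and its max_digits parameter are assumptions, not part of any Python API) of what a max_digits= keyword might otherwise do for base 16:

```python
def parse_hex_bounded(text, max_digits):
    """Hypothetical helper: parse a hex string, rejecting over-long input.

    Not a CPython API; it only approximates what a max_digits= keyword
    could do for base 16, which the built-in limit does not cover.
    """
    digits = text.strip().lstrip("+-")
    if digits[:2].lower() == "0x":
        digits = digits[2:]           # the prefix is not a digit
    digits = digits.replace("_", "")  # separators are not digits either
    if len(digits) > max_digits:
        raise ValueError(
            f"hex string has {len(digits)} digits, limit is {max_digits}"
        )
    return int(text, 16)
```

With this sketch, parse_hex_bounded("ff", 4) parses normally, while a string with more hex digits than max_digits is rejected before int() ever sees it.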
Good point, this becomes a feature for enabling an explicit limit in the power-of-2 base case.
Please test the performance effect of adding support for a new keyword parameter when the argument is not passed, i.e. on int(x) and int(x, base=y).
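One possible way to take that measurement is with the standard timeit module, running the same script on a build with and without the new keyword. This is only a sketch: the helper name ns_per_call and the sample literal '12345' are arbitrary choices, and a serious comparison would more likely use a dedicated benchmarking tool on an optimized build.

```python
import timeit

def ns_per_call(stmt, number=200_000, repeat=5):
    """Best-of-`repeat` time for one execution of `stmt`, in nanoseconds."""
    best = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best / number * 1e9

if __name__ == "__main__":
    # The two call shapes named in the request: positional only, and with base=.
    for stmt in ("int('12345')", "int('12345', base=10)"):
        print(f"{stmt:<24} {ns_per_call(stmt):8.1f} ns")
```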
Please test the performance effect
For a normal feature, this would be the bar of course. But this is not a normal feature, as it's concerned with unbreaking previously valid code without depending on an unworkable-at-scale workaround.
If a potential regression is not egregiously bad (which I don't think is likely, as int() already has keywords), solving this should (IMO) clearly take priority.
But if people think the API is a good thing to add, I can maybe try my hand at a PR; then it would also be easier to seriously measure performance (after making sure I didn't miss obvious optimizations).
If there was a meaningful performance regression, you could still solve the problem without impacting "normal" int conversions by making an alternate constructor (classmethod) for int that accepted the argument and made the intent more clear (like int.from_bytes). Arguably a good idea regardless, as keeping the baseline constructors simple and understandable is a reasonable goal to avoid immediately confronting new Python developers with details of a fairly minor security protection.
Blithely accepting million character strings in web applications in a context that blindly converts them to int frankly seems more like a problem for imposing reasonable sanity limits on the amount of data the web interface will accept before it ever reaches the int constructor (if you let me feed multiple MB of data to your API, I'm already likely to be able to DDoS most smaller scale servers simply by overfeeding them data, completely ignoring the CPU costs of parsing it). And while I wouldn't expect most users to think to use the alternate constructor, the major Python web frameworks should have the expertise to start using it automatically in appropriate contexts.
FWIW I consider this a tracking issue for either a new classmethod constructor or a new keyword argument.
We'll play around and decide which we find best. It's a good point that keyword argument soup is worth avoiding (as the subprocess.Popen API demonstrates). We want readers of code using the API to understand what it does and why without having to look it up.
brainstorming: int.from_unbounded_string, int.to_unbounded_string (those feel a bit long, but do attempt to get the point across)
Enhancement
With the mitigations for CVE-2020-10735 in place, there's tension between:
Those are both vying for the one setting (PYTHONINTMAXSTRDIGITS) that controls this, which is unfortunately global state. Since several such libraries are often loaded in the same runtime, this makes it very tricky to balance.
The idea is simply to add a limit=None keyword to int() and str(), which defaults to PYTHONINTMAXSTRDIGITS if None, but can be set to any other value. Setting it to 0 (for example, or -1) would make it unlimited.
Pitch
This would not solve all the fallout right away, but it would provide a sane path forward, where the default can be safe, and callsites can explicitly opt into saying "I want this call to have a different limit".
In particular, it would not require global state to balance the divergent needs of different parts of the ecosystem anymore.
In case a spectacular new conversion algorithm comes along, the default could eventually be set to infinite again. Even then the API would not become useless or harmful, but would still serve a clear purpose, so this proposal is independent of future string algorithms or default limits.
Previous discussion
https://discuss.python.org/t/int-str-conversions-broken-in-latest-python-bugfix-releases/18889/ and many more places linked from there