python / cpython

The Python programming language
https://www.python.org

Support formatting floats in hexadecimal (and binary?) notation #113804

Closed skirpichev closed 3 months ago

skirpichev commented 9 months ago

Feature or enhancement

Proposal:

Currently, the float.hex() method returns a representation of a floating-point number as a hexadecimal string.

We could also (or instead?) add the same formatting option to str.format(), f-strings and old-style ('%' operator) string formatting. C's printf() has dedicated type fields for this: 'a' and 'A'. (Printed values have a mandatory 0x or 0X prefix, and the alternate form is used to force a decimal point.)

While it's possible to add such formatting types to str.format() as well, in old-style string formatting 'a' would conflict with the existing ascii() conversion type ('%a'). Let's instead reuse the 'x' and 'X' type fields for this, which are currently supported only for integers.

This proposal introduces an incompatibility with C's printf(), but some other languages share it. A notable example is Go's fmt package, where the 'x' and 'X' verbs also format floats in hexadecimal notation.

The new interface for formatting floats would be more flexible and compact than the existing float.hex():

>>> f'{-0.1:#x}'
'-0x1.999999999999ap-4'
>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> f'{3.14159:+#X}'
'+0X1.921F9F01B866EP+1'
>>> f'{3.14159:.3x}'
'1.922p+1'
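A rough emulation of these examples on top of today's float.hex() may help pin down the intended semantics. Everything here is an assumption read off the three examples above, not the PR's code: `format_hex` is a hypothetical helper, `alt` stands in for the `#` flag (keeping the `0x` prefix), `plus` for `+`, `upper` for the `X` type, and `precision` counts hex digits after the leading 1, rounded half-to-even.

```python
import math


def format_hex(x, precision=None, *, alt=False, upper=False, plus=False):
    """Hypothetical emulation of the proposed float 'x'/'X' format type.

    alt       -> '#' flag: keep the '0x' prefix (assumed from the examples)
    upper     -> 'X' type: upper-case digits, prefix and exponent marker
    plus      -> '+' flag: always show the sign
    precision -> hex digits after the leading 1, rounded half-to-even
    Only finite values are handled; subnormals come out normalized.
    """
    if not math.isfinite(x):
        raise ValueError("this sketch only handles finite values")
    if precision is None:
        body = abs(x).hex()                          # e.g. '0x1.921f9f01b866ep+1'
    elif x == 0:
        frac = "." + "0" * precision if precision else ""
        body = f"0x0{frac}p+0"
    else:
        m, e = math.frexp(abs(x))                    # abs(x) == m * 2**e, 0.5 <= m < 1
        # Leading 1 plus 4 bits per hex digit; round() on a float ties to even.
        n = round(math.ldexp(m, 1 + 4 * precision))
        if n >> (4 * precision) == 2:                # rounding carried: 1.fff -> 2.000
            n >>= 1
            e += 1
        frac_bits = n - (1 << (4 * precision))       # drop the leading 1
        frac = f".{frac_bits:0{precision}x}" if precision else ""
        body = f"0x1{frac}p{e - 1:+d}"
    if not alt:
        body = body[2:]                              # strip '0x'
    if upper:
        body = body.upper()
    if math.copysign(1.0, x) < 0:
        sign = "-"
    else:
        sign = "+" if plus else ""
    return sign + body


print(format_hex(-0.1, alt=True))                           # -0x1.999999999999ap-4
print(format_hex(3.14159, alt=True, upper=True, plus=True))  # +0X1.921F9F01B866EP+1
print(format_hex(3.14159, 3))                               # 1.922p+1
```

With `precision=None` the digits are exactly those of float.hex(), trailing zeros included.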

We may also consider supporting printing floats as binary strings, either in base-2 scientific notation or in the style of Go's strconv.FormatFloat. gmpy2 (and the MPFR library) uses the "b"/"B" format types for this:

>>> format(gmpy2.mpfr(3.14), "b")
'1.1001000111101011100001010001111010111000010100011111p+1'
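For an ordinary Python float, output in that base-2 style can be sketched by expanding each hex digit from float.hex() into four bits. `float_to_binary` is a hypothetical helper that handles finite values only and keeps all 52 fraction bits (no trailing-zero trimming); for 3.14 it reproduces the gmpy2 string above.

```python
def float_to_binary(x: float) -> str:
    """Hypothetical helper: base-2 scientific notation for a finite double.

    Each hex digit of the fraction from float.hex() becomes four bits.
    """
    h = x.hex()                                   # e.g. '0x1.91eb851eb851fp+1'
    sign = "-" if h.startswith("-") else ""
    mantissa, exponent = h.lstrip("-")[2:].split("p")
    lead, _, frac = mantissa.partition(".")
    bits = "".join(format(int(digit, 16), "04b") for digit in frac)
    return f"{sign}{lead}.{bits}p{exponent}"


print(float_to_binary(3.14))
```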

Related issue: https://github.com/python/cpython/issues/114667

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/41848


skirpichev commented 8 months ago

PR is ready for review: https://github.com/python/cpython/pull/113805

serhiy-storchaka commented 4 months ago

Is there enough core developer support for this feature? It looks like too specialized a feature; float.hex() should be enough. It can also make code more error-prone: if the value happens to be a float instead of an int, you can get an unexpected result instead of an error.
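The error-proneness concern is easy to see with today's behaviour: the 'x' type is integer-only, so a float passed by mistake fails loudly. Under the proposal, the second call below would return a hex-float string instead. `try_hex` is just an illustrative wrapper.

```python
def try_hex(value):
    """Illustrative wrapper: format `value` with the 'x' type, reporting a ValueError."""
    try:
        return format(value, "x")
    except ValueError as exc:
        return f"ValueError: {exc}"


print(try_hex(255))    # ff
print(try_hex(255.0))  # a ValueError message: 'x' is not a float format type today
```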

ericvsmith commented 4 months ago

I agree with @mdickinson on d.p.o. that this should have a PEP.

skirpichev commented 4 months ago

Is there enough core developer support for this feature?

@serhiy-storchaka, I'm not sure what you mean. This is a regular PR that addresses a feature request.

If it doesn't have enough support among core developers, it simply won't be merged.

It looks like too specialized a feature; float.hex() should be enough.

That variant seems less flexible: you can only print the whole value. I think this view is backed by the interfaces in other languages (usually printf-like).

if the value happens to be a float instead of an int, you can get an unexpected result instead of an error.

Perhaps this is an argument against using the "x"/"X" format types. The original proposal was to use the printf-like (or, as in MPFR and gmpy2) "a"/"A" format types with the same semantics. That was mentioned in the PR.

I agree with @mdickinson on d.p.o. that this should have a PEP.

If I interpreted that opinion correctly, it was about support for hexadecimal floating-point literals. There is a separate issue for that, https://github.com/python/cpython/issues/114667, with a PR and a PEP draft.

Regarding this, here is a quote: "For the formatting addition, I think a careful and complete description of the proposed new functionality in a GitHub issue would be enough, though again there are many details to be determined."

serhiy-storchaka commented 4 months ago

I see that @vstinner and @mdickinson are interested enough to review it, so there is a chance it will be merged. I am -0 on this feature for the reasons stated above, so I would not merge it myself.

Additional parameters could be added to float.hex() to make it more flexible. There is already a precedent in bytes.hex().
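For reference, the bytes.hex() precedent takes optional `sep` and `bytes_per_sep` arguments (Python 3.8+). Any comparable parameters for float.hex() would be hypothetical; nothing like them exists today.

```python
data = bytes.fromhex("deadbeef")

print(data.hex())          # deadbeef
print(data.hex(":"))       # de:ad:be:ef  (separator between every byte)
print(data.hex("_", 2))    # dead_beef    (separator every 2 bytes, counted from the right)
```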

mdickinson commented 4 months ago

I'm afraid I'm -1 on this with the proposed implementation in https://github.com/python/cpython/pull/113805

vstinner commented 4 months ago
  • The implementation changes the code for float.hex instead of simply adding code for the new feature, meaning that the float.hex code has to be re-evaluated for correctness. (...)
  • The addition of a precision specifier introduces significant new complications (...)
  • The inconsistency in handling of trailing zeros between float.hex and the x format (...)

Here is a counter-proposal: PR gh-119945 adds an 'x' format to float.__format__() that is implemented simply by calling float.hex(). Nothing more, nothing less.

mdickinson commented 4 months ago

@vstinner Indeed that's much closer to what I was thinking of when I wrote that d.p.o. comment. I'll take a look.

skirpichev commented 4 months ago

I'm not aware of any good use-cases for it.

I don't believe that so many languages (probably any that support formatted output in hexadecimal notation) implement a useless feature.

Rounding to single precision could potentially be a use case, but it's not supported, since the number in the precision specifier refers to the number of hexadecimal digits after the initial 1.

But that holds only with 1-normalization (i.e. like the binary representation, but with the mantissa after the point "compressed" into hex digits). The C standard doesn't require this for printing. I'm not sure about libc implementations, but MPFR certainly doesn't use 1-normalization in all cases.
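To illustrate the normalization point: the same double has many hexadecimal spellings that differ in the leading digit, and float.fromhex() accepts all of them. Python's float.hex() always picks a leading 1 for normal numbers and a leading 0 for subnormals, so for normal numbers p hex digits after the point always mean 1 + 4p significant bits. With other leading digits, the bit count would vary.

```python
# One double (3.0), several hexadecimal spellings with different leading digits.
spellings = ["0x1.8p+1", "0x3p+0", "0xcp-2", "0x0.cp+2", "0x30p-4"]
print({float.fromhex(s) for s in spellings})   # {3.0}

# float.hex() normalizes: leading 1 for normal numbers, leading 0 for subnormals.
print((3.0).hex())      # 0x1.8000000000000p+1
print((5e-324).hex())   # 0x0.0000000000001p-1022

# With a fixed leading 1, p hex digits after the point carry 1 + 4*p significant bits.
for p in (3, 6, 13):
    print(p, "hex digits ->", 1 + 4 * p, "bits")
```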

mdickinson commented 4 months ago

I don't believe that so many languages (probably any that support formatted output in hexadecimal notation) implement a useless feature.

I think we need a stronger justification than what amounts to "C does it, and they must have had a reason". I don't know what reason the C standards committee had for allowing precision in %a and %A formats, and the C99 rationale document isn't helpful here. I can speculate, but without knowing the real reason, it's pretty much impossible to determine whether that reason still applies a quarter of a century later, to a completely different language, with different assumptions on the available floating-point formats.

The gcc manual says:

The ‘%a’ and ‘%A’ conversions are meant for representing floating-point numbers exactly in textual form so that they can be exchanged as texts between different programs and/or machines.

And that's the primary reason that float.hex exists in Python - to give a reasonably concise exact representation of the value of a floating-point number.
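That exactness is easy to spot-check. Converting with x.hex() and back with float.fromhex() should reproduce every finite double bit for bit, including the sign of zero. The sample values below are arbitrary choices.

```python
import math
import random


def roundtrips(x: float) -> bool:
    """True if float.fromhex(x.hex()) gives back x exactly, sign of zero included."""
    y = float.fromhex(x.hex())
    return y == x and math.copysign(1.0, y) == math.copysign(1.0, x)


samples = [0.1, -0.0, 5e-324, 1.7976931348623157e308, math.pi]
samples += [random.uniform(-1e300, 1e300) for _ in range(1000)]
print(all(roundtrips(x) for x in samples))   # True
```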

tim-one commented 4 months ago

I expect this is no deeper than a cabal in the committee objecting "but all other float formats allow specifying the number of characters of precision - it would be INCONSISTENT if the hex float formats didn't also". "But it's useless!" "What part of INCONSISTENT don't you grasp?"

I doubt anyone has an actual use for this. The output isn't meant to be human-friendly - it's meant to sidestep the expense of - and historical errors in - float <-> decimal_string conversions.

"Design by committee" frequently makes dubious compromises just to shut someone up :wink:.

For the record, I'm happy with what Python does now. I use it, but not frequently enough to mind the bother. Most recently (in _pylong.py on the main branch):

# log of 10 to base 256 with best-possible 53-bit precision. Obtained
# via:
#    from mpmath import mp
#    mp.prec = 1000
#    print(float(mp.log(10, 256)).hex())
_LOG_10_BASE_256 = float.fromhex('0x1.a934f0979a371p-2') # about 0.415

People don't need to be float-savvy to read this. If they're confused, it's very easy to get detailed explanations by searching for "float.fromhex". Searching for "0x1.a934f0979a371p-2" instead wouldn't help them at all.
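Without mpmath, a similar constant can be reproduced with the standard decimal module. This is a swapped-in technique, not the one used in _pylong.py. It assumes that 60 significant digits leave enough headroom that the intermediate decimal rounding doesn't disturb the final 53-bit result.

```python
from decimal import Decimal, localcontext

# Assumption: 60 significant digits is ample headroom over a double's ~16 digits.
with localcontext() as ctx:
    ctx.prec = 60
    log_10_base_256 = Decimal(10).ln() / Decimal(256).ln()

approx = float(log_10_base_256)   # Decimal -> float conversion is correctly rounded
print(approx.hex())
print(approx == float.fromhex("0x1.a934f0979a371p-2"))   # compare with the quoted constant
```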

serhiy-storchaka commented 4 months ago

Why you cannot simply use float.hex()?

tim-one commented 4 months ago

Why you cannot simply use float.hex()?

Can you be more specific? I don't know what you're referring to. I did use float.hex() in the commented-out mpmath code to get the hex string to begin with. But that method is useless to convert the string back to a float.

serhiy-storchaka commented 4 months ago

Sorry, I meant not you, @tim-one, but the OP, @skirpichev.

ericvsmith commented 4 months ago
  • Precision and width are ignored.

Rather than ignoring them, I think it would be better to raise an error if they're supplied.

tim-one commented 4 months ago

Precision and width are ignored.

Rather than ignoring them, I think it would be better to raise an error if they're supplied.

Yes, errors should never pass silently :wink:.

But width shouldn't be ignored. The width specifier, and left- or right-justification flags ("<" or ">" in the case of floats), are generally applicable to all format operations, and really have little to do with the data type. They're to help people arrange output into coherently readable columns, and, as such, are really about padding the format operation's "real" output.

skirpichev commented 4 months ago

I think we need a stronger justification than what amounts to "C does it, and they must have had a reason".

But my point was that this is not just C, and I doubt the explanation "this design was simply borrowed from the C committee" holds: neither Go nor MPFR seems to be blindly following the C standard.

Why you cannot simply use float.hex()?

Sometimes I use the precision setting, though more often with MPFR's "b" format type. I would be more interested in support for hexadecimal literals (and in using this beyond CPython floats, in multiple-precision math).

BTW, why not treat hexadecimal literals just as binary with "compressed" mantissa (after the dot)? Then precision in "x" (or "a"?) format could be counted in bits instead.

But width shouldn't be ignored.

+1

tim-one commented 4 months ago

neither Go nor MPFR seems to be blindly following the C standard.

No, Go is blindly following this section from the 2008 revision of the 754 standard:

IEEE 754-2008

5.12.3 External hexadecimal-significand character sequences representing finite numbers.

Which in turn blindly followed a C standard.

This isn't a required part of 754, just a "should" (not a "shall"). And Python already satisfies it, kind of, via .hex()/.fromhex().

About "kind of": things like inexact, overflow, and underflow during conversions are supposed to interact in the obvious ways with 754's trap & signaling gimmicks. But Python doesn't support those.

Note that 754 says nothing about the form in which facilities have to be supplied. Built-in syntax and long-winded chains of function calls are all the same to the standard. float methods are fine.
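Python's side of that "kind of" support is easy to exercise. float.fromhex() accepts several spellings of the same hex-significand value (optional 0x prefix, either case, different normalizations), while float.hex() emits a single canonical form. The 0x3.a7p10 value comes from the float.fromhex() example in the Python documentation.

```python
# Several spellings of the same value; float.fromhex() parses them all.
spellings = ["0x3.a7p10", "3.a7p10", "0X1.D38P+11", "0x1d38p-1"]
values = [float.fromhex(s) for s in spellings]
print(values)              # [3740.0, 3740.0, 3740.0, 3740.0]

# float.hex() always produces one canonical, normalized spelling.
print((3740.0).hex())      # 0x1.d380000000000p+11
```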

We're desperately in need of real "use cases" here. Mark probably has more significant numeric experience than any other core Python dev, and I'm not exactly a slouch either :wink:. Yet neither of us can think of any real use for this worth the bother. I'm very glad we have .hex()/.fromhex(), but they're sufficient.

In mpmath, sure, I've written utility routines to, e.g., print out as many bits of floats as I like, and break them into segments, and stick up-arrows under adjacent lines where the first bits differ, and stick a pipe ("|") character after the last bit that will be retained by rounding, etc. But almost all of that is driven by trying to work with variable-precision floats.

CPython just has 53-bit doubles. 1 bit + 13 hex digits - one size fits all when there is only one size.

why not treat hexadecimal literals just as binary with "compressed" mantissa (after the dot)? Then precision in "x" (or "a"?) format could be counted in bits instead.

I still see almost no sane use for rounding back mantissa bits at all. One dim possibility: if someone wants the best possible conversion to IEEE single precision, then they want to round back from 53 to 24 bits.

So then they'd get 1 bit + 6 hex digits, where the last hex digit is always even.

But on any platform that supports both types, it's more straightforward to cast to single, then back to double, and apply .hex() to that. Every new implementation of rounding is a potential source of subtle new bugs.
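For comparison, here is what a hand-rolled version of that rounding might look like, checked against the struct round-trip shown later in the thread. `round_to_bits` is a hypothetical helper. It rounds to a given number of significant bits with ties-to-even but ignores single precision's exponent range (no overflow or subnormal handling), which is exactly the kind of subtlety that makes the cast the safer route.

```python
import math
import struct


def round_to_bits(x: float, bits: int) -> float:
    """Round a finite double to `bits` significant bits, ties to even.

    Ignores the target format's exponent range (no overflow/subnormal handling).
    """
    if x == 0.0:
        return x
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    n = round(math.ldexp(m, bits))    # integer of `bits` bits; round() ties to even
    return math.ldexp(n, e - bits)


def via_single(x: float) -> float:
    """The struct route: pack to IEEE single ('f') and unpack again."""
    return struct.unpack("f", struct.pack("f", x))[0]


for x in (0.1, 1 / 3, 3.14159, -2.0 / 7):
    # 24 bits = 1 leading bit + 6 hex digits, the last of which is always even.
    print(round_to_bits(x, 24).hex(), via_single(x).hex())
```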

skirpichev commented 4 months ago

No, Go is blindly following this section from the 2008 revision of the 754 standard:

I meant printing support (Go's fmt package) or mpfr_*print*() stuff.

But almost all of that is driven by trying to work with variable-precision floats.

Mpmath has several floating point contexts, one using python floats. Wouldn't you prefer a common interface for formatted output?

cast to single

But Python has no such type.

tim-one commented 4 months ago

Mpmath has several floating point contexts, one using python floats. Wouldn't you prefer a common interface for formatted output?

It's not a real "use case" for me. I rarely want this at all. In mpmath, I want far, far, far more than just mechanical float <-> hex conversion. In fact, in that context I never want hex - I want binary.

But Python has no such type.

Leaving aside that ctypes exposes essentially all platform C types, from the very earliest days you could "cast to float" via the struct module:

>>> import struct
>>> x = 0.1
>>> x.hex()
'0x1.999999999999ap-4'
>>> struct.unpack('f', struct.pack('f', x))[0].hex()
'0x1.99999a0000000p-4' # voila! rounded to IEEE single

vstinner commented 4 months ago

But width shouldn't be ignored.

I suggest leaving comments directly on PR gh-119945. I updated it to support width and to raise an exception if a precision is given.
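Here is a minimal sketch of the behaviour described above. It assumes that 'x' simply defers to float.hex(), that width, fill and alignment pad its output (right-aligned by default, as for other numbers), and that a precision raises ValueError. `HexFloat` and its regex are hypothetical and cover only that subset of the format-spec mini-language; this is not the code in gh-119945.

```python
import re

# Hypothetical subset of the format-spec mini-language: [[fill]align][width][.precision]x
_HEX_SPEC = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>^]))?(?P<width>\d*)(?P<precision>\.\d+)?x"
)


class HexFloat(float):
    """Hypothetical float subclass whose 'x' type just defers to float.hex()."""

    def __format__(self, spec):
        m = _HEX_SPEC.fullmatch(spec)
        if m is None:
            return super().__format__(spec)        # every other spec: normal float rules
        if m["precision"]:
            raise ValueError("precision not allowed with the 'x' format for floats")
        fill = m["fill"] or ""
        align = m["align"] or ">"                  # numbers right-align by default
        # Width, fill and alignment only pad the string produced by float.hex().
        return format(self.hex(), f"{fill}{align}{m['width']}")


print(f"[{HexFloat(-0.1):x}]")
print(f"[{HexFloat(1.0):>24x}]")
print(f"[{HexFloat(1.0):*<24x}]")
```

Raising on a precision, rather than silently ignoring it, follows the "errors should never pass silently" point made earlier in the thread.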

skirpichev commented 4 months ago

I want binary.

FYI: that was part of the original proposal (MPFR variant).

vstinner commented 3 months ago

There are now 2 implementations:

@mdickinson @tim-one @serhiy-storchaka: What do you prefer? Do nothing and close the issue? Pick one implementation?

skirpichev commented 3 months ago

which has multiple feature and comes with his implementation to convert a float to hexadecimal

JFR: 1) I'm not sure it has features much beyond your PR (mostly precision support); 2) the new helper also reuses the float.hex() code: the only difference is support for rounding (~20 lines).

serhiy-storchaka commented 3 months ago

Sorry, I am not interested in this feature. float.hex() is enough for me. I see a minor drawback, so I am -0.

vstinner commented 3 months ago

Since no core dev (apart from me :-)) seems to want to support this change, I suggest closing this issue and the two related PRs.

mdickinson commented 3 months ago

Yep, let's drop it.

vstinner commented 3 months ago

It was decided to not pursue this approach, since float.hex() exists and it's enough for most usages, and we don't want to make str.format() more complicated.

Thanks @skirpichev anyway for your interesting work! Maybe you can reuse it somewhere else later ;-)

I'm closing the issue.

skirpichev commented 3 months ago

Thanks to all for review.

Maybe you can reuse it somewhere else later ;-)

Yes, I hope so :) Adding new-style string formatting support is one goal for the next release of mpmath. It's unfortunate that this will work differently for different contexts (e.g. the mpmath.mp.mpf() type could be formatted with the "a" type, while mpmath.fp.mpf(), i.e. a Python float, can't).