BebeSparkelSparkel commented 9 months ago

I have created the following PRs but they seem to be intermingled and perhaps a larger pull would be better

632

633

634

I propose the following

combine the logic of formatFloat and formatDouble into formatFloating :: FloatFormat -> a -> Builder since they have the same logic and use classes to get their specific functions based on the floating type. Continuing exporting the formatFloat and formatDouble interface functions for compatibility.
Replace the constructor of FloatFormat to the constructors of FormatMode and remove the type FormatMode because the precision (Maybe Int) is not used and can be included as a parameter of the constructors
FScientific should have a Char for specifying a lower or upper E
FStandard should have a Maybe Int for the precision
FGeneric should have a Char for specifying a lower or upper E, and two Ints for the inclusive exponent range for printing the standard notation
Allow custom strings for special values +-Infinity +-0 NaN with data SpecialStrings = {..}
remove specialStr and use toCharsNonNumbersAndZero instead because they duplicate logic and it is faster to evaluate
I am unsure if this is optimized away but the case statement that selects the format should not be executed for every floating number to be printed. If not, the \f -> ... should be defined after the case statement and not before.

None of these changes should cause changes to the existing interface.

clyring commented 9 months ago

(1) I'm also not very happy with the amount of un-shared logic between the Float and Double-formatting code. But using classes to de-duplicate this logic may require considerable care to make sure that all of the actual class-dictionaries are specialized away and all of the unboxing we expect actually happens. (Backpack isn't really an option for us; abusing CPP is another possibility.) I don't expect the cost/benefit to be favorable unless you are rewriting large chunks, but if you want to work on this I won't stop you.
(2) sounds sensible.
(3), (5) I can see some value in selecting between 1e40 and 1E40. But Char is not appropriate here: Word8 for a single byte, or perhaps Builder is less ugly, or perhaps a Bool indicating "isUppercase."
(4) is implied by (2).
(6) It's very easy for a user to check for these and format them specially if they have some particular reason to. It's probably not worth expanding our API for.
(7) These are not duplicates because of 0.0e0 versus 0.0.
(8) GHC is usually pretty cavalier about eta-expanding through such case statements anyway. (This can hurt when it results in a real PAP that is called several times but I expect that to be very uncommon for formatDouble.)

(Sorry for taking so long to respond.)

BebeSparkelSparkel commented 8 months ago

Builder is an intriguing idea instead of char. I'll have to think about that since it does allow for more than one Word8 to be inserted.

I think 6 and 7 are somewhat related and it seems worth it to consolidate for the faster determination of special values for the standard case.

BebeSparkelSparkel commented 8 months ago

@Bodigrim @clyring I am having a performance problem with trying to consolidate the special printers. When I apply the changes in commit the performance for printing NaN, Infinity, and 0 decreases by 20-70%. Do you have any insight as to why this is?

Baseline profile created from branch realfloat-bench with cabal bench --benchmark-options '--csv baseline.csv -p /RealFloat/'

Compared to baseline on branch consolidate-special-real-number-printers with cabal bench --benchmark-options '--baseline baseline.csv -p "/FSci/ && /Special/ && $NF ~ /Float/"'

clyring commented 8 months ago

I see you're doing a lot of work in various PRs in this area. Please let me know when each one is ready for review, and which ones to start with/prioritize.

BebeSparkelSparkel commented 8 months ago

@clyring Please focus on

638 -- Adds tests that will fail without #637
637 -- the big one

The others can come later.

haskell / bytestring

Overhaul Data.ByteString.Builder.RealFloat #636

632

633

634

638 -- Adds tests that will fail without #637

637 -- the big one