Make `format_digits` skip trailing zeroes

odin-lang / Odin

Odin Programming Language

https://odin-lang.org

BSD 3-Clause "New" or "Revised" License

6.13k stars 551 forks source link

Make `format_digits` skip trailing zeroes #3847

Closed karl-zylinski closed 5 days ago

karl-zylinski commented 6 days ago

builder.write_f64(&b, 0.123, 'f') it will now output 0.123 instead of 0.1230000000000000.

Also it will will output 0 instead of 0.0000000000000000 and 1 instead of 1.0000000000000000.

This will also affect JSON output, avoiding trailing zeroes there too.

Below is a small test program I used to check how it behaved. There's a comment on each line with how the output looks, with a comparison to how it looked before.

package f64_trim

import "core:strings"
import "core:fmt"
import "core:io"

main :: proc() {
    b := strings.builder_make()
    strings.write_f64(&b, 0.123, 'f') // 0.123 before: 0.1230000000000000
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 1.02, 'f') // 1.02 before: 1.0200000000000000
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 0.0004, 'f') // 0.0004 before: 0.0004000000000000
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 123.456, 'f') // 123.4560000000000031 before: 123.4560000000000031
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 1.0, 'f') // 1 before: 1.0000000000000000
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 0.0, 'f') // 0 before: 0.0000000000000000
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 0.123, 'e') // 1.23e-01 before: 1.2300000000000000e-01
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 123.456, 'e') // 1.23456e+02 before: 1.2345600000000000e+02
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 0.00123, 'e') // 1.23e-03 before: 1.2300000000000000e-03
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 1.0, 'e') // 1e+00 before: 1.0000000000000000e+00
    strings.write_string(&b, "\n")
    strings.write_f64(&b, 0.0, 'e') // 0e+00 before: 0.0000000000000000e+00
    fmt.println(strings.to_string(b))
}

Feoramund commented 6 days ago

Some of the fmt tests depend on the behavior you're changing, which is why they failed.

I noticed this in the logs: (Level/time headers trimmed to make it easier to read.)

[test_core_fmt.odin:142:test_fmt_width_precision()] ("%.3f", [3.14]): "3.14" != "3.140"
[test_core_fmt.odin:143:test_fmt_width_precision()] ("%.5f", [3.14]): "3.14" != "3.14000"

This might be an issue. The explicit precision specified by the format is no longer appearing.

Kelimion commented 6 days ago

Also it will will output 0 instead of 0.0000000000000000 and 1 instead of 1.0000000000000000.

I'm not a fan of representing a float as an int just because the current value happens to round to one. Not unless the format string asked for %.0f.

karl-zylinski commented 6 days ago

Hm, this broke lots of tests, and I realize perhaps it also can create unexpected results in cases where you explicitly write format strings like %6.2f and trailing zeroes vanished.

However, it is not possible to control this 'trailing zero avoidance' when you use for example write_f64, which becomes a real problem when you write JSON files, because they get lots of zeroes everywhere. Perhaps some kind of configuration flag (with a default value) for this on write_f64 etc would be good? Then it can be passed on the other procs and finally to format_digits.

One could think that one just wants to tweak the precision, but it doesn't really do the same thing.

karl-zylinski commented 6 days ago

Also it will will output 0 instead of 0.0000000000000000 and 1 instead of 1.0000000000000000.

I'm not a fan of representing a float as an int just because the current value happens to round to one. Not unless the format string asked for %.0f.

Yeah, I should probably change it. If I wrote floats to JSON I would like a f64 zero to just be written as 0. The JSON package is like a layer between me and the outputting anyways, I do not control the formatting of it. Throwing the dot when all decimals are zero is good for human readable formats I think.

Anyways, I think doing these things automatically is perhaps wrong. But the standard behavior of having

strings.write_f64(&b, 1.0, 'f')

be written as 1.0000000000000000 by default is perhaps not what most people want either.

One can of course use strings.write_float and lower the precision, but I think lots of people want high precision, but not lots of noisy zeroes (again, JSON is a good example).

Ideas for what to do:

Make write_f64 take a parameter with a default value that trims zeroes
Add a setting to the string builder that trims trailing decimal zeroes.
Don't do any of this and just change the JSON code to trim zeroes, since I seem to be hung up on JSON anyways.

Feoramund commented 6 days ago

However, it is not possible to control this 'trailing zero avoidance' when you use for example write_f64, which becomes a real problem when you write JSON files, because they get lots of zeroes everywhere. Perhaps some kind of configuration flag (with a default value) for this on write_f64 etc would be good? Then it can be passed on the other procs and finally to format_digits.

@karl-zylinski The underlying problem you're trying to solve here is JSON data loss, right? Maybe it's better to approach it from the json library first.

I haven't looked into this too deeply yet, but my initial thoughts are: as far as adding a boolean to write_f64 goes, that could slow down float writing because the procs in that line now have an extra possible branch, or it may complicate the API for an edge case.

What about a custom float writer proc just for json? Something that writes floats to a temporary buffer, then trims off the zeroes, then writes that to the intended stream? It would be a little slower, but it would be isolated, and I think it might solve your problem if I'm understanding it so far.

Is the JSON data loss in Odin itself when Odin re-parses a float value, or is this happening in another program?

If the data loss is in Odin, maybe the parser needs to be looked at instead?

karl-zylinski commented 6 days ago

@Feoramund The data loss can happen without JSON involved.

How to make it happen: If you write with write_f64, then lots of zeroes are added. Then parse_f64 (strconv.odin) parses the values, in some cases the extra zeroes leads to a loss of precision that you would not expect. It can happen for simple small number (I ran into this while creating a level editor for my game last year, after saving and loading, then objects in the game moved 🙀).

So going back and forth with write_f64 and parse_f64 a few times like that can cause these issues.

I think you are right, we should look into fixing parse_f64 so it ignores trailing zeroes from the input string, saving some precision.

But also, I think the current default behavior of strings.write_f64(&b, 1.0, 'f') outputting 1.0000000000000000 with no way to trim off those zeroes is perhaps not what one expects either.

I think we could do something like this:

Make parse_f64 better with trailing zeroes.
One of the following to have less trailing zeroes: Make generic_ftoa take a parameter that tells it to trim off trailing zeroes. Then we can make both the Builder and Writer (core/io) have config flags that they can feed into there. Remove trailing zeroes in JSON code, but like I said, I think it's a more general problem.

karl-zylinski commented 5 days ago

This PR https://github.com/odin-lang/Odin/pull/3849 fixes the most critical parts of this issue. The trimming of zeroes in JSON etc isn't that important. Closing this as it is not a good solution.