red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License

WISH: mold integer to automatically add thousand separators #101

Open hiiamboris opened 3 years ago

hiiamboris commented 3 years ago

I don't like seeing this, for example:

    >> stats
    == 1646342704

whereas

    == 1'646'342'704

takes thousands of times less effort to process visually.

For code that relies on it having digits only, I propose mold/all would still produce the machine-friendly variant.
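A minimal Python model of the proposal, for illustration only. `group_digits`, `mold`, and `mold_all` are hypothetical stand-ins for Red's `mold` and `mold/all`, not their real implementation:

```python
def group_digits(n: int, sep: str = "'") -> str:
    """Insert `sep` between every group of three digits, counting from the right."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    while digits:
        groups.append(digits[-3:])  # take the rightmost (up to) three digits
        digits = digits[:-3]
    return sign + sep.join(reversed(groups))


def mold(n: int) -> str:
    """Proposed default: human-friendly, using Red's own group separator."""
    return group_digits(n)


def mold_all(n: int) -> str:
    """Proposed /all refinement: digits only, for code that expects them."""
    return str(n)


print(mold(1646342704))      # 1'646'342'704
print(mold_all(1646342704))  # 1646342704
```

Python's own `format(n, ',')` does the same grouping with commas; the explicit loop just makes the right-to-left three-digit rule visible.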

rebolek commented 3 years ago

I think this is more suitable for form than mold.

(Sent by email in reply to hiiamboris's comment of Thu, 18 Mar 2021, 16:30.)


greggirwin commented 3 years ago

I already have a few paragraphs, examples, and profiling notes written to discuss this with @dockimbel. money! already does it for form. Good thought on using mold/all to avoid separators. A couple of notes for your comments:

MOLD could use the Red separator ('), and FORM could use , and . as the group and decimal separators by default, with a system option to swap them for locales that use them the other way around.

FORM doesn't guarantee loadability, but MOLD does. So it makes sense that mold would use Red's group separator.

Separators barely affect transcode (loading) speed at all. Money takes a ~17% hit when forming.

The big question is if, and why, we would ever NOT want to format numbers with separators from a usability perspective. If it's only about performance, we're trading computer time for human time, and how much extra time does it take a person to suss out numbers, and what is the potential impact for getting it wrong?

If numbers are formatted automatically in this basic form, that covers the majority of cases where format would otherwise be used on numbers. Doing it in R/S (Red/System) makes that common case fast and avoids the extra format call altogether. Format is still useful for special formatting cases.

greggirwin commented 3 years ago

There is still a central tension. Console output and other things like rejoin use form. If form produces the human-friendly output, that's great for end-user apps, but not for all development work, including data interchange, where we want Red-loadable format. The tension is that we have two major use cases (human and dev), and want both of them to be automatic in their given context.

If both form and mold use Red format, that keeps the code simpler. It does mean you need to make a call to get the non-dev format, but then that work can be fast and easy because you're replacing single characters and the size doesn't change.
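Because Red-format output and a locale format differ only in which character sits at each separator position, the conversion can be a one-for-one character swap. A Python sketch, assuming `.` is the decimal mark in Red output; `to_user_format` is a hypothetical helper, not an existing Red function:

```python
RED_GROUP = "'"    # Red's group separator, as in 1'646'342'704
RED_DECIMAL = "."  # assumed decimal mark in Red-formatted output


def to_user_format(red_text: str, group: str = ",", decimal: str = ".") -> str:
    """Map Red's separators to locale separators, one character for one character.

    The length is unchanged as long as `group` and `decimal` are single characters.
    """
    table = str.maketrans({RED_GROUP: group, RED_DECIMAL: decimal})
    return red_text.translate(table)  # all replacements happen in a single pass


red = "1'234'567.89"
en = to_user_format(red)                          # the ", ." default
de = to_user_format(red, group=".", decimal=",")  # reversed for locales that use them that way
print(en)  # 1,234,567.89
print(de)  # 1.234.567,89
```

Since `translate` applies every mapping in one pass, swapping `.` and `,` needs no temporary placeholder character.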

greggirwin commented 3 years ago

to string! could also omit separators.

hiiamboris commented 3 years ago

IMO form doesn't have to be locale-aware, if format is. Everything else - totally agreed. Good thoughts.

greggirwin commented 3 years ago

Agreed on form.

rebolek commented 3 years ago

> why, we would ever NOT want to format numbers with separators

Because if we always add separators to integers, we would need to introduce a function to remove them from the resulting string, so it can be used in CSV, JSON, etc.
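The removal function itself is small; a Python sketch with a hypothetical `strip_separators` helper:

```python
def strip_separators(text: str, sep: str = "'") -> str:
    """Remove group separators from one formed number, leaving plain digits."""
    return text.replace(sep, "")


formed = ["1'646'342'704", "12'000", "-7"]
csv_line = ",".join(strip_separators(v) for v in formed)
print(csv_line)  # 1646342704,12000,-7
```

It has to be applied per formed value rather than to a whole CSV or JSON document, since strings in the data may legitimately contain apostrophes.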

(Sent by email in reply to Gregg Irwin's comment of Thu, 18 Mar 2021, 19:23.)


greggirwin commented 3 years ago

Good point. That's what to string! and/or mold/all would do. I do think we need an easy option for that.

endo64 commented 3 years ago

I think once format is available, we can leave it to the developer to use it, because numbers don't need formatting during execution, only on output. But adding formatted output to the console would be nice; it could be configurable, for example via system/options.

greggirwin commented 3 years ago

Yes, this would be for output only, no change to internal representations of numbers.

greggirwin commented 3 years ago

It would also not affect redbin, because that's not human readable.

greggirwin commented 3 years ago
    a: [form $1234567980.12]
    b: [mold $1234567980.12]
    c: [form 1234567980]
    profile/show/count [a b c] 1'000'000

Note how forming money, with seps, doubles the memory hit. Intuitively it seems like it should be proportional to the number of separators added. Testing issue!, percent!, and other types, memory use is consistent across them. Looking at integer/form-signed, my best guess is that growing the buffer via string/concatenate-literal, which calls append-char, is the cause.

Another interesting consideration here is that the charset used is known, not just any codepoint, so we can probably avoid that hit as integers and floats do. I think @9214 did the right thing in the first cut that way, but if we decide to make this the norm, integer/form-signed could probably just use a larger buffer, including the seps. Money then easily adjusts its logic based on that, maybe using money/to-integer rather than looping by digit. Floats will be more work.
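A rough Python model of the "larger buffer" idea: compute the exact output size, separators included, then fill it right to left in one pass instead of growing it one appended character at a time. `formed_length` and `form_signed` are hypothetical names; this is not the R/S code of `integer/form-signed`:

```python
def formed_length(n: int) -> int:
    """Exact output size: optional sign + digits + one separator per 3-digit boundary."""
    digits = len(str(abs(n)))
    return (1 if n < 0 else 0) + digits + (digits - 1) // 3


def form_signed(n: int, sep: str = "'") -> str:
    """Fill a buffer sized up front, right to left, with no buffer growth."""
    size = formed_length(n)
    buf = [""] * size          # allocated once, separators included
    value = abs(n)
    pos = size - 1
    count = 0
    while True:
        buf[pos] = chr(ord("0") + value % 10)  # lowest remaining digit
        value //= 10
        pos -= 1
        count += 1
        if value == 0:
            break
        if count % 3 == 0:     # a separator goes before every third digit
            buf[pos] = sep
            pos -= 1
    if n < 0:
        buf[pos] = "-"
    return "".join(buf)


print(form_signed(-1234567))  # -1'234'567
```

The precomputed size is exact only for a single-character separator, which holds for `'`.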

greggirwin commented 3 years ago

@dockimbel as we are pressing forward on format and L10N, please weigh in here with your thoughts on auto-sep inclusion in numbers, so format can be implemented accordingly.

hiiamboris commented 3 years ago

You won't be able to base format on mold or form. As a dirty mockup yes but not as anything reliable.

greggirwin commented 3 years ago

My thought is that a couple of common cases should be "almost done" by native formatting, requiring only simple substitutions. For example, `format-number-with-mask round/to n .01 "$#'###'###'###'##0.00"` may be the same as `to money! n`, and changing ' to , can then be fast. But what we may also find is that, as you say, there's no win; but also, importantly, format may be used much less.
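A Python sketch of the "almost done" idea for money: the native step produces Red-style grouping, and the locale step is a same-length character swap. `red_money` and `to_comma_grouping` are hypothetical; the exact layout of Red's money! form (for example where the sign goes) is an assumption, as is `ROUND_HALF_UP` (ties away from zero) matching `round/to`:

```python
from decimal import Decimal, ROUND_HALF_UP


def red_money(n: Decimal) -> str:
    """Model of a native money form: sign, '$', grouped whole part, two decimals."""
    cents = n.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # like round/to n .01
    sign = "-" if cents < 0 else ""
    whole, frac = f"{abs(cents):.2f}".split(".")
    grouped = f"{int(whole):,}".replace(",", "'")  # Red-style group separator
    return f"{sign}${grouped}.{frac}"


def to_comma_grouping(text: str) -> str:
    """The cheap finishing step: swap each ' for a , (same length)."""
    return text.replace("'", ",")


native = red_money(Decimal("1234567980.12"))
print(native)                     # $1'234'567'980.12
print(to_comma_grouping(native))  # $1,234,567,980.12
```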

dockimbel commented 3 years ago

I think it is safer to simply add a formatting option for integers printed as results in the console, using a system/options flag. That should already be enough. For a generalized version, I don't see why format is not a good enough option anymore? We don't want to make developing Red apps that interact with the rest of the world (including the local operating system) more difficult just because we want generalized pretty-printing of integers. Expanding form could be considered, but it is used in many places, so it could be too painful to change.

greggirwin commented 3 years ago

> I don't see why format is not a good enough option anymore?

Because making numbers easy to read should be the default: `print n` vs `print format n 'r-general` (r- is for Ren/Redbol). Even more so with `print [x y z]` vs `print [format x 'r-general format y 'r-general format z 'r-general]`. This applies not just to console output, but also to logs, saved data, and 'net messages. It's not about pretty-printing; it's about HCI vs CCI (computer-computer interaction). Format is for when you need more control.

> The big question is if, and why, we would ever NOT want to format numbers with separators from a usability perspective.

I said earlier that we have 2 main use cases, but really there are 3:

1) Dev mode: all Red, including the console and other tools.
2) User mode: locale-aware.
3) Interop: other tools don't understand seps at all.

@rebolek's point about standards-based formats is key. Those should be written once, or a few times at most for competing implementations. Of course we don't want omitting separators to be too onerous, because the ad hoc interop cases affect users writing custom code. But which is the majority case? We can't say. What we can say is that if we document how things work, then when someone needs to omit seps they may complain, but we are on solid ground to defend our position, and it's not like it will be horribly painful on their end compared to other stuff they have to deal with. :^)

For the CSV codec, there is exactly one call to form that might have to change. The JSON codec uses append on the output string for ints and floats.

But let's say we leave the default as it is. If someone is writing code for any interop scenario, they are already formatting code in a very specific way so the other side can read it. You can't just save in Red and load in a shell script or another language. But there is an easy compromise here (still a compromise though ;^), half of which I noted on 18-Mar: to string! omits seps. Part 2 is that appending a number to a string also omits seps. If you're building output for interop, that's probably how you're doing it. However, it does mean we need to change rejoin to use to string! rather than form, to keep its output the same as it is today.
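A Python model of that compromise, with hypothetical names standing in for Red's `form`, `to string!`, `append`, and `rejoin` (the last one simplified to string output only):

```python
def form(n: int) -> str:
    """Proposed: human-friendly, used by print."""
    return f"{n:,}".replace(",", "'")


def to_string(value) -> str:
    """Proposed: `to string!` omits separators."""
    return str(value)


def append(buffer: str, value) -> str:
    """Proposed: appending a number to a string omits separators."""
    return buffer + to_string(value)


def rejoin(values) -> str:
    """Proposed change: build with to-string semantics instead of form,
    so rejoin's output stays what it is today."""
    result = ""
    for v in values:
        result = append(result, v)
    return result


print(form(10000))                  # 10'000
print(append("count=", 10000))      # count=10000
print(rejoin(["id=", 42000, ";"]))  # id=42000;
```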

Integers are used for examples here, but I think it should apply to floats, percents, and money as well. If we update integers, pairs will get it automatically as they use integer/form-signed.
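Since a pair is formed by running each coordinate through the integer routine, the separators come along without extra work. A Python sketch with hypothetical names (`form_integer`, `form_pair`); it models forming only and makes no claim about whether Red's lexer would load such a pair back:

```python
def form_integer(n: int) -> str:
    """Stand-in for the shared integer routine, with Red-style separators."""
    return f"{n:,}".replace(",", "'")


def form_pair(x: int, y: int) -> str:
    """A pair forms each coordinate with the integer routine, joined by 'x'."""
    return f"{form_integer(x)}x{form_integer(y)}"


print(form_pair(1920, 1080))  # 1'920x1'080
print(form_pair(-5, 20000))   # -5x20'000
```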

| Function | Proposed output | Purpose | Notes |
|---|---|---|---|
| form | 10'000 | Returns a user-friendly string representation of a value. | Used by print; used by rejoin. |
| mold | 10'000 | Returns a source format string representation of a value. | Used by write -> simple-io/write; used by the JSON codec beyond integer/float/percent. |
| mold/all | 10'000 | Returns value in loadable format. | For CCI more than HCI. For cases where you want to inspect data directly, rather than using redbin and a decoder. Construction syntax won't be understood by any non-Red tool. Used by save/all. |
| mold/only | 10000 ? | Excludes outer brackets if value is a block. | Handy for simple HCI formats when used with new-line. Maps could also use /only to exclude sigils; other block types may not benefit. /only could exclude non-essential decoration. Used by save. |
| mold/flat | 10'000 | Excludes all indentation. | Could change "indentation" to include "non-essential decoration", but /only is a better fit for that. |
| mold/part | 10'000 | See mold. | Used by the console. |
| append | 10000 | Insert/append with strings is the main question. | Just casts to red-string! today? Used by the JSON codec for integer/float/percent. |
| to string! | 10000 | | |

hiiamboris commented 3 years ago

I would just let to string! 1234 produce "1234" (maybe mold/all as well), but all other (non-/all) versions of mold produce "1'234", as well as form. mold's counterpart is load, and since load can read "1'234", it's safe for mold to produce it. I object to differentiating mold from mold/flat and mold/only further, as that would only add complexity and gotchas. mold/flat/part/only in particular is useful for human-readable dumps of short code, like in reactivity and tracing.
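The round-trip argument can be checked with a Python model: as long as the loader accepts `'` between digits, both the separated and the digits-only molds load back to the same integer. `RED_INTEGER` is a simplified, hypothetical subset of Red's integer syntax, and `mold`, `mold_all`, and `load_integer` are stand-ins, not Red's code:

```python
import re

# Simplified, hypothetical subset of Red's integer syntax:
# optional sign, then digits, with ' allowed between digits.
RED_INTEGER = re.compile(r"[+-]?\d+(?:'\d+)*")


def mold(n: int) -> str:
    return f"{n:,}".replace(",", "'")


def mold_all(n: int) -> str:
    return str(n)


def load_integer(text: str) -> int:
    if not RED_INTEGER.fullmatch(text):
        raise ValueError(f"not an integer literal: {text!r}")
    return int(text.replace("'", ""))


for n in (0, 7, -1234, 1646342704, -2147483648):
    assert load_integer(mold(n)) == n      # mold with separators still round-trips
    assert load_integer(mold_all(n)) == n  # and so does the digits-only form
print(mold(-2147483648))  # -2'147'483'648
```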