Open hiiamboris opened 3 years ago
I think this is more suitable for `form` than `mold`.
On Thu, 18 Mar 2021 at 16:30, hiiamboris @.***> wrote:
> I don't like seeing this, for example: `stats == 1646342704`
>
> `== 1'646'342'704` takes thousands of times less effort to process visually
I already have a few paragraphs, examples, and profiling notes written, to discuss this with @dockimbel. `money!` does it already for `form`. Good thought on `mold/all` to avoid them. A couple notes for your comments:
- MOLD could use the Red separator, and FORM could use `[, .]` as seps by default, with a system option to use them in reverse for locales that use them that way.
- FORM doesn't guarantee loadability, but MOLD does. So it makes sense that `mold` would use Red's group separator.
- It doesn't affect `transcode` loading speed much at all. Money takes a ~17% hit when forming.
The big question is if, and why, we would ever NOT want to format numbers with separators from a usability perspective. If it's only about performance, we're trading computer time for human time, and how much extra time does it take a person to suss out numbers, and what is the potential impact for getting it wrong?
If numbers format automatically, in this basic form, that's the majority case for numbers with `format`. Doing it in R/S makes that common case fast, and avoids the extra `format` call altogether. `format` is still useful for special formatting cases.
There is still a central tension. Console output and other things like `rejoin` use `form`. If that produces the human-friendly output, it's great for end user apps, but not great for all development work including data interchange, where we want Red loadable format. The tension is that we have two major use cases (human and dev), and want both of them to be automatic in their given context.
If both `form` and `mold` use Red format, that keeps the code simpler. It does mean you need to make a call to get the non-dev format, but then that work can be fast and easy because you're replacing single characters and the size doesn't change.
`to string!` could also omit separators.
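To illustrate the "replacing single characters and the size doesn't change" point, here is a minimal sketch in plain Red. `localize-seps` is a hypothetical name, not an existing or proposed function, and it assumes the input already uses `'` for groups and `.` for the decimal point. It does the swap in one pass, because two successive `replace/all` calls would clobber each other when the two separators trade places.

```red
Red []

; Hypothetical helper (not part of Red): convert Red-style grouping to a
; display style by swapping single characters, so the length never changes.
localize-seps: function [
    s       [string!] "Number text using ' for groups and . for decimals"
    group   [char!]   "Group separator to show"
    decimal [char!]   "Decimal separator to show"
][
    out: copy s                     ; leave the caller's string untouched
    forall out [
        switch out/1 [
            #"'" [change out group]
            #"." [change out decimal]
        ]
    ]
    head out
]

print localize-seps "1'234'567.89" #"," #"."    ; 1,234,567.89
print localize-seps "1'234'567.89" #"." #","    ; 1.234.567,89
```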
IMO `form` doesn't have to be locale-aware, if `format` is. Everything else - totally agreed. Good thoughts.
Agreed on form.
> why, we would ever NOT want to format numbers with separators

Because if we always add separators to integers, we would need to introduce a function to remove them from the resulting string, so it can be used in CSV, JSON, etc.
Good point. That's what `to string!` and/or `mold/all` would do. I do think we need an easy option for that.
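For comparison, if no separator-free conversion existed, the cleanup step @rebolek describes might look roughly like this in user code. `ungroup` is a hypothetical name used only for illustration; the sketch assumes `'` is the only separator present.

```red
Red []

; Hypothetical helper (not part of Red): drop the group separator so the
; text contains digits only, e.g. before writing CSV or JSON.
ungroup: function [s [string!]][
    trim/with copy s "'"            ; remove every ' from a copy
]

print ungroup "1'646'342'704"       ; 1646342704
```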
I think when we have `format` available, we can leave it to the developer to use it, because there is no need to format numbers while code is executing, only when outputting.
But adding formatted output to the console would be nice, and it can be configurable; it could be added to `system/options` as well.
Yes, this would be for output only, no change to internal representations of numbers.
It would also not affect redbin, because that's not human readable.
```red
a: [form $1234567980.12]
b: [mold $1234567980.12]
c: [form 1234567980]
profile/show/count [a b c] 1'000'000
```
Note how forming money, with seps, doubles the memory hit. Intuitively it seems like it should be proportional to the number of separators added. Even testing issues, percents, etc., mem use is consistent across them. Looking at `integer/form-signed`, my best guess is that growing the buffer via `string/concatenate-literal`, which calls `append-char`, is the cause.
Another interesting consideration here is that the charset used is known, not just any codepoint, so we can probably avoid that hit as integers and floats do. I think @9214 did the right thing in the first cut that way, but if we decide to make this the norm, `integer/form-signed` could probably just use a larger buffer, including the seps. Money then easily adjusts its logic based on that, maybe using `money/to-integer` rather than looping by digit. Floats will be more work.
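The "larger buffer, including the seps" idea comes down to simple arithmetic: a run of `d` digits needs `(d - 1) / 3` separators. The following is only a user-level Red sketch of that sizing, not the Red/System code in `integer/form-signed`; `group-digits` is a hypothetical name, and whether preallocating this way avoids the memory hit in the runtime is an assumption that would need profiling.

```red
Red []

; Hypothetical helper (not Red's runtime code): group an integer's digits,
; sizing the output once from the digit count.
group-digits: function [n [integer!]][
    digits: form n
    if negative? n [digits: next digits]    ; skip the leading minus sign
    count: length? digits
    seps: (count - 1) / 3                   ; integer division in Red
    out: make string! count + seps + 1      ; +1 leaves room for a sign
    if negative? n [append out #"-"]
    foreach ch digits [
        append out ch
        count: count - 1
        ; add a separator when a whole number of 3-digit groups remains
        if all [count > 0  zero? count // 3][append out #"'"]
    ]
    out
]

print group-digits 1646342704    ; 1'646'342'704
print group-digits -1234         ; -1'234
print group-digits 100           ; 100
```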
@dockimbel as we are pressing forward on `format` and L10N, please weigh in here with your thoughts on auto-sep inclusion in numbers, so `format` can be implemented accordingly.
You won't be able to base `format` on `mold` or `form`. As a dirty mockup yes, but not as anything reliable.
My thought is that a couple common cases should be "almost done" by native formatting, requiring only simple substitutions. e.g., `format-number-with-mask round/to n .01 "$#'###'###'###'##0.00"` may be `to money! n`, and changing `'` to `,` can then be fast. But what we may also find is that, as you say, there's no win; but also, importantly, `format` may be used much less.
I think it is safer to simply add a formatting option for integers printed as results in the console, using a `system/options` flag. That should already be enough. For a generalized version, I don't see why `format` is not a good enough option anymore? We don't want to make developing Red apps that interact with the rest of the world (including the local operating system) more difficult just because we want a generalized pretty-printing of integers. Expanding `form` could be considered, but it is used in many places, so it could be too painful to change.
> I don't see why `format` is not a good enough option anymore?

Because making numbers easy to read should be the default. `print n` vs `print format n 'r-general` (`r-` is for Ren/Redbol). Even more so with `print [x y z]` vs `print [format x 'r-general format y 'r-general format z 'r-general]`. Not just for console output, but logs, saved data, and 'net messages. It's not about pretty printing, it's about HCI vs CCI (computer-computer interaction). `format` is for when you need more control.
> The big question is if, and why, we would ever NOT want to format numbers with separators from a usability perspective.
I said earlier that we have 2 main use cases, but really there are 3:
1) Dev mode. All Red, including console and other tools.
2) User mode, locale aware.
3) Interop, where other tools don't understand seps at all.
@rebolek's point about standards-based formats is key. Those should be written once, or a few times at most for competing implementations. Of course we don't want omitting separators to be too onerous, because the ad hoc interop cases affect users writing custom code. But which is the majority case? We can't say. What we can say is that if we doc how things work, then when someone needs to omit seps, they may complain, but we are on solid ground to defend our position, and it's not like it will be horribly painful on their end compared to other stuff they have to deal with. :^)
For the CSV codec, there is exactly one call to `form` that might have to change. The JSON codec uses `append` on the output string for ints and floats.
But let's say we leave the default as it is. If someone is writing code for any interop scenario, they are already formatting code in a very specific way so the other side can read it. You can't just `save` in Red and `load` in a shell script or another lang. But there is an easy compromise here (still a compromise though ;^), which I noted half of on 18-Mar: `to string!` omits seps. Part 2 of that is that `append`ing a number to a string also omits seps. If you're building output for interop, that's probably how you're doing it. However, it does mean we need to change `rejoin` to use `to string!` rather than `form` to keep the output the same as it is today.
Integers are used for examples here, but I think it should apply to floats, percents, and money as well. If we update integers, pairs will get it automatically as they use `integer/form-signed`.
Function | Proposed Output | Purpose | Notes |
---|---|---|---|
form | 10'000 | Returns a user-friendly string representation of a value. | - Used by print - Used by rejoin |
mold | 10'000 | Returns a source format string representation of a value. | - Used by write -> simple-io/write - Used by JSON codec beyond integer/float/percent |
mold/all | 10'000 | Return value in loadable format. | For CCI more than HCI. For cases where you want to inspect data directly, rather than using redbin and a decoder. Construction syntax won't be understood by any non-Red tool. - Used by save/all |
mold/only | 10000 ? | Exclude outer brackets if value is a block. | - Handy for simple HCI formats when used with new-line . - Maps could also use /only to exclude sigils. Other block types may not benefit. - /only could exclude non-essential decoration. - used by save |
mold/flat | 10'000 | Exclude all indentation. | Could change "indentation" to include "non-essential decoration", but /only is a better fit for that. |
mold/part | 10'000 | See mold | - Used by console |
append | 10000 | Insert/Append with strings is the main question | Just casts to red-string! today? - Used by JSON codec for integer/float/percent |
to string! | 10000 | | |
I would just let `to string! 1234` produce `"1234"` (maybe `mold/all` as well), but all other (non-`/all`) versions of `mold` produce `"1'234"`, as well as `form`.
`mold`'s counterpart is `load`, and since `load` can read `"1'234"`, it's safe for `mold` to produce it.
I object to differentiating `mold` from `mold/flat` and `mold/only` further, as that would only add complexity and gotchas. `mold/flat/part/only` in particular is useful for human-readable dumps of short code, like in reactivity and tracing.
I don't like seeing this, for example: `stats == 1646342704`

`== 1'646'342'704` takes thousands of times less effort to process visually.

For code that relies on it having digits only, I propose `mold/all` would still produce the machine-friendly variant.