Open JesseTG opened 6 years ago
Since jq supports multi-arity functions, you would probably want to add functions with arities that are not already defined, e.g. tonumber(base).
See also https://rosettacode.org/wiki/Non-decimal_radices/Convert#jq
Okay, so I'll write a new built-in but make it look like tonumber. Is that okay?
> Is that okay?
First, please understand that it is not for me to decide what is added to jq. Second, please note that there is a queue of worthy Pull Requests that have not yet made it into jq, so if you are proposing a new Pull Request, please be aware that it will probably join the queue. Third, I believe it would probably be worth your while to specify more precisely what you propose.
> First, please understand that it is not for me to decide what is added to jq.
Right, excuse me.
> Second, please note that there is a queue of worthy Pull Requests that have not yet made it into jq, so if you are proposing a new Pull Request, please be aware that it will probably join the queue.
That's fine.
> Third, I believe it would probably be worth your while to specify more precisely what you propose.
A new filter (or a new variety of tonumber) that behaves like this:
$ echo \"0x40\" | jq tonumber
64
$ echo \"0755\" | jq tonumber
493
$ echo \"40\" | jq 'tonumber(16)'
64
$ echo \"40\" | jq 'tonumber(7)' # base 7
28
My main use case for this would be converting string representations of colors to integers. I technically only need hex -> decimal conversions, but given that this would basically be a wrapper around strtol, you'd be getting the other bases for free.
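To make the proposed parsing rules concrete, here is a minimal Python sketch. It is not jq's implementation: the name to_number is hypothetical, Python's int(text, base) stands in for strtol, and the prefix rules are simply read off the examples above.

```python
def to_number(text, base=None):
    """Hypothetical stand-in for the proposed tonumber / tonumber(base)."""
    s = text.strip().lower()
    sign = 1
    if s.startswith(("+", "-")):
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if base is None:
        # Prefix detection as in the examples above: 0x means hex,
        # a leading 0 means octal, anything else is decimal.
        if s.startswith("0x"):
            base, s = 16, s[2:]
        elif len(s) > 1 and s.startswith("0"):
            base, s = 8, s[1:]
        else:
            base = 10
    # int(s, base) plays the role of strtol; it raises ValueError on bad digits.
    return sign * int(s, base)


print(to_number("0x40"))    # 64
print(to_number("0755"))    # 493
print(to_number("40", 16))  # 64
print(to_number("40", 7))   # 28
```

Note that the implicit octal rule in this sketch is exactly the part that changes the meaning of existing inputs such as "0755".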
For the sake of completeness, I'd also provide the inverse:
$ echo 64 | jq hex
"0x40"
$ echo 493 | jq oct
"0755"
$ echo 64 | jq 'base(16)'
"0x40"
$ echo 28 | jq 'base(7)'
"40"
I would probably implement this as a C built-in that wraps printf, with a set of filters in the standard library that use that builtin.
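As a rough illustration of the inverse direction, the Python sketch below renders an integer in an arbitrary base. The name to_base and the prefix table are hypothetical and only mirror the examples above. One practical caveat: C's printf only offers octal, decimal and hexadecimal integer conversions, so bases such as 2 or 7 would need a digit loop along these lines rather than a plain printf call.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
PREFIXES = {16: "0x", 8: "0"}  # assumed from the examples above


def to_base(n, base):
    """Hypothetical stand-in for the proposed base(n); non-negative integers only."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("negative input is not handled in this sketch")
    out = ""
    while True:
        # Peel off the least significant digit and prepend it.
        n, remainder = divmod(n, base)
        out = DIGITS[remainder] + out
        if n == 0:
            break
    return PREFIXES.get(base, "") + out


print(to_base(64, 16))  # 0x40
print(to_base(493, 8))  # 0755
print(to_base(5, 2))    # 101
```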
The major problem that I see is that your proposal introduces a backward incompatibility, because currently:
echo '"0755"' | jq tonumber
755
From Wikipedia:
In programming languages, octal literals are typically identified with a variety of prefixes, including the digit 0, the letters o or q, the digit–letter combination 0o, or the symbol & or $.
The maintainers are also currently concerned about the absolute number of builtin.jq builtins, for performance reasons. In your initial PR, you may therefore want to keep the number of additional such builtins to a bare minimum.
> The major problem that I see is that your proposal introduces a backward incompatibility, because currently:
> echo '"0755"' | jq tonumber
> 755
Okay, so I'd leave 0 prefixes alone except on an opt-in basis (maybe if you're explicitly asking for base 8). Man, whoever coined that prefix for octal needs to be smacked.
> The maintainers are also currently concerned about the absolute number of builtin.jq builtins, for performance reasons. In your initial PR, you may therefore want to keep the number of additional such builtins to a bare minimum.
That's fine. In fact, it looks like I'm suggesting more than I really am. I'm only suggesting two C builtins (let's call them tonumber/1 and base/1) and three trivial jq builtins that wrap common cases. In fact, here's what the jq builtins would look like:
def hex: base(16);
def oct: base(8);
def bin: base(2);
Maybe add one more of each, depending on how I use sprintf.
> Man, whoever coined that prefix for octal needs to be smacked.
You are not wrong.
> That's fine. In fact, it looks like I'm suggesting more than I really am. I'm only suggesting two C builtins (let's call them tonumber/1 and base/1) and three trivial jq builtins that wrap common cases.
That's... still 5 builtins from the perspective we're worried about. If I were to do this as you've proposed, I'd just implement tonumber/1 and base/1, and leave it to people to define shortcut builtins if they need them. Linking is currently something like O(n^2), so we like to avoid adding more builtins than are necessary.
Relatedly, I'm not in love with the base/1 name. I think it's a little unclear, but I don't have a recommendation on something better...
On a more general note, I should point out that tonumber/0 takes a string which is interpreted as a JSON-encoded representation of a number, which includes such things as -5e+77 and 5e-77. How would these interact with tonumber(8)?
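The ambiguity raised here can be seen with a short Python experiment. This is only an illustration using Python's own float and int parsers, not a statement about how jq would behave: the same characters mean different things depending on whether they are read as a JSON-style number or as digits in a given base.

```python
text = "5e-77"

# Read as a JSON-style number, the "e" introduces a decimal exponent.
print(float(text))    # 5e-77

# Read as base-16 digits, "e" is just the digit fourteen.
print(int("5e", 16))  # 94

# Read as base 8, "e" is not a digit at all, so parsing fails.
try:
    int(text, 8)
except ValueError as err:
    print("rejected:", err)
```

Any tonumber(base) design would have to pick one of these readings, or reject exponent notation outright when a base is given.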
> Linking is currently something like O(n^2), so we like to avoid adding more builtins than are necessary.
Where n is the number of built-ins written in jq? Yikes. Why is this?
> Relatedly, I'm not in love with the base/1 name. I think it's a little unclear, but I don't have a recommendation on something better...
radix/1, maybe?
> On a more general note, I should point out that tonumber/0 takes a string which is interpreted as a JSON-encoded representation of a number, which includes such things as -5e+77 and 5e-77. How would these interact with tonumber(8)?
Here are my thoughts.
- Typically when you're dealing with numbers in multiple bases, you know in advance which ones you're using. So I think the argument to tonumber would usually be a constant in practice. Given that, I think it's okay to be liberal with special cases. If you really want it, I can add a flag that toggles special cases on a per-call basis.
- It doesn't matter what the representation is, just the value. 1.2e3 and 1200 are both twelve hundred, which is an integer. Any integer between -(2**53) and (2**53) - 1 (inclusive) can be properly represented in a double.
- Non-integers are tricky. The only thing I can really think of is to consider non-integer inputs to base/1 an error, with these exceptions:
  - 1.5 | base(16) or "0x1.8p+0" | tonumber would be sensible.
  - Is base(10) equivalent to tostring/1 (for numbers), regardless of whether the input is an integer or a float?
- If you explicitly ask for tonumber(8), then allow the 0 prefix. Otherwise, just consider it a garden-variety leading zero (i.e. don't use it to implicitly convert from octal).
This would be very useful! Has there been any work done?
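The points about double precision and hexadecimal floats in the list above can be checked with a few lines of Python, whose float is an IEEE 754 double. This is only a sanity check of the arithmetic, not a sketch of jq internals. The hexadecimal float notation shown is the one C's printf produces with its %a conversion.

```python
LIMIT = 2 ** 53

# 1.2e3 and 1200 denote the same integer value.
print(float("1.2e3") == 1200, float("1.2e3").is_integer())     # True True

# Integers of this magnitude survive a round trip through a double...
print(float(LIMIT - 1) == LIMIT - 1, float(-LIMIT) == -LIMIT)  # True True

# ...but 2**53 + 1 does not: it rounds to 2**53.
print(float(LIMIT + 1) == LIMIT + 1)                           # False

# Hexadecimal float notation in both directions.
print(float.fromhex("0x1.8p+0"))                               # 1.5
print((1.5).hex())                                             # 0x1.8000000000000p+0
```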
A few suggestions:
- Octal can use the 0o prefix, and binary can use 0b. This would match up nicely with the hexadecimal prefix.
- 0x1.8 is equal to 1.5. Implementing e<pwr> for all bases raises too many questions to be simple, so it could simply be left unsupported (e.g. What base is <pwr> in? Does it use base 10, as some people expect, or the current base? What base does the actual multiplier use, i.e. 10**<pwr> or <base>**<pwr>? And so on.)
- radix is not a bad name.
Any word on this one? Very excited to be able to convert hex strings to numbers!
Will the implementation strip out or ignore the 0x prefix instead of raising an error?
Thanks.
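For reference, C's strtol accepts an optional 0x or 0X prefix when the base is 16, so a builtin that wraps it would most likely tolerate the prefix. Whether the jq builtin would actually do so is not settled in this thread. Python's int behaves the same way and can serve as a quick illustration:

```python
print(int("40", 16))    # 64
print(int("0x40", 16))  # 64, the prefix is accepted and skipped

# With any base where "x" is not a digit, the prefix is an error.
try:
    int("0x40", 10)
except ValueError:
    print("0x40 is not valid in base 10")
```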
I'd like to be able to convert numeric strings in bases besides 10 (e.g. "0xdeadbeef" or "0755") to numbers. I don't need to convert literals, just string values. There's no good way to do this in jq right now, so I'm going to write a built-in soon.
But where would you want it? I could either modify the tonumber builtin or write a new one that wraps strtoul. Which would you prefer?