nicowilliams opened this issue 1 year ago
fq has backtick-raw strings:
$ fq -rn '`\(123)\u1234`'
\(123)\u1234
which is practical because it exploits the fact that backticks are rarely used (at least for now) in text formats like XML, JSON etc. But maybe a more future-proof and safer choice is something using user-defined start/end sequences somehow.
I think I can make <backtick><any-number-of-({[><backtick> be the start sequence, with the end sequence being the same but with the corresponding closing paren/brace/bracket characters.
# because the start is `{{` the end has to be `}}`
`{{`raw string here with no \escapes and not \(interpolation)`}}`
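A minimal sketch of how a lexer might recognise this bracketed form, written in Rust purely for illustration (it is not jq's or fq's lexer). The name lex_bracket_raw is hypothetical, and the sketch assumes two details the proposal leaves open: the closing brackets appear in reverse order so they nest (`([`...`])`), and at least one bracket is required. Anything else returns None, so a caller could fall back to plain single-backtick strings.

```rust
/// Hypothetical helper for the proposed `{{`...`}}` raw strings.
/// `src` must start at the opening backtick. Returns the raw content and the
/// number of bytes consumed, or None if `src` is not in this form (so a caller
/// could fall back to a plain single-backtick string) or is unterminated.
fn lex_bracket_raw(src: &str) -> Option<(&str, usize)> {
    let rest = src.strip_prefix('`')?;
    // Start sequence: backtick, one or more of ( { [, backtick.
    let run_len = rest.find('`')?;
    let open = &rest[..run_len];
    if open.is_empty() {
        return None;
    }
    // Map each opener to its closer, reversed so the brackets nest:
    // "([" -> "])". Any other character means this is not the bracketed form.
    let close: String = open
        .chars()
        .rev()
        .map(|c| match c {
            '(' => Some(')'),
            '{' => Some('}'),
            '[' => Some(']'),
            _ => None,
        })
        .collect::<Option<String>>()?;
    let end = format!("`{close}`");
    let body_start = 1 + run_len + 1; // opening backtick, brackets, backtick
    let body = &src[body_start..];
    // Content is taken verbatim: no escapes, no \(...) interpolation.
    let end_pos = body.find(&end)?;
    Some((&body[..end_pos], body_start + end_pos + end.len()))
}
```

With this rule a single backtick inside the content needs no escaping, for example `{{`a`b`}}` yields the content a`b.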
I do like the simplicity of single backticks. Maybe a variant could be that raw string literals can't be empty, and then use <one-or-more-backticks><one-or-more-characters><same-amount-of-backticks>, ex:
`abc` => "abc"
``ab`c`` => "ab`c"
abc
`` # not ok
A bit similar to how markdown code blocks work.
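A rough Rust sketch of the "same number of backticks" variant, following the Markdown code span rule that only a run of exactly the opening length closes the literal. The name lex_backtick_raw is hypothetical. Because the opening run is read greedily, the content can never be empty, so `` is rejected as unterminated, which matches the "not ok" case above.

```rust
/// Sketch of the "N backticks, content, N backticks" variant, modelled on
/// Markdown code spans. `src` must start at the first backtick. Returns the
/// content and the number of bytes consumed, or None if unterminated.
fn lex_backtick_raw(src: &str) -> Option<(&str, usize)> {
    let bytes = src.as_bytes();
    // Length of the (greedy) opening run of backticks.
    let n = bytes.iter().take_while(|&&b| b == b'`').count();
    if n == 0 {
        return None;
    }
    let mut i = n;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            // Only a run of exactly n backticks closes the literal; shorter or
            // longer runs are part of the content.
            let run = bytes[i..].iter().take_while(|&&b| b == b'`').count();
            if run == n {
                // The opening run is greedy, so bytes[n] is never a backtick
                // and the content returned here is always non-empty.
                return Some((&src[n..i], i + run));
            }
            i += run;
        } else {
            i += 1;
        }
    }
    None
}
```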
some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such
Regexps and what? Why not /regexp/?
You're thinking something like "abc" | /abc/ | ...? That would be quite neat, and would also make it possible to reuse the compiled regexp. Would /re/[flags] be like select(test(re; flags)), /re/sub/[flags] like gsub(re; sub), etc.? Also scan/match/capture somehow? I guess re and flags in those cases would need to be literal strings and not filters?
Anyway, my use cases for raw strings with fq have been literal strings that are HTML, XML, JSON etc.
@wader Yes, that, or introduce a new string type specified using slashes instead of double quotes that can be provided to test, scan, etc. as the regexp argument. Either way it can be compiled once and reused throughout the program.
Did a quick experiment with jqjq to see how it would feel and what problems might come up: https://github.com/wader/jqjq/tree/regex-literals-experiment
Looks like this:
# behaves as:
# select(test("abc")) | capture("(?<digits>\\d+)")
$ ./jqjq -n '"nope123", "abc123" | /abc/ | /(?<digits>\d+)/'
{
"digits": "123"
}
Some thoughts and problems I encountered (from the commit message):
- js does not support empty or new line in /regex/
  - // is a comment in js
  - // is the alternative operator in jq
- how to handle 1 / 2 / 3?
- currently /regex/ is compiled like this:
  - no (named) capture groups: select(test("regex"))
  - named capture groups: capture("regex")
- flags just /regex/flags?
- gsub via /regex/sub/? sub only string or string with interpolation?
- maybe some way to do test("regexp") without select(..)?
- what about match, scan, split etc? via flags or syntax?
- support test(/regex/) or /regex/ as $re | test($re) etc?
- behave as string literals with regex side data?
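To make the compilation rule concrete (no named capture groups: select(test("regex")); named capture groups: capture("regex")), here is a hypothetical Rust helper that desugars a regex literal body into jq source text. It only recognises the (?<name>...) spelling, treats (?<= and (?<! as lookbehinds rather than names, and does minimal jq string escaping. The function names are made up for this example and jqjq's actual implementation may differ.

```rust
/// Returns true if `re` contains a named group `(?<name>...)`.
/// `(?<=` and `(?<!` are lookbehinds, not names; an escaped `\(` is skipped.
/// Only the `(?<name>` spelling is recognised in this sketch.
fn has_named_group(re: &str) -> bool {
    let b = re.as_bytes();
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'\\' {
            i += 2; // skip the escaped character
            continue;
        }
        if b[i..].starts_with(b"(?<") {
            if let Some(&next) = b.get(i + 3) {
                if next != b'=' && next != b'!' {
                    return true;
                }
            }
        }
        i += 1;
    }
    false
}

/// Quotes `s` as a jq string literal (minimal escaping for this sketch).
fn jq_string(s: &str) -> String {
    let mut out = String::from("\"");
    for ch in s.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            _ => out.push(ch),
        }
    }
    out.push('"');
    out
}

/// Desugars a /regex/ literal body into the jq filter it would behave as.
fn desugar(re: &str) -> String {
    if has_named_group(re) {
        format!("capture({})", jq_string(re))
    } else {
        format!("select(test({}))", jq_string(re))
    }
}
```

For the jqjq example, desugar(r"(?<digits>\d+)") produces capture("(?<digits>\\d+)"), the same filter given in its "behaves as" comment.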
Related issue #1249
@wader I like it. My two cents on the rest:
js does not support empty or new line in /regex/ // comment in js //-alt in jq
// can be a shorthand for the most recently matched regex, like
/abc/ | sub(//; "def") | ...
would be the same as
/abc/ | sub(/abc/; "def") | ...
It can match the empty string too, but split("")[] is already more useful than splits("") and I can't think of another use case.
Line breaks in regex literals would make programs less readable; I doubt anyone would prefer them over \n.
how to handle 1 / 2 / 3?
As division. What would its function be otherwise?
currently /regex/ is compiled like this: no (named) capture groups: select(test("regex")) named capture groups: capture("regex") flags just /regex/flags?
Yes. Not sure what the g flag would do though.
gsub via /regex/sub/? sub only string or string with interpolation? maybe some way to do test("regexp") without select(..)? what about match, scan, split etc? via flags or syntax? support test(/regex/) or /regex/ as $re | test($re) etc? behave as string literals with regex side data?
I think it'd suffice if the original regex filters accepted regex literals as arguments. Usages like this
def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"
should be legal too, of course.
how to handle 1 / 2 / 3?
As division. What would its function be otherwise?
Yes, of course. To clarify, by "handle" I mostly meant how to handle it in the current lexer/parser implementation. I'm no expert on how it currently works, but I've gotten the impression that it is quite complex as it is. So in the worst case, supporting something like that might require lots of refactoring, and in my experience working with grammars the devil is in the details.
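One common way to resolve the 1 / 2 / 3 versus /regex/ ambiguity, similar to what many JavaScript tokenizers do, is to look at the previous token: if it can end an expression (a number, string, identifier, `.`, `)`, `]`, `}`), a / is an operator, otherwise it starts a regex literal. The Rust sketch below applies that heuristic to a tiny, hypothetical subset of jq syntax. It is not how jq's flex/bison lexer works, and it deliberately ignores many tokens (for example /=).

```rust
/// Token kinds for a tiny, hypothetical subset of jq syntax.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Num(&'a str),
    Str(&'a str),   // contents between double quotes, escapes left as-is
    Word(&'a str),  // identifiers, keywords, `.`, `.foo`, `$var`
    Punct(char),    // | , ( ) [ ] { } ; and other single ASCII characters
    Div,            // `/` as the division operator
    Alt,            // `//`, jq's alternative operator
    Regex(&'a str), // contents of a /regex/ literal
}

/// Keywords after which an operand is expected, so `/` starts a regex.
const EXPECT_OPERAND_AFTER: &[&str] = &["if", "then", "elif", "else", "and", "or"];

/// True if the previous token can end an expression, meaning a following
/// `/` must be an operator rather than the start of a regex literal.
fn ends_operand(prev: Option<&Tok>) -> bool {
    match prev {
        Some(Tok::Num(_) | Tok::Str(_) | Tok::Regex(_)) => true,
        Some(Tok::Word(w)) => !EXPECT_OPERAND_AFTER.contains(w),
        Some(Tok::Punct(c)) => matches!(*c, ')' | ']' | '}'),
        _ => false, // start of input, or after Div / Alt
    }
}

fn tokenize(src: &str) -> Result<Vec<Tok<'_>>, String> {
    let b = src.as_bytes();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < b.len() {
        let (c, start) = (b[i], i);
        if c.is_ascii_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
            toks.push(Tok::Num(&src[start..i]));
        } else if c.is_ascii_alphabetic() || matches!(c, b'.' | b'$' | b'_') {
            i += 1;
            while i < b.len() && (b[i].is_ascii_alphanumeric() || b[i] == b'_') {
                i += 1;
            }
            toks.push(Tok::Word(&src[start..i]));
        } else if c == b'"' || (c == b'/' && !ends_operand(toks.last())) {
            // String or regex literal: scan to the next unescaped delimiter.
            // Note that `//` in operand position yields an empty regex literal.
            i += 1;
            while i < b.len() && b[i] != c {
                i += if b[i] == b'\\' { 2 } else { 1 };
            }
            if i >= b.len() {
                return Err(format!("unterminated literal at byte {start}"));
            }
            let body = &src[start + 1..i];
            toks.push(if c == b'"' { Tok::Str(body) } else { Tok::Regex(body) });
            i += 1;
        } else if c == b'/' {
            // After an operand: `//` is the alternative operator, `/` is division.
            if b.get(i + 1) == Some(&b'/') {
                toks.push(Tok::Alt);
                i += 2;
            } else {
                toks.push(Tok::Div);
                i += 1;
            }
        } else if c.is_ascii() {
            toks.push(Tok::Punct(c as char));
            i += 1;
        } else {
            return Err(format!("unsupported non-ASCII byte at {start}"));
        }
    }
    Ok(toks)
}
```

With this rule 1 / 2 / 3 tokenizes as two divisions, while "abc" | /abc/ yields a regex literal. The empty-regex case is exactly where the // shorthand idea and jq's existing // operator would have to be reconciled.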
currently /regex/ is compiled like this: no (named) capture groups: select(test("regex")) named capture groups: capture("regex") flags just /regex/flags?
Yes. Not sure what the g flag would do though.
I think g would make it output all matches, and for /regex/sub/g it would substitute all matches and output one string:
"abc" | /(?<c>.)/g would output {"c": "a"}, {"c": "b"} and {"c": "c"}
"abc" | /(?<c>.)/\(.c),/g would output "a,b,c,"
That is, sub is treated as a string or an interpolating string.
But I get the feeling there are lots of tricky details here.
def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"
should be legal too, of course.
I was overthinking it regarding bindings. Yes, just making /.../ etc. behave as if it were a normal filter like select(test("...")) etc. is probably the most straightforward and intuitive. So, for example:
$ ./jqjq -c -n '"abc" | /(?<c>b)/ as {$c} | $c'
"b"
I'd love to see this! I want to use string interpolations as arguments for templating.
e.g.
$ jq -n --arg path "`/\(.category)/\(.name)/\(.key) - \(.value)`" '{category: "A", name: "foo", key: "bar", value: "baz"} | $ARGS.named.path'
"/A/foo/bar - baz"
or
$ jq -n '"A" as $category | "foo" as $name | "bar" as $key | "baz" as $value | $ARGS.positional[].settings.output' --jsonargs '{"settings":{"output":{"path":"`/\($category)/\($name)/\($key) - \($value)`"}}}'
{
"path": "/A/foo/bar - baz"
}
@iwconfig This sounds like support for defining new interpolation templates at runtime somehow, and then using raw strings together with that? For ... | $some_string to work like that we would probably need new syntax. But if you need nothing nearly as fancy and don't need arbitrary expressions in the template string itself, you can do something like this:
$ jq -n --arg path "/\(category)/\(name)/\(key) - \(value)" 'def tmpl($o): gsub("\\\\\\((?<s>.*?)\\)"; $o[.s]); $ARGS.named.path | tmpl({category: "A", name: "foo", key: "bar", value: "baz"})'
"/A/foo/bar - baz"
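The same runtime templating idea, sketched in Rust for comparison with the jq tmpl definition above. As in that definition, placeholders are plain keys rather than expressions, and each \( is closed by the first following ). As a choice made for this sketch only, unknown keys and an unclosed \( are left in the output unchanged; the jq version's behaviour for missing keys may differ.

```rust
use std::collections::HashMap;

/// Replaces each `\(name)` placeholder in `template` with `vars[name]`.
/// Names are plain keys, not expressions. Unknown names and an unclosed
/// `\(` are copied through unchanged (an assumption of this sketch).
fn tmpl(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("\\(") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(v) => out.push_str(v),
                    // Keep the whole `\(name)` placeholder as-is.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing paren: keep the remainder unchanged.
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```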
The way Rust handles this issue has not been mentioned here: a raw string literal starts with an r, then any (possibly zero) number of #s, then the string content enclosed in double quotes, then the same number of #s. The following snippet is taken from Rust By Example:
fn main() {
let raw_str = r"Escapes don't work here: \x3F \u{211D}";
println!("{}", raw_str);
// If you need quotes in a raw string, add a pair of #s
let quotes = r#"And then I said: "There is no escape!""#;
println!("{}", quotes);
// If you need "# in your string, just use more #s in the delimiter.
// You can use up to 255 #s.
let longer_delimiter = r###"A string with "# in it. And even "##!"###;
println!("{}", longer_delimiter);
}
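If jq adopted the Rust scheme, a flex pattern on its own could not require the closing run of #s to be as long as the opening run (with an unbounded count that is not a regular language), so a flex-based lexer would most likely need a start condition plus a counter in the action code. The counting logic itself is small; here is a sketch in Rust, where lex_hash_raw is a hypothetical name and Rust's own 255-# limit is not enforced.

```rust
/// Sketch: lex a Rust-style raw string `r#*"..."#*` at the start of `src`.
/// Returns the content and the number of bytes consumed. The literal is closed
/// by a `"` followed by exactly as many `#`s as the opening delimiter had.
fn lex_hash_raw(src: &str) -> Option<(&str, usize)> {
    let rest = src.strip_prefix('r')?;
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    if rest.as_bytes().get(hashes) != Some(&b'"') {
        return None; // not a raw string, e.g. an identifier starting with r
    }
    let closer = format!("\"{}", "#".repeat(hashes));
    let body_start = 1 + hashes + 1; // `r`, the `#`s, the opening quote
    let body = &src[body_start..];
    // First occurrence of the closer ends the literal; no escapes are processed.
    let end = body.find(&closer)?;
    Some((&body[..end], body_start + end + closer.len()))
}
```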
@wader asked for some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such with minimal quoting necessary. Suggestions include:
${{{"...}}}"
with as many repetitions of { / } as needed to avoid quoting problems.
Whatever we go with has to be something that can be expressed using flex and bison.