jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Raw string forms of various sorts? #2856

Open nicowilliams opened 1 year ago

nicowilliams commented 1 year ago

@wader asked for some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such with minimal quoting necessary. Suggestions include:

Whatever we go with has to be something that can be expressed using flex and bison.

wader commented 1 year ago

fq has backtick-raw strings:

$ fq -rn '`\(123)\u1234`'
\(123)\u1234

which is practical, as it exploits the fact that backticks are rarely (at least for now) used in text formats like XML, JSON, etc. But maybe a more future-proof and safer choice is something using a user-defined start/end somehow.

nicowilliams commented 1 year ago

I think I can make <backtick><any-number-of-({[><backtick> be the start sequence with the end sequence having to have the same but with closing paren/brace/bracket characters.

# because the start is `{{` the end has to be `}}`
`{{`raw string here with no \escapes and not \(interpolation)`}}`
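
The matching rule above can be sketched in Python, purely to illustrate the logic. This is not jq's lexer (which would have to be expressed in flex and bison), and the name `scan_bracket_raw` is hypothetical. The sketch assumes that at least one opening bracket is required between the two backticks and that the closers appear in the same order as their openers.

```python
# Hypothetical sketch: scan a raw string of the form `<openers>` ... `<closers>`.
MIRROR = {"(": ")", "{": "}", "[": "]"}

def scan_bracket_raw(src, pos=0):
    """Return (raw_text, end_pos) for a raw string starting at src[pos]."""
    if src[pos:pos + 1] != "`":
        raise ValueError("raw string must start with a backtick")
    i = pos + 1
    # Collect the run of opening brackets after the first backtick.
    while i < len(src) and src[i] in MIRROR:
        i += 1
    openers = src[pos + 1:i]
    # Assumption: at least one opener, followed by a second backtick.
    if not openers or src[i:i + 1] != "`":
        raise ValueError("expected a `<openers>` start sequence")
    # The end sequence swaps each opener for its closing counterpart.
    closer = "`" + "".join(MIRROR[c] for c in openers) + "`"
    body_start = i + 1
    end = src.find(closer, body_start)
    if end < 0:
        raise ValueError("unterminated raw string")
    # Everything between the delimiters is taken verbatim: no escapes,
    # no interpolation.
    return src[body_start:end], end + len(closer)
```

Because the body is located with a plain substring search for the end sequence, backslashes and `\(` inside it need no special handling.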

wader commented 1 year ago

I do like the simplicity of single backticks. Maybe a variant could be that raw string literals can't be empty and then use <one-or-more-backticks><one-or-more-characters><same-amount-of-backticks>, ex:

`abc` => "abc"

``ab`c`` => "ab`c"

abc


`` # not ok

A bit similar to how markdown code blocks work.
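
A small Python sketch of this variant, only to make the rule concrete. The name `scan_backtick_raw` is hypothetical, and the sketch assumes the closing run must be exactly as long as the opening run (shorter or longer runs are treated as content), which is how it can tell `` ` `` inside the body from the terminator.

```python
# Hypothetical sketch: <n backticks><non-empty content><n backticks>.
def scan_backtick_raw(src, pos=0):
    """Return (raw_text, end_pos) for a raw string starting at src[pos]."""
    # Count the opening run of backticks.
    n = 0
    while pos + n < len(src) and src[pos + n] == "`":
        n += 1
    if n == 0:
        raise ValueError("raw string must start with a backtick")
    body_start = pos + n
    i = body_start
    while i < len(src):
        if src[i] != "`":
            i += 1
            continue
        # Measure this run of backticks inside the body.
        j = i
        while j < len(src) and src[j] == "`":
            j += 1
        # Assumption: only a run of exactly n backticks closes the string,
        # and the content before it must be non-empty.
        if j - i == n and i > body_start:
            return src[body_start:i], j
        i = j
    raise ValueError("empty or unterminated raw string")
```

With this rule a lone `` `` `` is rejected: the opening run consumes both backticks and no content or closing run follows.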

oguz-ismail commented 11 months ago

some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such

regexps and what? Why not /regexp/?

wader commented 11 months ago

You're thinking of something like "abc" | /abc/ | ...? That would be quite neat, and would also make it possible to reuse the compiled regexp. Would /re/[flags] be like select(test(re; flags)), /re/sub/[flags] like gsub(re; sub), etc., and also scan/match/capture somehow? I guess re and flags would need to be literal strings and not filters in those cases?

Anyway, my use cases for raw strings with fq have been literal strings that are HTML, XML, JSON, etc.

oguz-ismail commented 11 months ago

@wader Yes, that or introduce a new string type that is specified using slashes instead of double quotes and can be provided to test, scan, etc. as the regexp argument. Either way it can be compiled once and reused throughout the program.

wader commented 8 months ago

Did a quick experiment with jqjq to see how it would feel and what problems might come up: https://github.com/wader/jqjq/tree/regex-literals-experiment

Looks like this:

# behaves as:
# select(test("abc")) | capture("(?<digits>\\d+)")

$ ./jqjq -n '"nope123", "abc123" | /abc/ | /(?<digits>\d+)/'
{
  "digits": "123"
}

Some thoughts and problems I encountered (from the commit message):

js does not support empty or new line in /regex/
  // comment in js
  //-alt in jq

how to handle 1 / 2 / 3?

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

gsub via /regex/sub/? sub only string or string with interpolation?

maybe some way to do test("regexp") without select(..)?

what about match, scan, split etc? via flags or syntax?

support test(/regex/) or /regex/ as $re | test($re) etc?
  behave as string literals with regex side data?
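
One common way to settle the `1 / 2 / 3` question, similar in spirit to what JavaScript tokenizers do, is to look at the previous token: if it can end an expression, `/` is division, otherwise it starts a regex literal. The Python sketch below only illustrates that heuristic on a toy token set. It is not how jq's flex/bison lexer works, all names in it are hypothetical, and flags after the closing slash are not handled.

```python
import re

# Token kinds that can end an expression; a "/" right after one is division.
ENDS_EXPRESSION = {"number", "string", "ident", "close", "regex"}

# Toy token set, far smaller than jq's real grammar.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<ident>[.$]?[A-Za-z_]\w*|\.)
  | (?P<close>[)\]}])
  | (?P<alt>//)
  | (?P<op>[|,;:(\[{+\-*%])
""", re.VERBOSE)

# Non-empty body, no raw newline, backslash escapes kept verbatim.
REGEX_RE = re.compile(r"/((?:\\.|[^/\\\n])+)/")

def tokenize(src):
    tokens, pos, prev = [], 0, None
    while pos < len(src):
        # "//" always stays the alternative operator in this sketch.
        if src[pos] == "/" and not src.startswith("//", pos):
            if prev in ENDS_EXPRESSION:
                tokens.append(("div", "/"))
                prev, pos = "div", pos + 1
                continue
            m = REGEX_RE.match(src, pos)
            if not m:
                raise ValueError(f"bad regex literal at {pos}")
            tokens.append(("regex", m.group(1)))
            prev, pos = "regex", m.end()
            continue
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
            prev = m.lastgroup
        pos = m.end()
    return tokens
```

Under these assumptions `1 / 2 / 3` tokenizes as two divisions, while `"abc123" | /abc/` yields a regex token because `|` cannot end an expression.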

wader commented 8 months ago

Related issue #1249

oguz-ismail commented 8 months ago

Did a quick experiment with jqjq to see how it would feel and what problems might come up: https://github.com/wader/jqjq/tree/regex-literals-experiment

Looks like this:

# behaves as:
# select(test("abc")) | capture("(?<digits>\\d+)")

$ ./jqjq -n '"nope123", "abc123" | /abc/ | /(?<digits>\d+)/'
{
  "digits": "123"
}

@wader I like it. My two cents on the rest:

js does not support empty or new line in /regex/
  // comment in js
  //-alt in jq

// can be a shorthand for the most recently matched regex; like

/abc/ | sub(//; "def") | ...

would be the same as

/abc/ | sub(/abc/; "def") | ...

It can match empty string too but split("")[] is already more useful than splits("") and I can't think of another use case.

Line breaks in regex literals would make programs less readable, I doubt anyone would prefer them over \n.

how to handle 1 / 2 / 3?

As division. What would its function be otherwise?

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

Yes. Not sure what the g flag would do though.

gsub via /regex/sub/? sub only string or string with interpolation?

maybe some way to do test("regexp") without select(..)?

what about match, scan, split etc? via flags or syntax?

support test(/regex/) or /regex/ as $re | test($re) etc?
  behave as string literals with regex side data?

I think it'd suffice if the original regex filters accepted regex literals as arguments. Usages like this

def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"

should be legal too, of course.

wader commented 8 months ago

how to handle 1 / 2 / 3?

As division. What would its function be otherwise?

Yes, of course. To clarify, by "handle" I mostly meant how to handle it with the current lexer/parser implementation. I'm no expert on how it currently works, but I've gotten the impression that it is already quite complex. So in the worst case, supporting something like that might require a lot of refactoring, and in my experience working with grammars, the devil is in the details.

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

Yes. Not sure what the g flag would do though.

I think g would make it output all matches, and for /regex/sub/g it would substitute all matches and output one string

"abc" | /(?<c>.)/g would output {"c": "a"}, {"c": "b"} and {"c": "c"}

"abc" | /(?<c>.)/\(.c),/g would output "a,b,c,"; that is, sub is treated as a string or an interpolating string
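
The intended g semantics can be modelled with a short Python sketch. It only mirrors the two examples above; `capture_all` and `sub_all` are hypothetical names, and Python's `re` module spells named groups `(?P<c>...)` rather than `(?<c>...)`.

```python
import re

def capture_all(text, pattern):
    """Model of /regex/g: one object of named captures per match."""
    return [m.groupdict() for m in re.finditer(pattern, text)]

def sub_all(text, pattern, render):
    """Model of /regex/sub/g: replace every match, output one string.

    `render` stands in for the interpolating sub string; it receives
    the named captures of each match.
    """
    return re.sub(pattern, lambda m: render(m.groupdict()), text)

print(capture_all("abc", r"(?P<c>.)"))
print(sub_all("abc", r"(?P<c>.)", lambda caps: caps["c"] + ","))
```

In this model the capture form produces several outputs (one per match), while the substitution form always produces exactly one string.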

But i get a feeling there are lots of tricky details here

def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"

should be legal too, of course.

I was overthinking it about bindings. Yes, just making /.../ etc. behave as if it were a normal filter like select(test("...")) etc. is probably the most straightforward and intuitive. So for example:

$ ./jqjq -c -n '"abc" | /(?<c>b)/ as {$c} | $c'
"b"

iwconfig commented 5 months ago

I'd love to see this! I want to use string interpolations as arguments for templating.

e.g.

$ jq -n --arg path "`/\(.category)/\(.name)/\(.key) - \(.value)`" '{category: "A", name: "foo", key: "bar", value: "baz"} | $ARGS.named.path'
"/A/foo/bar - baz"

or

$ jq -n '"A" as $category | "foo" as $name | "bar" as $key | "baz" as $value | $ARGS.positional[].settings.output' --jsonargs '{"settings":{"output":{"path":"`/\($category)/\($name)/\($key) - \($value)`"}}}'
{
  "path": "/A/foo/bar - baz"
}

wader commented 5 months ago

@iwconfig This sounds like support for somehow defining new interpolation templates at runtime and then using raw strings together with that? For ... | $some_string to work like that we would probably need new syntax. But if you don't need something that fancy, and you don't need arbitrary expressions in the template string itself, you can do something like this:

$ jq -n --arg path "/\(category)/\(name)/\(key) - \(value)" 'def tmpl($o): gsub("\\\\\\((?<s>.*?)\\)"; $o[.s]); $ARGS.named.path | tmpl({category: "A", name: "foo", key: "bar", value: "baz"})'
"/A/foo/bar - baz"

thecaralice commented 3 months ago

The way Rust handles the issue was not mentioned here: a raw string literal starts with an r then any (possibly zero) amount of #'s, then the string content enclosed in quotes, then the same amount of #'s. The following snippet is taken from Rust By Example:

fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "# in your string, just use more #s in the delimiter.
    // You can use up to 255 #s.
    let longer_delimiter = r###"A string with "# in it. And even "##!"###;
    println!("{}", longer_delimiter);
}
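
For comparison with the backtick proposals earlier in the thread, here is a minimal Python sketch of how such a literal could be scanned. It is an illustration only, not Rust's or jq's lexer; `scan_rust_raw` is a hypothetical name, and the 255-hash limit is not enforced.

```python
# Hypothetical sketch: r, then n '#', then '"', body, '"', then n '#'.
def scan_rust_raw(src, pos=0):
    """Return (raw_text, end_pos) for a raw string starting at src[pos]."""
    if not src.startswith("r", pos):
        raise ValueError("raw string must start with r")
    i = pos + 1
    # Count the (possibly empty) run of '#' after the r.
    while i < len(src) and src[i] == "#":
        i += 1
    hashes = src[pos + 1:i]
    if src[i:i + 1] != '"':
        raise ValueError("expected an opening quote")
    # The literal ends at the first quote followed by the same run of hashes.
    closer = '"' + hashes
    body_start = i + 1
    end = src.find(closer, body_start)
    if end < 0:
        raise ValueError("unterminated raw string")
    return src[body_start:end], end + len(closer)
```

As with the backtick variants, the writer picks a delimiter that does not occur in the content, so the body never needs escaping.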