xexyl / jparse

JSON parser, library and tools written in C
3 stars 3 forks source link

Bug: jstrencode(1) man page has incorrect information about -Q #11

Closed lcn2 closed 2 months ago

lcn2 commented 2 months ago

Is there an existing issue for this?

Describe the bug

The jstrencode(1) man page says about -Q:

       -Q     Enclose output in quotes (def: do not).

This is incorrect.

What -Q does for jstrencode(1) is that IF the string both starts with and ends with a double quote, (as if it is a JSON string), then those enclosing double quotes MUST be ignored.

$ jstrencode 'hi'
hi

$ jstrencode '"hi"'
\"hi\"

$ jstrencode -Q 'hi'
hi

$ jstrencode -Q '"hi"'
hi

$ jstrencode -Q '"hi'
\"hi

$ jstrencode -Q 'hi"'
hi\"

All of the above is correct behavior for jstrencode(1).

What you expect

The jstrencode(1) man page indicates that IFF (if and only if) a string starts and ends in double quotes, the enclosing double quotes MUST be ignored.

This applies to any string argument. For example here are two strings:

$ jstrencode -Q '"hi"' '"hello"'
hihello

This seems like correct behavior.

Also this seems like correct behavior:

$ jstrencode -Q '"1"' '"2' '3"' '"4"'
1\"23\"4

A proposed new option for jstrencode

Although it could be argued that the above 2 examples are incorrect. One could argue that when there are 2 or more arguments, one could first concatenate all arguments and then if the -Q option was given, only ignore ignore double-quotes ONLY if they surround the concatenation of the args. We can see both behaviors.

NOTE: Perhaps the jstrencode -Q should ignore ONLY enclosing double-quotes around the concatenation of all args. If one wanted to remove enclosing double-quotes around EACH argument, use a new option such as -m:

$ jstrencode -h
...
    -Q    ignore double-quotes that surround the concatenation of all strings (def: encode all bytes)
    -m    ignore enclosing double-quotes for each string (def: encode all bytes)

So we would have:

$ jstrencode -m '"hi"' '"hello"'
hihello

And:

$ jstrencode -Q '"hi' 'hello"'
hihello

And one could combine both -Q and -m:

$ jstrencode -Q -m '""hi"' '"hello""'
hihello

See GH-issuecomment-2337761412 for how these same 2 options would apply to jstrdecode(1).

Environment

- OS: non-windoz
- Device: yes
- Compiler: clang is a nice one :-)

bug_report.sh output

n/a

Anything else?

FYI: The usage message for jstrencode(1) is correct:

$ jstrencode -h
...
    -Q      ignore enclosing "'s (def: encode all bytes)

Although the usage message could be worded better. See the above "proposal for a new option" for some better wording.

You might wish to use that language in the jstrencode(1) man page.

lcn2 commented 2 months ago

See also GH-issuecomment-2337677465.

xexyl commented 2 months ago

I wonder how this happened. Thanks for bringing it up! The options and the other comments that have to be read I might be too tired to look at right now but at least the documentation one can be easily fixed.

xexyl commented 2 months ago

As for -m I suggest it's -q instead as it has to do with quotes too. I'm trying to look at this but as noted I'm quite tired, more than I should expect.

Your bug report made me realise there was a mistake in the bug report template too so I fixed that.

lcn2 commented 2 months ago

As for -m I suggest it's -q instead as it has to do with quotes too. I'm trying to look at this but as noted I'm quite tired, more than I should expect.

Your bug report made me realise there was a mistake in the bug report template too so I fixed that.

Sure. However there is already a -q flag used by something else. The -m was for multiple, as in multiple quotes. If you want to pick a different option letter that is otherwise unused, feel free to do so.

lcn2 commented 2 months ago

FYI: See GH-issuecomment-2339392055 for a remark on priories.

lcn2 commented 2 months ago

We believe we have addressed all of the current questions that still need answering at this time. If we've missed something or something else needs to be clarified, please ask again.

xexyl commented 2 months ago

As for -m I suggest it's -q instead as it has to do with quotes too. I'm trying to look at this but as noted I'm quite tired, more than I should expect. Your bug report made me realise there was a mistake in the bug report template too so I fixed that.

Sure. However there is already a -q flag used by something else. The -m was for multiple, as in multiple quotes. If you want to pick a different option letter that is otherwise unused, feel free to do so.

Yup ... I forgot that. Noted it in another bug report. Decided on -e for escaped quotes.

xexyl commented 2 months ago

I believe that the jstrdecode code is fixed (not the decoding bug in I think it was #13 but the -Q bug and the new option -e) so I hope that next I can look at this one. I will be leaving for the day but perhaps I can address this tomorrow. I will leave that to your judgement of course. I hope everything is okay with your family (medical problems etc. which I guess might be one of the reasons for no activity) - and you of course too!

Anyway, it would be nice if these two issues can be resolved so that we can focus on the more serious one.

lcn2 commented 2 months ago

Thank you and we recommend jparse repo PR 15 as a follow-up.

xexyl commented 2 months ago

Thank you and we recommend jparse repo PR 15 as a follow-up.

Thank you! Merged that a short bit ago.

xexyl commented 2 months ago

That being said I've not finished this issue: the new proposed option was not added yet so I'll reopen this issue to more easily find it. I'm hoping I can look at it more soon.

xexyl commented 2 months ago

Hmm ... with the first comment suggesting that the behaviour of -Q is one way and the suggested update for it I am not sure now what you want. Would you please clarify? I guess (and I'll try looking at this later on today) what you're doing is suggesting that its behaviour is changed and so the previously correct behaviour needs to be modified; the new option will change how things work as well.

xexyl commented 2 months ago

Okay I see that actually we have to modify the functions in json_parse.c which means new booleans. I have to go afk a while but when back I'll hopefully start looking at that before I go take care of the other things I have to do. I also would like to work on the prep script for both repos where it checks for the tools being available. That would be nice to get done. But that is lower priority than these I guess.

I hope everything is okay with your family now! Back in a while.

xexyl commented 2 months ago

Reading the option proposal...

A proposed new option for jstrencode

Although it could be argued that the above 2 examples are incorrect. One could argue that when there are 2 or more arguments, one could first concatenate all arguments and then if the -Q option was given, only ignore ignore double-quotes ONLY if they surround the concatenation of the args. We can see both behaviors.

NOTE: Perhaps the jstrencode -Q should ignore ONLY enclosing double-quotes around the concatenation of all args. If one wanted to remove enclosing double-quotes around EACH argument, use a new option such as -m:

Does this mean that with -Q only OUTER (first char and last char of the string) are double quotes would they be ignored? And if any quotes are WITHIN the string it would NOT ignore them? The rewording of the option suggests that to me along with the examples you provide below.

If so what about:

'"foo"' '"foo"'

?

$ jstrencode -h
...
    -Q    ignore double-quotes that surround the concatenation of all strings (def: encode all bytes)
    -m    ignore enclosing double-quotes for each string (def: encode all bytes)

So we would have:

$ jstrencode -m '"hi"' '"hello"'
hihello

And:

$ jstrencode -Q '"hi' 'hello"'
hihello

At a look with these examples and the example below it appears you've simply swapped the letters. Is that the case or have I missed something?

And one could combine both -Q and -m:

$ jstrencode -Q -m '""hi"' '"hello""'
hihello

I'm still unsure how this is meant to work so I will leave it at that and go take care of other things ... including things I did not anticipate. Will explain in the OT thread.

lcn2 commented 2 months ago

REPLY

A reply to the implied question of GH-issuecomment-2351100502 follows.

We are going to go thru a step by step build up of jstrencode(1) and jstrdecode(1) functionality.

Setup

We will assume that we are working with synced stuff, including:

All of which seem to be synced to the same jparse code as of the time of this comment.

We have done a make install from the mkiocccentry repo followed by a make install from the lcn2 forked jparse repo (which equals the official parse repo as of the time of this comment).

jstrencode and jstedecode tests

In this next example, we are asking how to encode as JSON string, the characters: "foo"

$ jstrencode '"foo"'
\"foo\"

The backslashed double quotes are expended because the default purpose of jstrencode(1) is to output what would go inside of a JSON string.

With jstrencode -Q we ask jstrencode(1) to ignore any enclosing double quotes and thus as asking encode as JSON string, the characters: foo

$ jstrencode -Q '"foo"'
foo

This is expended because the -Q, as the usage message says:

    -Q      do not encode enclosing double quotes (def: encode all bytes)

and the jstrencode(1) man page says:

       -Q     Do not encode enclosing double quotes (def: encode all bytes),

FIX NEEDED: change to jstrencode(1) man page

The punctuation at the end of OPTIONS section text is problematic: especially the trialing comma for -Q. We would eliminate all such punctuation and trying to turn each OPTIONS line into a complete sentence is sometimes a waste of effort, sometimes produces invalid sentences, and sometimes forces a simple explanatory string into a needlessly long sentence.

BTW: Worse still .. although not in this man page case, OPTIONS as sentence and produce confusing info where one has to ponder if the to ending punctuation is part of an example or NOT. Again in a different man page, here is a more extreme example of what can go wrong:

     -t FQDN    Target host we a fully qualified domain name: for example my-host.example.com.

Was the trailing "." part of the FQDN (as tools such as dig(1) might want), or trailing punctuation?

There is a GOOD REASON why jstrencode -h does NOT attempt to form complete sentences with trailing punctuation.

There are exceptions. When the text for an OPTION MUST contain multiple sentences, then the final sentence should be punctuation with care.

We recommend REMOVING trailing punctuation from OPTIONS section text, except where the OPTION text is more than one sentence.

And yes we recommend doing this to ALL man pages in this and the other related repos.

FIX APPLIED

The above "FIX NEEDED" has been address with PR #16 to the jstrencode(1) and jstrdecode(1) man pages.

Back to the jstrencode and jstedecode tests

When jstrencode(1) is given two or more arguments, the intent is to concatenate them and operate on the concatenated result as if it were a single argument. So this:

$ jstrencode foo bar bar
foobarbaz

produces the same thing as this:

$ jstrencode foobarbaz
foobarbaz

The purpose of allowing multiple arguments to jstrencode(1) is that each individual argument might be constructed thru different means. For example:

$ jstrencode $(some command) $(a different command)
...

Yes, one could build up the argument set, one at a time. For more complex situations this gets more awkward:

$ jstrencode $(find soup -type f ! -name '*.json' -print)
...

So what should happen with -Q with multiple arguments are given to jstrencode(1)? Well the -Q` in the help message shows:

    -Q      do not encode enclosing double quotes (def: encode all bytes)

Unfortunately, this result is NOT correct:

$ jstrencode -Q '"foo"' '"bar"' '"baz"'
foobarbaz

The concatenation of the 3 args is: "foo""bar"'"baz"

When, due to -Q, the enclosing double quotes are not encoded, this is what needs to be encoded: foo""bar"'"baz I.e., the enclosing double quotes are IGNORED and so the result should be the same as if the arguments had been concatenated:

$ jstrencode -Q '"foo""bar""baz"'
foo\"\"bar\"\"baz

Only the enclosing double quotes should be ignored and not encoded.

FIX NEEDED: jstrencode -Q with multiple args should only ignore the outer double quotes after arg concatenation

It would be a good idea to document the concatenation of multiple args in both the usage message as well as in the man page.

FIX APPLIED

The above "FIX NEEDED" has been address with PR #16 as suggest above.

Back to the jstrencode and jstedecode tests

What if one wants to ignore (i.e., not encode) double quotes that enclose each argument instead of the concatenated result?

We suggest adding a -e to jstrencode(1) such that:

    -e      do not encode double quotes that enclose each arg (def: do encode)

So that this would happen:

$ jstrencode -e '"foo""bar""baz"'
foobarbaz

What about using both -Q and -e with jstrencode(1)?

Well this should happen:

$ jstrencode -Q -e  '""foo"' '"bar"' '"baz""'
foobarbaz

FIX NEEDED: Add code for -m to jstrencode(1) as noted above

The -m flag should be added to jstrencode(1).

FIX APPLIED

The above "FIX NEEDED" has been address with PR #16 as suggest above.

This now works as expected:

$ echo -n '"bar"' | ./jstrencode -Q -e '""foo"' - '"baz""'
foobarbaz

Back to the jstrencode and jstedecode tests

Looking at the jstrdecode(1), we find:

    -Q      enclose output in double quotes (def: do not)
    -m      enclose each decoded string with escaped quotes (def: do not)
    -e      alias for -m

We don't recommend using both -m and -e for the same action. Had there been a prior use where we wanted to maintain compatibility, we would consider using both options.

FIX NEEDED: Use only -e and drop -m for jstrdecode(1)

We fix both the jstrdecode(1) man page and the code itself.

FIX APPLIED

The above "FIX NEEDED" has been address with PR #16 as suggest above.

Back to the jstrencode and jstedecode tests

The use case of multiple arguments for jstrdecode(1) is less clear. Nevertheless, @xexyl, the way you have programmed it is a reasonable use case.

We don't see any need, other than a minor adjustment to the man page and usage message, to change the code (other than to use only -e and not -m).

lcn2 commented 2 months ago

With PR #16 the man page has been fixed. See GH-issuecomment-2351382556.