xexyl / jparse

jparse - a JSON parser
2 stars 2 forks source link

Enhancement: jstrdecode should have a command line option to ignore newlines #21

Open lcn2 opened 4 hours ago

lcn2 commented 4 hours ago

Is there an existing issue for this?

Describe the enhancement

When jstedecode(1) processes data from stdin, it should have a command line option to ignore newlines.

Consider:

echo 'å∫ç∂' | jstrdecode

This will generate the following unhelpful error:

Warning: json_decode: found non-\-escaped char: 0x0a
Warning: jstrdecode_stream: error while decoding stdin buffer
Warning: main: failed to decode string from stdin

Whereas this works:

echo -n 'å∫ç∂' | jstrdecode

and produces:

å∫ç∂

The same problem happens when you use the - (dash character) arg:

echo 'å∫ç∂' | jstrdecode -

IMPORTANT NOTE: Objecting to a "naked" newline in a JSON string IS CORRECT BEHAVIOR as the so-called JSON spec does NOT allow for "naked" newlines in JSON strings.

Relevant images, screenshots or other files

echo Hello, World! | jstrdecode :-)

Relevant links

🔗⛓️

Anything else?

This is NOT a priority. The bin tools on that "other" repo supply a -T to trim outgoing newlines before feeding them into jstrdecode -. So there is no need for an emergency fix. We held off on making this change until after the code freeze.

"Technically and pedantically speaking" (as the expression goes), the current behavior of jstrdecode(1) is correct. The tool does "decode JSON encoded strings" and in the so-called JSON spec, newlines characters are NOT allowed. The JSON parser jparse(1) and its friendly library should NOT be relaxed to allow for naked newline characters in JSON documents.

Perhaps jstrdecode(1) should jstrdecode(1) have an option to ignore any final newline?

We observe that its companion jstrencode(1) tool has -n:

    -n      do not output newline after encode output (def: print final newline)

As does jstrdecode(1):

    -n      do not output newline after decode output

Perhaps a new option such as -N should tell jstrdecode to ignore a final newline, such that this would work:

echo 'å∫ç∂' | jstrdecode -N

Note that internal newlines would still cause this to fail:

echo -n 'å∫ç∂
´ƒ©˙' | jstrdecode -N

This is NOT a priory enhancement request.

xexyl commented 4 hours ago

I like this thought.

If I understand it correctly this would skip past the byte 0x0a when decoding, if it's the final byte? If so that is a good idea and would help with typing the commands. I don't think it would break JSON validity either since it's only for inputting commands, not JSON itself.

lcn2 commented 4 hours ago

With -N one might consider that when reading internal newlines from stdin, that they each be treated as a separate argument so that:

echo -n 'å∫ç∂' 'ƒ©˙' | jstrdecode

and:

echo 'å∫ç∂
ƒ©˙' | jstrdecode -N

would produce the same thing?

UPDATE 0

See also GH-issuecomment-2405771581 ᐤ.

UPDATE 1b

Regarding GH-issuecomment-2405775932), see Unicode control points such as U+1424 and U+2218.

Perhaps a short scan of a List of Unicode Symbols would find better examples.

Other candidates could be U+07CB, U+09F9, U+0A66, U+0C66, U+0F1A, U+20DD, U+25CB, U+25CE, U+25EF, u+26AC, U+2B55, U+2D54, and U+FFEE.

xexyl commented 4 hours ago

With -N one might consider that when reading internal newlines from stdin, that they each be treated as a separate argument so that:

echo -n 'å∫ç∂' 'ƒ©˙' | jstrdecode

and:

echo 'å∫ç∂
ƒ©˙' | jstrdecode -N

would produce the same thing?

I wonder. I was just doing it so it would skip the final newline which seems reasonable. But the question is if one uses it this way what does a final newline do? Meaning a trailing newline?

UPDATE 0

See also GH-issuecomment-2405771581 ᐤ.

I replied with a question on that.

xexyl commented 3 hours ago

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

lcn2 commented 3 hours ago

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well.

The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code.

We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

lcn2 commented 3 hours ago

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

Please do commit, and feel free to add it to the "other repo".

xexyl commented 3 hours ago

I just implemented the way I interpreted it. I can commit. If you wish for more then let me know. I have to check one thing. After this I then must do other things. (Okay did it in decode only: have to look at encode also.)

Please do commit, and feel free to add it to the "other repo".

I'll do that and the other part can happen later I guess as I have another question about that way.

xexyl commented 3 hours ago

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well.

The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code.

We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

Okay assuming I remove internal \n. What would it do? How would it be removed? If I changed to a NUL byte that could be a problem. Changing it to space might be wrong too.

I guess if it's the final byte it would be like I am doing now?

lcn2 commented 3 hours ago

Regarding GH-issuecomment-2405774413, see the suggestion for processing internal newlines with -N as well. The -N option would be limited to the internals of jstrdecode(1) only, and would not impact the JSON parser code. We suggest that with -N, that jstrdecode(1) buffer stdin into memory, scan it removing newlines, before passing it on to the JSON string decoder.

Okay assuming I remove internal \n. What would it do? How would it be removed? If I changed to a NUL byte that could be a problem. Changing it to space might be wrong too.

I guess if it's the final byte it would be like I am doing now?

We suggest you delete it from the buffer.

The command:

jstrdecode 'foo' 'bar'

concatenates both "foo" and "bar".

So think of:

(echo "foo"; echo "bar") | jstrdecode -N

as removing both newlines and concatenating the results.

Think of each line on stdin as a separate argument where -N trims the newlines on each line if they are found.

UPDATE 0

With -N, read stdin into memory, then copy to another memory block all bytes that are not a newline character, then free the 1st block of memory and decode the 2nd block with the newlines removed.

BTW: This should also work:

jstrdecode -N "hi
hello
there"

Just as if we did:

echo "hi
hello
there" | jstrdecode -N

and just as if we did:

jstrdecode hi hello there
xexyl commented 3 hours ago

About to do a pull request. If you wish to explain your further idea with -N please do so.

lcn2 commented 3 hours ago

About to do a pull request. If you wish to explain your further idea with -N please do so.

With -N, the jstrdecode command would ignore all newlines found.

    -N      ignore all newlines (def: treat newline characters as JSON string errors)
xexyl commented 3 hours ago

Okay I understand now. Not sure I can do this today but the final newline is at least removed now.

Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

lcn2 commented 2 hours ago

Okay I understand now. Not sure I can do this today but the final newline is at least removed now.

Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

We have a tested PR that we will issue to the jparse repo. Then we will fold that PR into this repo after we complete the existing PR processing to be sure we stay in sync.

xexyl commented 2 hours ago

If you think your updates resolves this issue then please feel free to close this, or let me know and I'll close it. If not then please let me know what else you think needs to be done.

xexyl commented 2 hours ago

Okay I understand now. Not sure I can do this today but the final newline is at least removed now. Perhaps you would like to do this quickly before the freeze over there? That would be appreciated if you have the time. Otherwise I can do it tomorrow I guess (or later today if I have time - but I doubt it).

We have a tested PR that we will issue to the jparse repo. Then we will fold that PR into this repo after we complete the existing PR processing to be sure we stay in sync.

Thanks for taking the time doing that! Really appreciate it.

lcn2 commented 2 hours ago

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

xexyl commented 1 hour ago

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it).

I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

lcn2 commented 1 hour ago

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it).

I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

PR is own the way soon, after man page edits.

xexyl commented 1 hour ago

I see that you added -N to jstrencode. OK. Well it seems that a similar PR will have to be added for that command as well. Stay tuned.

Sure. Please make sure to update the man page too. You can then sync to mkiocccentry when you're all done. I should be able to merge it today but if I don't get to it today I'll certainly do it in the morning (when I'm awake enough to look at it). I guess that won't matter for a code freeze over in the mkiocccentry repo. Even so I'll be awake for many more hours just not as many as you.

PR is own the way soon, after man page edits.

Sounds good. Thanks!

xexyl commented 1 hour ago

Feel free to do the update you were thinking of with -t too. That'd be great as I'm not sure what code point you're referring to and it would also be fun to see what you have in mind.

lcn2 commented 1 hour ago

Double free bug .. need to fix.

xexyl commented 1 hour ago

Double free bug .. need to fix.

Forgot to set to NULL? Or did I do something when I did the original addition very quickly ? Well anyway no problem. I will be going afk shortly but I'll have my phone with me.

lcn2 commented 1 hour ago

The jstrencode_stream() function needs fixing ...

xexyl commented 1 hour ago

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

lcn2 commented 1 hour ago

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

Fixed.

lcn2 commented 1 hour ago

Doing the man page change now.

xexyl commented 1 hour ago

The jstrencode_stream() function needs fixing ...

I hope you're a strong swimmer!

Fixed.

Great.

xexyl commented 1 hour ago

Doing the man page change now.

Thanks!

lcn2 commented 1 hour ago

Done. Just a bit of hand testing now ...

lcn2 commented 1 hour ago

building the commit ...

xexyl commented 1 hour ago

Sounds good.

lcn2 commented 1 hour ago

creating the PR ...

Watching the GitHub test ...

debug[3]: \ud83d\ude4f\uD83D\uDD25\uD83D\uDC09: 🙏🔥🐉 == 🙏🔥🐉: true
debug[3]: \uD83D\uDC8D\uD83C\uDF0B: 💍🌋 == 💍🌋: true

🤓

lcn2 commented 1 hour ago

See PR #23