jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Allow setting raw input delimiter #965

Open phiresky opened 9 years ago

phiresky commented 9 years ago

As far as I can tell, this is not currently possible?

Main use case: find | jq -R works, but it is not safe because filenames can contain newlines. I'd like to use find -print0 instead, but jq does not allow setting \0 as the raw input delimiter (or setting the delimiter at all).

It can be circumvented with

find -print0|jq --slurp --raw-input 'split("\u0000")[]'

but that disables streaming, because --slurp reads the entire input before the filter runs.

Usage in other programs (for \0):

nicowilliams commented 9 years ago

I agree.

wtlangford commented 9 years ago

I have a concern. If we enable setting the delimiter, do we automatically convert newline characters to \n in that mode? Otherwise we end up with invalid json strings. I feel this may be an example of an input that should be processed with sed or something before feeding into jq.

phiresky commented 9 years ago

Yes, probably, like --slurp does.

> I feel this may be an example of an input that should be processed with sed or something before feeding into jq.

But how would that work without stopping streaming?
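One way the preprocessing question could be answered outside jq is a small streaming filter that splits on NUL and emits each record as a JSON string, so that jq can read it without -R. This is a minimal sketch, not anything jq provides: the function names `iter_nul_records` and `to_json_lines` are hypothetical, and it assumes records are UTF-8 (invalid bytes are replaced rather than preserved).

```python
import json


def iter_nul_records(stream, chunk_size=65536):
    """Yield NUL-delimited records from a binary stream, one chunk at a time."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NUL is complete; the tail may be partial.
        *complete, pending = pending.split(b"\0")
        yield from complete
    if pending:
        # Final record that was not terminated by a NUL.
        yield pending


def to_json_lines(stream, out):
    """Write each record as one JSON string per line (assumes UTF-8 input)."""
    for record in iter_nul_records(stream):
        text = record.decode("utf-8", errors="replace")
        out.write(json.dumps(text) + "\n")
```

Saved in a script (say, a hypothetical nul2json.py that calls `to_json_lines(sys.stdin.buffer, sys.stdout)`), it would sit between find -print0 and jq. Records are emitted as soon as their terminating NUL arrives, so nothing has to be slurped.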

nicowilliams commented 9 years ago

@wtlangford Strings can contain newlines. Newlines in strings have to be escaped in encoded JSON texts, but here we're not dealing with JSON texts, as the input is raw, and the output of the "parser" is a jv string to feed to the jq VM.
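The distinction between a string value and its encoded JSON text can be illustrated with Python's json module standing in for jq's encoder. This is only an analogy under that assumption; the file name is made up for the example.

```python
import json

# A raw record holding a real LF character, as a filename might.
raw = "dir/file\nwith newline.txt"

# Encoding the value as JSON text escapes the LF as the two characters \ and n.
encoded = json.dumps(raw)
assert "\n" not in encoded

# Decoding restores the original value, newline included.
assert json.loads(encoded) == raw
```

The string value itself carries the newline without any problem; escaping only matters at the point where the value is written out as JSON text.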

wtlangford commented 9 years ago

Fair enough. I'm convinced.

nkgm commented 6 years ago

Having the same problem processing zsh history files, which use newlines between records, but may contain escaped newlines within records. I got sed to insert NULs to disambiguate records and then I bumped into this issue. Eventually had to do this backwards, getting sed to replace escaped newlines with NULs and keep newline as record separator in order to keep jq happy. The workaround was easy enough, but it would be really nice if jq would support NUL delimiter as per @phiresky's original comment.

nicowilliams commented 5 years ago

We should add a -0 at least, and preferably also a -F CHAR or some appropriately-named long option.

pabs3 commented 3 years ago

The -0 option has already been added; personally, I think that is enough and this issue can be closed now.

BTW, as pointed out in #1271, JSON strings can contain both LF ("\n") and NUL ("\u0000"), so -0 is not sufficient to prevent recipients from getting the wrong number of result strings (the same is true of -r, of course).
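The ambiguity can be reproduced without jq. The sketch below only simulates the two output styles in Python (raw strings terminated by NUL versus one JSON text per line); it does not call jq and makes no claim about how jq implements either mode.

```python
import json

results = ["a\u0000b", "c"]  # two result strings; the first contains a NUL

# Raw output with a NUL written after each result.
nul_stream = "".join(value + "\0" for value in results)
received = nul_stream.split("\0")[:-1]  # drop the empty piece after the final NUL

# JSON-encoded output, one text per line: the NUL is escaped as \u0000.
json_stream = "".join(json.dumps(value) + "\n" for value in results)
decoded = [json.loads(line) for line in json_stream.splitlines()]

print(len(results), len(received), len(decoded))  # prints: 2 3 2
```

The recipient of the NUL-delimited stream sees three strings instead of two, while the JSON-encoded stream round-trips exactly.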

Freed-Wu commented 3 weeks ago

This comes from https://github.com/wader/fq/issues/1019

I also expect that jq can be an alternative to perl/sed/awk. fq has imported --raw-output0 to set the output separator. A -0 or --raw-input0 option would be good.