Closed muescha closed 7 months ago
maybe also with input json?
echo "/abc/def:/efg/ghi fgh/mn" | jq -R 'split(":")'
[
"/abc/def",
"/efg/ghi fgh/mn"
]
Forget the JSON input: I can easily convert it into multi-line input with jq:
echo "/abc/def:/efg/ghi fgh/mn" | jq -R 'split(":")[]'
"/abc/def"
"/efg/ghi fgh/mn"
or without quotes:
echo "/abc/def:/efg/ghi fgh/mn" | jq -R 'split(":")[]' -r
/abc/def
/efg/ghi fgh/mn
I think this is a good idea. The easiest way to implement this in jc would actually be to create additional multi-line parsers for each of those (e.g. url-multi). These parsers would just iterate over the lines and call their parent parsers.
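A minimal sketch of the wrapper idea, for illustration only. It assumes a hypothetical single-line parser `parse_one` (a stand-in built on Python's `urllib.parse`, not jc's actual url parser) and a hypothetical `parse_multi` wrapper that iterates over the lines and calls it:

```python
from urllib.parse import urlsplit


def parse_one(data: str) -> dict:
    """Stand-in single-line parser (hypothetical; not jc's real url parser)."""
    text = data.strip()
    parts = urlsplit(text)
    return {
        "url": text,
        "scheme": parts.scheme or None,
        "netloc": parts.netloc or None,
        "path": parts.path or None,
    }


def parse_multi(data: str) -> list:
    """Hypothetical multi-line wrapper: parse each non-blank line separately."""
    return [parse_one(line) for line in data.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "http://www.google.com\nhttps://www.kelly.com/testing\n"
    for item in parse_multi(sample):
        print(item)
```

The wrapper needs no new arguments on the underlying parser, which is why this route avoids touching the parse() signature.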
It's a little more difficult to create additional arguments to send to the parsers, unless they are ENV variables, because the parse() function is pretty static and only takes 3 arguments for standard parsers and 4 arguments for streaming parsers.
And I think introducing a 4th/5th argument for a generic options dictionary would be overkill?
But I think this can be done in the CLI, like the slice (#341)?
I have thought about that but hadn't had much pressure for more arguments so I haven't spent much time on it. I think it might be interesting to have a kwargs type of argument you can pass to the parsers so they can have their own specific arguments.
It can be done - I just need to make sure it doesn't break anything from a backward compatibility standpoint or with how jc is used as a library (e.g. Ansible)
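As a rough illustration of the kwargs idea and the backward-compatibility concern, here is a sketch of a parser entry point that accepts extra keyword options while leaving the existing three-argument call style untouched. The `delimiter` option and the key/value logic are hypothetical and are not taken from any jc parser:

```python
def parse(data: str, raw: bool = False, quiet: bool = False, **kwargs) -> dict:
    """Hypothetical parser entry point that tolerates extra keyword options."""
    # Parser-specific option (illustrative only); the default keeps old behaviour.
    delimiter = kwargs.get("delimiter", "=")
    key, _, value = data.strip().partition(delimiter)
    return {"key": key.strip(), "value": value.strip()}


if __name__ == "__main__":
    # Existing positional call style keeps working unchanged.
    print(parse("name = jc", False, True))
    # New callers can opt in to a parser-specific option.
    print(parse("name: jc", delimiter=":"))
```

Because unknown options land in `kwargs` and every option has a default, library callers that pass only the original arguments should see no change.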
Ah yes, maybe since this is just iterating over a parser I could set up a jc argument that doesn't need to be passed to the parser and just iterates over it and puts the values into an array. I got a similar request for the proc parser so it could iterate over multiple files when a glob is used in the magic syntax (https://github.com/kellyjonbrazil/jc/issues/389)
Yes, the multiline feature could function as a preprocessor.
Perhaps each parser could have a boolean attribute, such as 'multiline=true', that could be checked to determine whether the parser supports this option.
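One way the preprocessor plus capability check could look, as a sketch under assumptions: the `PARSERS` registry, the `run` function, and the `multiline` attribute name are all hypothetical and only mirror the suggestion above.

```python
from types import SimpleNamespace

# Hypothetical parser metadata; `multiline` follows the suggestion above and
# is not an existing jc attribute.
PARSERS = {
    "url": SimpleNamespace(multiline=True, parse=lambda s: {"url": s}),
    "ls": SimpleNamespace(multiline=False, parse=lambda s: {"raw": s}),
}


def run(parser_name: str, data: str, multiline: bool = False):
    """Preprocess input line by line only when the parser allows it."""
    parser = PARSERS[parser_name]
    if not multiline:
        # Normal path: hand the whole input to the parser.
        return parser.parse(data)
    if not getattr(parser, "multiline", False):
        raise ValueError(f"{parser_name} parser does not support multi-line input")
    # Preprocessor path: one parser call per non-blank line, collected in a list.
    return [parser.parse(line) for line in data.splitlines() if line.strip()]
```

Keeping the check outside the parser means parsers that already expect multiple lines are never called line by line by mistake.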
I have a working version in the dev branch. I have called this option --slurp or -s.
https://github.com/kellyjonbrazil/jc/tree/55bc91a6e43b32b0268f264b783deaf3271573eb
Here is an example with a list of URLs (one per line)
% cat urls.txt | jc --slurp --url -p
[
{
"url": "http://www.google.com",
"scheme": "http",
"netloc": "www.google.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"query_obj": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.google.com",
"port": null,
"encoded": {
"url": "http://www.google.com",
"scheme": "http",
"netloc": "www.google.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.google.com",
"port": null
},
"decoded": {
"url": "http://www.google.com",
"scheme": "http",
"netloc": "www.google.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.google.com",
"port": null
}
},
{
"url": "https://www.kelly.com/testing",
"scheme": "https",
"netloc": "www.kelly.com",
"path": "/testing",
"parent": "/",
"filename": "testing",
"stem": "testing",
"extension": null,
"path_list": [
"testing"
],
"query": null,
"query_obj": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.kelly.com",
"port": null,
"encoded": {
"url": "https://www.kelly.com/testing",
"scheme": "https",
"netloc": "www.kelly.com",
"path": "/testing",
"parent": "/",
"filename": "testing",
"stem": "testing",
"extension": null,
"path_list": [
"testing"
],
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.kelly.com",
"port": null
},
"decoded": {
"url": "https://www.kelly.com/testing",
"scheme": "https",
"netloc": "www.kelly.com",
"path": "/testing",
"parent": "/",
"filename": "testing",
"stem": "testing",
"extension": null,
"path_list": [
"testing"
],
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "www.kelly.com",
"port": null
}
},
{
"url": "https://mail.apple.com",
"scheme": "https",
"netloc": "mail.apple.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"query_obj": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "mail.apple.com",
"port": null,
"encoded": {
"url": "https://mail.apple.com",
"scheme": "https",
"netloc": "mail.apple.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "mail.apple.com",
"port": null
},
"decoded": {
"url": "https://mail.apple.com",
"scheme": "https",
"netloc": "mail.apple.com",
"path": null,
"parent": null,
"filename": null,
"stem": null,
"extension": null,
"path_list": null,
"query": null,
"fragment": null,
"username": null,
"password": null,
"hostname": "mail.apple.com",
"port": null
}
}
]
The documentation has been updated to show which parsers are compatible. Compatible parsers accept a single line of input. They are identified with the "slurpable" tag:
% jc -a | jq '.parsers[] | select(.name == "url")'
{
"name": "url",
"argument": "--url",
"version": "1.2",
"description": "URL string parser",
"author": "Kelly Brazil",
"author_email": "kellyjonbrazil@gmail.com",
"compatible": [
"linux",
"darwin",
"cygwin",
"win32",
"aix",
"freebsd"
],
"tags": [
"standard",
"string",
"slurpable"
]
}
These can also be found with jc -hhh:
% jc -hhh
Generic Parsers: (5)
--asciitable ASCII and Unicode table parser
--asciitable-m multi-line ASCII and Unicode table parser
--kv Key/Value file and string parser
<snip>
Slurpable Parsers: (9)
--date `date` command parser
--datetime-iso ISO 8601 Datetime string parser
--email-address Email Address string parser
--ip-address IPv4 and IPv6 Address string parser
--jwt JWT string parser
--semver Semantic Version string parser
--timestamp Unix Epoch Timestamp string parser
--url URL string parser
--ver Version string parser
Streaming Parsers: (15)
--cef-s CEF string streaming parser
--clf-s Common and Combined Log Format file streaming parser
--csv-s CSV file streaming parser
<snip>
slurp is working fine :)
echo "/abc/def:/efg/ghi" | tr ":" "\n" | jc --url -s | jq '[.[] | {path, path_list}]'
[
{
"path": "/abc/def",
"path_list": [
"abc",
"def"
]
},
{
"path": "/efg/ghi",
"path_list": [
"efg",
"ghi"
]
}
]
I'm rethinking the slurp output and it might make sense to use a dictionary for both types of slurp (multiple lines to a slurpable parser and multiple /proc files with magic syntax):
{<identifier>: <parsed-output>}
The <identifier> is the input string when slurping string lines.
The <identifier> is the filename when slurping multiple files from /proc magic syntax.
This makes the output more consistent and also ensures you can identify which input corresponds to which output. You can still reference the nth output in jq if you don't care about the key name by using the keys_unsorted[n] syntax. For example, to grab the 3rd object without caring about the key name:
% cat uname.txt | jc --slurp --uname | jq '.[keys_unsorted[2]]'
{
"machine": "x86_66",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
}
This is what the output looks like without filtering:
Single-line slurpable parsers:
% cat uname.txt | jc --slurp --uname -p
{
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_64": {
"machine": "x86_64",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_65": {
"machine": "x86_65",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_66": {
"machine": "x86_66",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_67": {
"machine": "x86_67",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_68": {
"machine": "x86_68",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
}
}
Multiple /proc files:
% jc -p /proc/stat /proc/cpuinfo
{
"/proc/stat": {
"cpu": {
"user": 6002,
"nice": 152,
"system": 8398,
"idle": 3444436,
"iowait": 448,
"irq": 0,
"softirq": 1174,
"steal": 0,
"guest": 0,
"guest_nice": 0
},
"cpu0": {
"user": 2784,
"nice": 137,
"system": 4367,
"idle": 1732802,
"iowait": 225,
"irq": 0,
"softirq": 221,
"steal": 0,
"guest": 0,
"guest_nice": 0
},
"cpu1": {
"user": 3218,
"nice": 15,
"system": 4031,
"idle": 1711634,
"iowait": 223,
"irq": 0,
"softirq": 953,
"steal": 0,
"guest": 0,
"guest_nice": 0
},
"interrupts": [
2496709,
<snip>
0
],
"context_switches": 4622716,
"boot_time": 1662154781,
"processes": 9831,
"processes_running": 1,
"processes_blocked": 0,
"softirq": [
3478985,
35230,
1252057,
3467,
128583,
51014,
0,
171199,
1241297,
0,
596138
]
},
"/proc/cpuinfo": [
{
"processor": 0,
"vendor_id": "GenuineIntel",
"cpu family": 6,
"model": 142,
"model name": "Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz",
"stepping": 9,
"cpu MHz": 2303.998,
"cache size": "4096 KB",
"physical id": 0,
"siblings": 1,
"core id": 0,
"cpu cores": 1,
"apicid": 0,
"initial apicid": 0,
"fpu": true,
"fpu_exception": true,
"cpuid level": 22,
"wp": true,
"flags": [
"fpu",
"vme",
"de",
"pse",
"tsc",
"msr",
"pae",
"mce",
"cx8",
"apic",
"sep",
"mtrr",
"pge",
"mca",
"cmov",
"pat",
"pse36",
"clflush",
"mmx",
"fxsr",
"sse",
"sse2",
"ht",
"syscall",
"nx",
"rdtscp",
"lm",
"constant_tsc",
"rep_good",
"nopl",
"xtopology",
"nonstop_tsc",
"eagerfpu",
"pni",
"pclmulqdq",
"monitor",
"ssse3",
"cx16",
"pcid",
"sse4_1",
"sse4_2",
"x2apic",
"movbe",
"popcnt",
"aes",
"xsave",
"avx",
"rdrand",
"hypervisor",
"lahf_lm",
"abm",
"3dnowprefetch",
"fsgsbase",
"avx2",
"invpcid",
"rdseed",
"clflushopt",
"md_clear",
"flush_l1d"
],
"bogomips": 4607.99,
"clflush size": 64,
"cache_alignment": 64,
"address sizes": "39 bits physical, 48 bits virtual",
"power management": null,
"address_size_physical": 39,
"address_size_virtual": 48,
"cache_size_num": 4096,
"cache_size_unit": "KB"
}
]
}
Potential issue: duplicate input values get deduplicated, so if you are not expecting that and you are just iterating by number, you could be looking at the wrong value. 😦 Potential workaround: ensure your input is already deduplicated via the uniq command or similar.
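The deduplication issue follows directly from using the input string as a dictionary key. A small sketch with a stand-in parser shows the effect; `fake_parse`, `slurp_to_dict`, and `slurp_to_list` are hypothetical names and not part of jc:

```python
def fake_parse(line: str) -> dict:
    """Stand-in parser (hypothetical); real parsers return richer objects."""
    return {"version": line}


def slurp_to_dict(lines, parse):
    """Key each parsed result by its input line (the dictionary proposal)."""
    return {line: parse(line) for line in lines}


def slurp_to_list(lines, parse):
    """Keep one result per input line, in input order."""
    return [parse(line) for line in lines]


if __name__ == "__main__":
    lines = ["1.2.3", "1.2.3", "2.0.0"]  # the second line repeats the first

    as_dict = slurp_to_dict(lines, fake_parse)
    as_list = slurp_to_list(lines, fake_parse)

    print(len(as_dict), len(as_list))
    # Positional lookup on the dict no longer lines up with the input lines.
    print(list(as_dict)[1], "vs", lines[1])
```

With the list form, index n always corresponds to input line n, duplicates included.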
somehow I don't like this case "The
I also fear that the order can change when it is a dict and not a list.
I would like to keep the "current" behaviour for normal slurp.
Maybe it can be done with an additional option --key or --dict (maybe with options linenumbers or filename/cmd, where cmd means the command string when it is a magic command).
so to get the /proc output:
jc --dict-cmd -p /proc/stat /proc/cpuinfo
cat uname.txt | jc --dict-linenumber --slurp --uname -p
cat uname.txt | jc --dict-input --slurp --uname -p
jc --dict-cmd -p ls
{
"ls": [
{
"filename": "common.jar"
},
{
"filename": "rider.jar"
}
]
}
jc --key-cmd -p ls -al
{
"ls -al": [
{
"filename": ".",
"flags": "drwxr-xr-x@",
"links": 13,
"owner": "muescha",
"group": "staff",
"size": 416,
"date": "Jan 18 17:26"
},
{
"filename": "..",
"flags": "drwxr-xr-x@",
"links": 3,
"owner": "muescha",
"group": "staff",
"size": 96,
"date": "Jan 18 17:26"
},
{
"filename": "common.jar",
"flags": "-rw-r--r--@",
"links": 1,
"owner": "muescha",
"group": "staff",
"size": 24430405,
"date": "Oct 20 12:55"
},
{
"filename": "rider.jar",
"flags": "-rw-r--r--@",
"links": 1,
"owner": "muescha",
"group": "staff",
"size": 9987,
"date": "Oct 20 12:55"
}
]
}
Yeah, I agree there are some issues with this method. I'm looking into using the original list output but maybe use the --meta-out option to add source information.
What about something like this?
Basically just wrapping in a dict and adding the slurped key that contains the data so that a single _jc_meta object can be attached with the --meta-out option that includes the original list of inputs?
% cat uname.txt | jc --slurp --uname -p --meta-out
{
"slurped": [
{
"machine": "x86_64",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
{
"machine": "x86_65",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
{
"machine": "x86_66",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
{
"machine": "x86_67",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
},
{
"machine": "x86_68",
"kernel_name": "Darwin",
"node_name": "Kellys-MBP.attlocal.net",
"kernel_release": "22.6.0",
"kernel_version": "Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64"
}
],
"_jc_meta": {
"parser": "uname",
"timestamp": 1705953071.42138,
"slice_start": null,
"slice_end": null,
"input_list": [
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_64",
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_65",
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_66",
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_67",
"Darwin Kellys-MBP.attlocal.net 22.6.0 Darwin Kernel Version 22.6.0: Wed Oct 4 21:25:26 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_X86_64 x86_68"
]
}
}
So the slurped key only becomes visible with --meta-out? Then it will be OK :)
OK, not sure if this will please everyone, but I have finalized a slurp output that hopefully meets everyone's requirements. https://github.com/kellyjonbrazil/jc/commit/6a7f38388359fd2ac05b1b0aa639a94080e665e5
I went back to the original idea of slurping the data into a list. If the data is coming from /proc magic syntax, a _file field will be added to the data.
In addition, if --meta-out is used, then the data is further wrapped in a dictionary that looks like:
{
"result": [<output_data>],
"_jc_meta": {
"parser": "url",
"timestamp": 1706235558.654576,
"slice_start": null,
"slice_end": null,
"input_list": [
"http://www.google.com",
"https://www.kelly.com/testing",
"https://mail.apple.com"
]
}
}
input_list contains a list of inputs (actual input strings or /proc filenames) so you can identify which item is which. This keeps everything in order and also works with duplicate entries.
Added in v1.25.0
Hey @kellyjonbrazil, could you clarify the following, please?
Will slurp/multi-line help in adding support in jc for messages like these (slides 9 to 11)?
http://dtsc.dfw.ibm.com/MVSDS/'HTTPD2.APPS.ZOSCLASS.PDF(Z05)'
The messages are structured such that the first char indicates whether it's multi-line or not (reference).
@v1gnesh I don't seem to have access to the first link. The second link seems to reference syslog messages. There are already some syslog parsers in jc.
These all either wrap multiple syslog messages in an array or output JSON Lines in a streaming fashion. The slurp functionality is more for parsers that only expect a single line or string, like an IP address for the --ip-address parser. Since the syslog parsers already expect multiple lines, they don't need the slurp functionality.
@kellyjonbrazil First link - ah, that's probably an HTTPS redirect doing it. That link works with HTTP only. The syslog I've linked to is pretty exotic - from an (IBM Z) mainframe operating system called z/OS.
@v1gnesh I was able to open the presentation (had to add in the single quote at the end as it was being left off). The slurp functionality won't have any effect on this type of data. It looks like a custom parser would need to be created for these types of syslog messages and the parser would need to automatically account for multiline messages in the parsing logic.
For some use cases I see that only one line is accepted as a parameter, for example for --url.
It would be nice if it were possible to add an option, for example --multiline-input, to parse the input line by line and join the output into one JSON.
Example - we have this data:
This would be nice to have as input, for example for --url
Current behaviour:
Expected behaviour:
PS: the long version would be:
Possible candidates (all with tag string, but not with file and generic):
--ver
--email
--date
--datetime-iso
--url
--ip-address
--semver
--timestamp
--jwt