johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

Attempting to parse DKVP with `--irs $'\n\n'` results in no output #1661

Closed zebernst closed 1 week ago

zebernst commented 1 week ago

I'm attempting to use Miller to parse the output of ldapsearch. My query's output is very similar to the one in #241, so I used the flags mentioned in that issue as a starting point. Specifying --ifs $'\n' and --ips ': ' works as expected, but using two newlines as the IRS seems to break Miller (even in the absence of any other flags): mlr simply doesn't print any output, and I can't figure out how to debug this further.

This is what my input data looks like:

dn: CN=Bar\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Bar
givenName: Foo
sAMAccountName: fooBar

dn: CN=Baz\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Baz
givenName: Qux
sAMAccountName: quxBaz

I've tried specifying the double newline as '\n\n', $'\n\n', and "\n\n", as well as --irs lflf, and had no luck with any of them.

I also ran my input through mlr lecat, which produced the following output (i.e. there aren't any unexpected CRs or anything in there):

dn: CN=Bar\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com[LF]
sn: Bar[LF]
givenName: Foo[LF]
sAMAccountName: fooBar[LF]
[LF]
dn: CN=Baz\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com[LF]
sn: Baz[LF]
givenName: Qux[LF]
sAMAccountName: quxBaz[LF]
[LF]

Finally, just to see if I'm going insane, I tried running the example data given in #241 through the query mentioned in that issue (mlr --idkvp --irs $'\n\n' --ifs $'\n' --ips ': ' --ojson cat) and it also produced an empty output.

Is it possible that there was a regression with supporting --irs with DKVP data? Or am I missing something obvious here?
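
For reference, the splitting that those three separators are meant to express can be sketched in Python. This only illustrates the intended semantics (records split on a blank line, fields on a newline, key and value on ": "); it is not how Miller implements it, and `parse_stanzas` and its parameter names are made up for this example.

```python
def parse_stanzas(text, irs="\n\n", ifs="\n", ips=": "):
    """Split text into records (irs), fields (ifs), and key/value pairs (ips)."""
    records = []
    for block in text.split(irs):
        block = block.strip("\n")
        if not block:
            continue  # skip the empty chunk left by a trailing separator
        record = {}
        for line in block.split(ifs):
            key, sep, value = line.partition(ips)
            if sep:  # ignore lines that contain no pair separator
                record[key] = value
        records.append(record)
    return records


# Same data as the sample input above, including the trailing blank line.
sample = (
    "dn: CN=Bar\\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com\n"
    "sn: Bar\n"
    "givenName: Foo\n"
    "sAMAccountName: fooBar\n"
    "\n"
    "dn: CN=Baz\\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com\n"
    "sn: Baz\n"
    "givenName: Qux\n"
    "sAMAccountName: quxBaz\n"
    "\n"
)

records = parse_stanzas(sample)
for record in records:
    print(record)
```

Splitting on the first ": " only (via `partition`) keeps any later colons inside the value.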

aborruso commented 1 week ago

I don't know if I understood your goal

Your input is similar to Miller's XTAB format: https://miller.readthedocs.io/en/latest/file-formats/#xtab-vertical-tabular

Using your sample input

dn: CN=Bar\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Bar
givenName: Foo
sAMAccountName: fooBar

dn: CN=Baz\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Baz
givenName: Qux
sAMAccountName: quxBaz

I can handle it in mlr, for example, this way. Running

mlr --ips ": " --x2c cat input.txt

I get this CSV:

dn,sn,givenName,sAMAccountName
"CN=Bar\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com",Bar,Foo,fooBar
"CN=Baz\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com",Baz,Qux,quxBaz

I don't know if I've been helpful

johnkerl commented 1 week ago

+1 to @aborruso, this sounds like a job for XTAB:

$ cat 1661.dat
dn: CN=Bar\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Bar
givenName: Foo
sAMAccountName: fooBar

dn: CN=Baz\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com
sn: Baz
givenName: Qux
sAMAccountName: quxBaz

$ mlr --ixtab --ojson --ips-regex ': *' cat 1661.dat
[
{
  "dn": "CN=Bar\\, Foo,OU=Some_Value,OU=OtherValue,DC=example,DC=com",
  "sn": "Bar",
  "givenName": "Foo",
  "sAMAccountName": "fooBar"
},
{
  "dn": "CN=Baz\\, Qux,OU=Some_Value,OU=OtherValue,DC=example,DC=com",
  "sn": "Baz",
  "givenName": "Qux",
  "sAMAccountName": "quxBaz"
}
]
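
As a rough illustration of why a regex separator like ': *' is more forgiving than the fixed string ': ', here is a small Python sketch. It assumes the regex is used to split each line once into key and value, which is my reading of the flag rather than a statement about Miller's internals. `split_pair` is a made-up helper, and the input lines deliberately vary the spacing after the colon (they are not from the real data).

```python
import json
import re

# A colon followed by zero or more spaces.
PAIR_SEPARATOR = re.compile(r": *")


def split_pair(line):
    """Split one 'key: value' line at the first separator match."""
    key, value = PAIR_SEPARATOR.split(line, maxsplit=1)
    return key, value


# Hypothetical lines with inconsistent spacing after the colon.
lines = ["sn: Bar", "givenName:Foo", "sAMAccountName:    fooBar"]

record = dict(split_pair(line) for line in lines)
print(json.dumps(record, indent=2))
```

With a fixed ': ' separator the second line would not split at all, and the third would keep its extra leading spaces in the value.
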
zebernst commented 1 week ago

Can confirm that XTAB works as expected for reading that data! Thank you both - I didn't realize that this was the type of data that XTAB was designed for!