johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io
Other
9.02k stars 217 forks source link

`mlr cut` does not appear to handle nested JSON #1062

Open jauderho opened 2 years ago

jauderho commented 2 years ago

I have a JSON file with the data below and I'm hoping to extract the value of domain.domain using the command thusly

~/go/bin/mlr --ijson --ocsv cut -f domain.domain *

It would appear that I'm able to extract domain using

~/go/bin/mlr --ijson --ocsv cut -f domain *

I looked through the documentation but most of the documentation seems geared towards manipulating CSV and not JSON. I was unable to find a fragment that could point me in the right direction.

data file

{
    "domain": {
        "id": "2138514_DOMAIN_COM-VRSN",
        "domain": "google.com",
        "punycode": "google.com",
        "name": "google",
        "extension": "com",
        "whois_server": "whois.markmonitor.com",
        "status": [
            "clientdeleteprohibited",
            "clienttransferprohibited",
            "clientupdateprohibited",
            "serverdeleteprohibited",
            "servertransferprohibited",
            "serverupdateprohibited"
        ],
        "name_servers": [
            "ns1.google.com",
            "ns2.google.com",
            "ns3.google.com",
            "ns4.google.com"
        ],
        "created_date": "1997-09-15T04:00:00Z",
        "created_date_in_time": "1997-09-15T04:00:00Z",
        "updated_date": "2019-09-09T15:39:04Z",
        "updated_date_in_time": "2019-09-09T15:39:04Z",
        "expiration_date": "2028-09-14T04:00:00Z",
        "expiration_date_in_time": "2028-09-14T04:00:00Z"
    },
    "registrar": {
        "id": "292",
        "name": "MarkMonitor Inc.",
        "phone": "+1.2086851750",
        "email": "abusecomplaints@markmonitor.com",
        "referral_url": "http://www.markmonitor.com"
    },
    "registrant": {
        "organization": "Google LLC",
        "province": "CA",
        "country": "US",
        "email": "select request email form at https://domains.markmonitor.com/whois/google.com"
    },
    "administrative": {
        "organization": "Google LLC",
        "province": "CA",
        "country": "US",
        "email": "select request email form at https://domains.markmonitor.com/whois/google.com"
    },
    "technical": {
        "organization": "Google LLC",
        "province": "CA",
        "country": "US",
        "email": "select request email form at https://domains.markmonitor.com/whois/google.com"
    }
}
aborruso commented 2 years ago

Hi, if you run

mlr --j2n flatten then cut -f "domain.domain" input.json

you will have google.com.

A related discussion https://github.com/johnkerl/miller/discussions/1013#discussioncomment-2671223

jauderho commented 2 years ago

@aborruso

Thanks. This appears to do what I need now.

~/go/bin/mlr --ijsonl flatten google.com | ~/go/bin/mlr --ocsv cut -o -f "domain.domain,domain.id"

aborruso commented 2 years ago

@jauderho if it's ok for you, please close the issue ;)

aborruso commented 2 years ago

~/go/bin/mlr --ijsonl flatten google.com | ~/go/bin/mlr --ocsv cut -o -f "domain.domain,domain.id"

It should work also

mlr --ijsonl --ocsv flatten then cut -o -f "domain.domain,domain.id" google.com  
johnkerl commented 2 years ago

Hi @jauderho -- there's more context on https://github.com/johnkerl/miller/issues/737 as well.

I looked through the documentation but most of the documentation seems geared towards manipulating CSV and not JSON

Doubly correct -- at the 'demand' level it turns out CSV is far and above the most popular use-case for Miller. But even before that (at the 'supply' level) I initially handled only 'tabular JSON' in Miller (e.g. the 3.4.0 release).

In Miller 6 there is significantly improved support for JSON. However, there is farther yet to go.

Namely, the domain.id syntax is supported in the Miller DSL but is not supported by verbs like mlr cut. So, @aborruso 's advice about using flatten is spot-on as a workaround. But ideally you wouldn't need to use that workaround; verbs should be able to parse domain.id not as a 9-character column name but rather as domain, ., id.

I thought I had an open issue to point you to but I don't see one. I'll make one though!! :) This is a frequently requested feature.