Closed aborruso closed 2 years ago
@aborruso this is one of those things which I fear will confuse people about Miller 6 ... yet I do (I think ... more below) want this behavior. Namely, by default, the field name internally to Miller is b
(not b.1
and b.2
) -- because b
is an array-valued field. You can see this with
$ mlr -j cat input.json
{
"a": 3,
"b": ["a", "c"]
}
The flattening to b.1
and b.2
happens at the very end of verb-processing when the fields are "flattened" down since CSV is the output format, and CSV doesn't support nested values.
More info is on flatten/unflatten at https://miller.readthedocs.io/en/latest/flatten-unflatten
If you want to remove field names with .
in them you would need to flatten before the cut, so
$ mlr --j2c flatten then cut -rf '\.' input.json
b.1,b.2
a,c
$ mlr --j2c flatten then cut -rxf '\.' input.json
a
3
The reason I think I want this behavior by default is I think it's the most natural way to go back and forth between JSON & CSV.
But I may be wrong.
Auto-flatten/auto-unflatten is (currently) the default behavior. Without that we would have
$ mlr --j2c --no-auto-flatten cat input.json
a,b
3,"[""a"", ""c""]"
which is (to me anyway) less pleasant.
But that's only my own viewpoint & I'm eager to hear other points of view on what default behavior for flatten/unflatten should be ...
Hi @johnkerl thank you very much. It's very clear and it makes a lot of sense; reading the guide I did not understand it, my fault.
I often get a little crazy trying to flatten file as the one below. I think with Miller 6 it will be even easier than with version 5.
[
{
"city": "Palermo",
"id": 1,
"country": "Italy",
"population": 5000000,
"monuments": [
{
"name": "Church of St. Peter",
"type": "church",
"address": "Viale della Chiesa, 1"
},
{
"name": "Church of St. Mary",
"type": "church",
"address": "Viale del Tramonto, 5"
}
]
},
{
"city": "Milan",
"id": 2,
"country": "Italy",
"population": 6000000,
"monuments": [
{
"name": "Hello world",
"type": "tower",
"address": "Via Roma, 24"
},
{
"name": "Church of St. Paul",
"type": "church",
"address": "Via da qui, 5"
}
]
}
]
My stupid goal (stupid because I think it's immediate for you), is to produce using only Miller these two outputs:
mlrgo --j2m flatten then cut -x -r -f "\." 2flatten.json
city | id | country | population |
---|---|---|---|
Palermo | 1 | Italy | 5000000 |
Milan | 2 | Italy | 6000000 |
and
mlrgo --j2c flatten then cut -r -f "(id|\.)" then reshape -r "\." -o k,v then put -S '$idMonument=sub($k,"^(.+?)\.([0-9]+)\.(.+)$","\2");$k=sub($k,"^.+[0-9]?\.","")' then reshape -s k,v input.json
id | idMonument | name | type | address |
---|---|---|---|---|
1 | 1 | Church of St. Peter | church | Viale della Chiesa, 1 |
1 | 2 | Church of St. Mary | church | Viale del Tramonto, 5 |
2 | 1 | Hello world | tower | Via Roma, 24 |
2 | 2 | Church of St. Paul | church | Via da qui, 5 |
Every time I get into it, I think there is some easier way to do it using Miller, but I can't find it.
Forgive me for this long off-topic message.
Your answer is clear, I have to use flatten
.
@aborruso great!
Forgive me for this long off-topic message.
I think it's quite on-topic :) -- if it's confusing/hard for you, it will certainly be so for others.
Your answer is clear, I have to use flatten.
Flattening is certainly an issue, but not the only option (not one you have to use) ... there may be an easier way to get what you want ...
I think what you want to do with cutting out field names that have a .
in them is to find array-valued and map-valued fields -- ? So perhaps
mlr --j2m --from input.json put -q '
for (monument in $monuments) {
outrec = {
"id":$id,
"name": monument.name,
"type": monument.type,
"address": monument.address
};
emit outrec;
}
'
or perhaps
mlr --j2m --from input.json put -q '
for (monument in $monuments) {
outrec = {"id":$id};
outrec = mapsum(outrec, monument);
emit outrec
}
'
id | name | type | address |
---|---|---|---|
1 | Church of St. Peter | church | Viale della Chiesa, 1 |
1 | Church of St. Mary | church | Viale del Tramonto, 5 |
2 | Hello world | tower | Via Roma, 24 |
2 | Church of St. Paul | church | Via da qui, 5 |
along with
$ mlr --j2m cut -xf monuments 2flatten.json
city | id | country | population |
---|---|---|---|
Palermo | 1 | Italy | 5000000 |
Milan | 2 | Italy | 6000000 |
(Also I can see how https://github.com/johnkerl/miller/issues/546 would help here!)
Does this help?
Even better would be
mlr --j2m --from input.json put -q '
for (monument in $monuments) {
emit1 mapsum({"id": $id}, monument);
}
'
-- this is one of the few remaining blockers for Miller 6, letting the mapexcept(...)
(or any other expression evaluating to a map, which isn't already a map) be directly emitted.
@johnkerl you are always very kind!!
It seems great and you made my day!
Now I must study and try. Thank you
@aborruso https://github.com/johnkerl/miller/pull/739
So now this works:
mlr --j2m --from input.json put -q '
for (monument in $monuments) {
emit1 mapsum({"id": $id}, monument);
}
'
So now this works:
👏👏👏Do you hear this noise? I'm standing here in my room applauding you!!
It's really great. I'm not closing it, because I would like to continue this thread, starting with a more complex input file, and understand if I'm able to write the correct syntax.
Thank you!!!
Dear John, my real Json file has the below kind of structure.
My first goal could be extract something like
city id | monument name | monument type | monument address |
---|---|---|---|
1 | Church of St. Peter | church | Viale della Chiesa, 1 |
... | ... | ... | ... |
This JSON is more complex because monuments
is not always an array.
If I run
mlr --j2m --from input.json put -q '
for (city in $country.cities) {
emit1 mapsum({"id": city.id}, city.monuments);
}
'
I have only one monument, the one for the city in which monuments
is not an array:
id | name | type | address |
---|---|---|---|
3 | Lorem ipsum | Building | Via Chocchi, 3 |
{
"country": {
"name": "Italy",
"cities": [
{
"name": "Palermo",
"id": 1,
"population": 5000000,
"monuments": [
{
"name": "Church of St. Peter",
"type": "church",
"address": "Viale della Chiesa, 1"
},
{
"name": "Church of St. Mary",
"type": "church",
"address": "Viale del Tramonto, 5",
"websites": [
{
"name": "A great website",
"type": "Official website",
"url": "https://www.stmary.it"
},
{
"name": "stmary",
"type": "Twitter",
"url": "https://twitter.com/stmary"
}
]
}
]
},
{
"name": "Milan",
"id": 2,
"population": 5000000,
"monuments": [
{
"name": "Hello world",
"type": "tower",
"address": "Via Roma, 24"
},
{
"name": "Church of St. Paul",
"type": "church",
"address": "Via da qui, 5",
"websites": [
{
"name": "A bad website",
"type": "Official website",
"url": "https://www.hword.it"
}
]
}
]
},
{
"name": "Napoli",
"id": 3,
"population": 3000000,
"monuments": {
"name": "Lorem ipsum",
"type": "Building",
"address": "Via Chocchi, 3"
}
}
]
}
}
If my Json was simpler as this one:
{
"country": {
"name": "Italy",
"cities": [
{
"name": "Palermo",
"id": 1,
"population": 5000000,
"monuments": [
{
"name": "Church of St. Peter",
"type": "church",
"address": "Viale della Chiesa, 1"
},
{
"name": "Church of St. Mary",
"type": "church",
"address": "Viale del Tramonto, 5"
}
]
},
{
"name": "Milan",
"id": 2,
"population": 5000000,
"monuments": [
{
"name": "Hello world",
"type": "tower",
"address": "Via Roma, 24"
}
]
}
]
}
}
If I run
mlr --j2m --from input.json put -q '
for (city in $country.cities) {
emit1 mapsum({"id": city.id}, city.monuments);
}
'
I have an empty result.
It is evident that I have yet to understand the basics, that I have yet to understand how to flow in the structure of the JSON.
In the next few days, I will continue and update you here.
Thank you
I'm closing it. I must study :)
Hi, I have this input
If I run
mlr --j2c cat input.json
I haveI want to cut, all the fields that contain
.
. Then I change my command in:but I have no result. How to apply regex to the output symbol of a flattened JSON input (the
.
)?I'm using Miller 6.
Thank you