Open pkoppstein opened 9 years ago
I must say, this is very nice!
Here's a handy and simple schema inference program I use:
[path(..) | ["",(.[]|if type=="number" then "[]" else . end)]] |
sort | unique | .[] | join(".") | sub("\\.\\[";"[") | sub("^\\[";".[")
What's handy about this is that it outputs jq path expressions. It still needs a bit of work to handle object key names that must be quoted because they aren't identifier-like. What should we call this?
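One possible way to handle the quoting issue, as a minimal sketch rather than a finished utility: emit identifier-like keys as `.key` and any other key as `["key"]`, building the expression by concatenation so that no `sub` clean-up is needed afterwards. The names `path_expr` and `path_exprs` and the identifier regex are my own assumptions, and `test` requires jq 1.5 or later built with regex support.

```jq
# Render one path (an array of keys and indices, as produced by `paths`)
# as a jq path expression. Array indices collapse to [].
# `path_expr` and `path_exprs` are hypothetical names, not jq builtins.
def path_expr:
  map(if type == "number" then "[]"
      elif test("^[A-Za-z_][A-Za-z0-9_]*$") then "." + .
      else "[" + tojson + "]"        # quote keys that are not identifier-like
      end)
  | join("")
  | if startswith(".") then . else "." + . end;

# Stream of the distinct path expressions found in the input, sorted.
def path_exprs:
  [paths | path_expr] | unique | .[];
```

With `path_exprs` appended as the main expression and the file saved as, say, `paths.jq` (a hypothetical name), this runs as `jq -r -f paths.jq armor.json`. A key such as `//` inside the top-level array should then come out as `.[]["//"]`, which jq accepts as a path expression.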
Should your schema utils and mine go into 1.5, or into a module? I think this is almost a killer app for jq...
EDIT: formatting.
Using your program (with \ properly escaped) on armor.json at https://github.com/CleverRaven/Cataclysm-DDA/blob/master/data/json/items/armor.json:
.[]
.[].//
.[].ammo
.[].bashing
.[].bashing_protection
.[].category
.[].charges_per_use
.[].color
.[].coverage
.[].covers
.[].covers.[]
.[].cut
.[].cutting
.[].description
.[].encumbrance
.[].environmental_protection
.[].flags
.[].flags.[]
.[].id
.[].initial_charges
.[].material
.[].material.[]
.[].material_thickness
.[].max_charges
.[].name
.[].name_plural
.[].note
.[].phase
.[].power_armor
.[].price
.[].properties
.[].properties.[]
.[].properties.[].[]
.[].qualities
.[].qualities.[]
.[].qualities.[].[]
.[].revert_to
.[].snippet_category
.[].snippet_category.[]
.[].snippet_category.[].id
.[].snippet_category.[].text
.[].storage
.[].symbol
.[].techniques
.[].techniques.[]
.[].to_hit
.[].turns_per_charge
.[].type
.[].use_action
.[].use_action.activate_msg
.[].use_action.deactive_msg
.[].use_action.need_sunlight
.[].use_action.type
.[].volume
.[].warmth
.[].weight
Using the 'schema' def in schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed#file-schema-jq the result is:
$ jq -r -f schema.jq /tmp/armor.json
{
  "//": "string",
  "ammo": "string",
  "bashing": "number",
  "bashing_protection": "number",
  "category": "string",
  "charges_per_use": "number",
  "color": "string",
  "coverage": "number",
  "covers": [
    "string"
  ],
  "cut": "number",
  "cutting": "number",
  "description": "string",
  "encumbrance": "number",
  "environmental_protection": "number",
  "flags": [
    "string"
  ],
  "id": "string",
  "initial_charges": "number",
  "material": "JSON",
  "material_thickness": "number",
  "max_charges": "number",
  "name": "string",
  "name_plural": "string",
  "note": "string",
  "phase": "string",
  "power_armor": "boolean",
  "price": "number",
  "properties": [
    [
      "string"
    ]
  ],
  "qualities": [
    [
      "scalar"
    ]
  ],
  "revert_to": "string",
  "snippet_category": [
    {
      "id": "string",
      "text": "string"
    }
  ],
  "storage": "number",
  "symbol": "string",
  "techniques": [
    "string"
  ],
  "to_hit": "number",
  "turns_per_charge": "number",
  "type": "string",
  "use_action": "JSON",
  "volume": "number",
  "warmth": "number",
  "weight": "number"
}
This is partly an enhancement request, and partly a request for comments.
When confronted with a collection of JSON entities, it is often helpful to know whether there is an implicit schema, and if so, what it is.
The following code generates a simple JSON Schema according to http://json-schema.org/latest/json-schema-validation.html:
def isobject:
    type == "object"
;
def isarray:
    type == "array"
;
def isscalar:
    type | . == "null" or . == "boolean" or . == "number" or . == "string"
;
def schema:
    { "type": type } +
    if isobject then
        if length == 0 then null
        else
            . as $object |
            { "properties": (
                reduce keys_unsorted[] as $name (
                    {};
                    . + {($name): ($object[$name] | schema)}
                )
              )
            }
        end
    elif isarray then
        if length == 0 then null
        else
            { "items": (
                if all(isscalar) and (map(type) | unique | length) == 1 then
                    { "type": (.[0] | type) }
                elif length == 1 then
                    .[0] | schema
                else
                    reduce .[] as $item (
                        [];
                        .[length] = ($item | schema)
                    )
                end
              )
            }
        end
    else null end # scalar
;
For example, for this input:
{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "location": "home",
      "code": 44
    }
  ]
}
the generated schema is:
{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "streetAddress": {
          "type": "string"
        },
        "city": {
          "type": "string"
        }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          },
          "code": {
            "type": "number"
          }
        }
      }
    }
  }
}
@fadado Beautiful! Can I borrow that for jq?
This is partly an enhancement request, and partly a request for comments.
When confronted with a collection of JSON entities, it is often helpful to know whether there is an implicit schema, and if so, what it is. Even in the case of a single JSON document, it is often useful to have a structural overview, e.g. for navigation.
Spark SQL can infer a schema from a collection of JSON entities. It can be printed using printSchema().
The example given in the O'Reilly Spark book on p. 172 is the pair of records:
Using the proposed schema.jq below, we find:
This is equivalent to the schema inferred by Spark SQL except that:
As illustrated by the above example, the absence of a key in an object also has no particular structural significance for either the Spark SQL inference engine or the one proposed here.
Three noteworthy features of the proposed schema inference engine are:
a) the introduction of "scalar" as an extended type, e.g. ["scalar"] is the extended type signifying an array of 0 or more elements of scalar type;
b) the introduction of "JSON" as an extended type, e.g. ["JSON"] is the extended type signifying an array of 0 or more elements of any type;
c) arrays are only characterized by the extended type of their elements.
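To make features (a) to (c) concrete, here is a rough sketch of how such an inference could be written. It is not the code from the gist: the names `is_scalar_type`, `combine`, `typeof` and `infer` are hypothetical, and treating `null` as compatible with every other type is my own assumption. It also lets a key that is missing from one object simply inherit the type seen in the other.

```jq
# Hypothetical helper: is this extended type one of the scalar types?
def is_scalar_type:
  . == "boolean" or . == "number" or . == "string" or . == "scalar";

# Smallest extended type covering both $a and $b.
# jq null means "nothing seen yet"; "null" (a JSON null value) is
# assumed here to be compatible with any other type.
def combine($a; $b):
  if $a == $b then $a
  elif $a == null or $a == "null" then $b
  elif $b == null or $b == "null" then $a
  elif ($a | is_scalar_type) and ($b | is_scalar_type) then "scalar"
  elif ($a | type) == "array" and ($b | type) == "array" then
    [combine($a[0]; $b[0])]
  elif ($a | type) == "object" and ($b | type) == "object" then
    reduce (($a + $b) | keys_unsorted[]) as $k
      ({}; .[$k] = combine($a[$k]; $b[$k]))
  else "JSON"
  end;

# Extended type of a single JSON value; an array is described
# only by the combined type of its elements (feature c).
def typeof:
  if type == "object" then map_values(typeof)
  elif type == "array" then
    [reduce (.[] | typeof) as $t (null; combine(.; $t))]
  else type
  end;

# Extended type covering every entity produced by `stream`.
def infer(stream):
  reduce (stream | typeof) as $t (null; combine(.; $t));
```

With `infer(inputs)` appended as the main expression, this could be run over a file of JSON entities as `jq -n -f infer.jq data.json` (both file names are placeholders). For a file holding a single top-level array, such as armor.json, `infer(inputs[])` would be the analogous call.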
Thus, the following JSON object conforms to the above-mentioned schema:
See also #243
schema.jq