This project presents a formal taxonomy to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria.
Open the online demo here.
Software systems make use of JSON to model diverse and domain-specific data structures. Each of these data structures has characteristics that distinguish it from other data structures. For example, a data structure that models a person is fundamentally different from a data structure that models sensor data. These characteristics describe the essence of the data structure. Therefore, two instances of the same data structure inherit the same or similar characteristics despite having different values.
While we intuitively know these characteristics exist, we lack a common terminology to describe them in unambiguous ways. In an attempt to solve this problem, this taxonomy presents a formal vocabulary to describe, reason about and discuss JSON documents in a high-level manner given the characteristics of the data structures they represent.
Size | Content | Redundancy | Structure | Acronym |
---|---|---|---|---|
Tier 1 Minified < 100 bytes | Numeric | Redundant | Flat | Tier 1 NRF |
Tier 1 Minified < 100 bytes | Numeric | Redundant | Nested | Tier 1 NRN |
Tier 1 Minified < 100 bytes | Numeric | Non-Redundant | Flat | Tier 1 NNF |
Tier 1 Minified < 100 bytes | Numeric | Non-Redundant | Nested | Tier 1 NNN |
Tier 1 Minified < 100 bytes | Textual | Redundant | Flat | Tier 1 TRF |
Tier 1 Minified < 100 bytes | Textual | Redundant | Nested | Tier 1 TRN |
Tier 1 Minified < 100 bytes | Textual | Non-Redundant | Flat | Tier 1 TNF |
Tier 1 Minified < 100 bytes | Textual | Non-Redundant | Nested | Tier 1 TNN |
Tier 1 Minified < 100 bytes | Boolean | Redundant | Flat | Tier 1 BRF |
Tier 1 Minified < 100 bytes | Boolean | Redundant | Nested | Tier 1 BRN |
Tier 1 Minified < 100 bytes | Boolean | Non-Redundant | Flat | Tier 1 BNF |
Tier 1 Minified < 100 bytes | Boolean | Non-Redundant | Nested | Tier 1 BNN |
Tier 2 Minified ≥ 100 < 1000 bytes | Numeric | Redundant | Flat | Tier 2 NRF |
Tier 2 Minified ≥ 100 < 1000 bytes | Numeric | Redundant | Nested | Tier 2 NRN |
Tier 2 Minified ≥ 100 < 1000 bytes | Numeric | Non-Redundant | Flat | Tier 2 NNF |
Tier 2 Minified ≥ 100 < 1000 bytes | Numeric | Non-Redundant | Nested | Tier 2 NNN |
Tier 2 Minified ≥ 100 < 1000 bytes | Textual | Redundant | Flat | Tier 2 TRF |
Tier 2 Minified ≥ 100 < 1000 bytes | Textual | Redundant | Nested | Tier 2 TRN |
Tier 2 Minified ≥ 100 < 1000 bytes | Textual | Non-Redundant | Flat | Tier 2 TNF |
Tier 2 Minified ≥ 100 < 1000 bytes | Textual | Non-Redundant | Nested | Tier 2 TNN |
Tier 2 Minified ≥ 100 < 1000 bytes | Boolean | Redundant | Flat | Tier 2 BRF |
Tier 2 Minified ≥ 100 < 1000 bytes | Boolean | Redundant | Nested | Tier 2 BRN |
Tier 2 Minified ≥ 100 < 1000 bytes | Boolean | Non-Redundant | Flat | Tier 2 BNF |
Tier 2 Minified ≥ 100 < 1000 bytes | Boolean | Non-Redundant | Nested | Tier 2 BNN |
Tier 3 Minified ≥ 1000 bytes | Numeric | Redundant | Flat | Tier 3 NRF |
Tier 3 Minified ≥ 1000 bytes | Numeric | Redundant | Nested | Tier 3 NRN |
Tier 3 Minified ≥ 1000 bytes | Numeric | Non-Redundant | Flat | Tier 3 NNF |
Tier 3 Minified ≥ 1000 bytes | Numeric | Non-Redundant | Nested | Tier 3 NNN |
Tier 3 Minified ≥ 1000 bytes | Textual | Redundant | Flat | Tier 3 TRF |
Tier 3 Minified ≥ 1000 bytes | Textual | Redundant | Nested | Tier 3 TRN |
Tier 3 Minified ≥ 1000 bytes | Textual | Non-Redundant | Flat | Tier 3 TNF |
Tier 3 Minified ≥ 1000 bytes | Textual | Non-Redundant | Nested | Tier 3 TNN |
Tier 3 Minified ≥ 1000 bytes | Boolean | Redundant | Flat | Tier 3 BRF |
Tier 3 Minified ≥ 1000 bytes | Boolean | Redundant | Nested | Tier 3 BRN |
Tier 3 Minified ≥ 1000 bytes | Boolean | Non-Redundant | Flat | Tier 3 BNF |
Tier 3 Minified ≥ 1000 bytes | Boolean | Non-Redundant | Nested | Tier 3 BNN |
The taxonomy aims to classify JSON documents into a limited and useful set of categories that is easy to reason about rather than exhaustively considering every possible aspect of a data structure. The taxonomy categorizes JSON documents according to their size, content, redundancy and nesting characteristics.
Tier 1: A JSON document is in this category if its UTF-8 minified form occupies less than 100 bytes.
Tier 2: A JSON document is in this category if its UTF-8 minified form occupies 100 bytes or more, but less than 1000 bytes.
Tier 3: A JSON document is in this category if its UTF-8 minified form occupies 1000 bytes or more.
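The size tiers above can be sketched in a few lines of Node.js. This is a minimal illustration rather than the package's actual implementation: `sizeTier` is a hypothetical helper name, and the sketch assumes that `JSON.stringify` without an indentation argument is an acceptable stand-in for the minified form.

```javascript
// Hypothetical helper, not part of the documented package API.
function sizeTier (value) {
  // JSON.stringify without an indent argument emits no insignificant whitespace
  const minified = JSON.stringify(value)
  // Measure the UTF-8 encoded size, not the number of UTF-16 code units
  const bytes = Buffer.byteLength(minified, 'utf8')
  if (bytes < 100) return 'tier 1'
  if (bytes < 1000) return 'tier 2'
  return 'tier 3'
}

console.log(sizeTier({ foo: 2 })) // tier 1 ({"foo":2} is 9 bytes)
```

Measuring with `Buffer.byteLength` rather than `String.prototype.length` matters for documents with non-ASCII text, where one character can occupy several bytes.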
Textual: A JSON document is in this category if it has at least one string value and its number of string values multiplied by the cumulative byte-size occupied by its string values is greater than or equal to the boolean and numeric counterparts.
Numeric: A JSON document is in this category if it has at least one number value and its number of number values multiplied by the cumulative byte-size occupied by its number values is greater than or equal to the textual and boolean counterparts.
Boolean: A JSON document is in this category if it has at least one boolean or null value and its number of boolean and null values multiplied by the cumulative byte-size occupied by its boolean and null values is greater than or equal to the textual and numeric counterparts.
Structural: A JSON document is in this category if it does not include any string, boolean, null or number values.
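A JSON document can be categorized as textual, numeric and boolean at the same time, because the comparisons above are non-strict and the weighted sizes of different value types can tie.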
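The content rules can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `contentCategories` is a hypothetical name, the byte-size of a scalar is assumed to be the UTF-8 size of its minified serialization (so strings include their quotes), and object keys are assumed not to count as string values.

```javascript
// Hypothetical helper, not part of the documented package API.
function contentCategories (document) {
  const stats = {
    textual: { count: 0, bytes: 0 },
    numeric: { count: 0, bytes: 0 },
    boolean: { count: 0, bytes: 0 } // also covers null, as in the definition
  }

  const visit = (value) => {
    if (Array.isArray(value)) {
      value.forEach(visit)
    } else if (typeof value === 'object' && value !== null) {
      Object.values(value).forEach(visit)
    } else {
      const kind = typeof value === 'string'
        ? 'textual'
        : typeof value === 'number' ? 'numeric' : 'boolean'
      stats[kind].count += 1
      // Assumption: size is measured on the serialized scalar
      stats[kind].bytes += Buffer.byteLength(JSON.stringify(value), 'utf8')
    }
  }
  visit(document)

  // Weight of a kind: number of values times their cumulative byte-size
  const weight = (kind) => stats[kind].count * stats[kind].bytes
  const kinds = Object.keys(stats)
  const result = kinds.filter((kind) =>
    stats[kind].count > 0 &&
    kinds.every((other) => weight(kind) >= weight(other)))

  // No scalar values at all: the document is structural
  return result.length > 0 ? result : ['structural']
}

console.log(contentCategories({ foo: 2 })) // [ 'numeric' ]
```

Returning an array reflects that a document can fall into more than one content category when weights tie.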
Non-redundant: A JSON document is in this category if less than 25% of its scalar and composite values are redundant.
Redundant: A JSON document is in this category if at least 25% of its scalar and composite values are redundant.
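The text above does not spell out when a value counts as redundant, so the following sketch has to pick a definition. It assumes that a value is redundant when an identical value (compared by its minified serialization) has already been seen earlier in the document, and that the root counts as one of the values. Both choices, and the name `redundancy`, are assumptions made for illustration and may differ from the package's behaviour.

```javascript
// Hypothetical helper, not part of the documented package API.
function redundancy (document) {
  const seen = new Set()
  let total = 0
  let duplicates = 0

  const visit = (value) => {
    total += 1
    // Assumption: two values are "the same" if they serialize identically.
    // Objects with the same members in a different key order are therefore
    // treated as distinct in this sketch.
    const key = JSON.stringify(value)
    if (seen.has(key)) {
      duplicates += 1
    } else {
      seen.add(key)
    }
    // Object.values works for both arrays and objects
    if (typeof value === 'object' && value !== null) {
      Object.values(value).forEach(visit)
    }
  }
  visit(document)

  // The 25% threshold comes from the definitions above
  return duplicates / total >= 0.25 ? 'redundant' : 'non-redundant'
}

console.log(redundancy({ foo: 2 })) // non-redundant
console.log(redundancy([1, 1, 1, 1])) // redundant
```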
Flat: A JSON document is in this category if the height of the document multiplied by the non-root level with the largest byte-size when taking textual, numeric and boolean values into account is less than 10. If two levels have the same byte size, the highest level is taken into account.
Nested: A JSON document is in this category if it is considered structural and its height is greater than or equal to 5, or if the height of the document multiplied by the non-root level with the largest byte-size when taking textual, numeric and boolean values into account is greater than or equal to 10. If two levels have the same byte size, the highest level is taken into account.
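A rough sketch of the structure rules follows. Several details are assumptions rather than statements from this document: the root is taken to be level 0, the height is the deepest level at which any value appears, "the highest level" in a tie is read as the deepest one, scalar sizes are measured on their minified serialization, and a structural document with a height below 5 is treated as flat. The name `nesting` is hypothetical.

```javascript
// Hypothetical helper, not part of the documented package API.
function nesting (document) {
  const bytesPerLevel = new Map()
  let height = 0
  let scalars = 0

  const visit = (value, level) => {
    height = Math.max(height, level)
    if (typeof value === 'object' && value !== null) {
      Object.values(value).forEach((child) => visit(child, level + 1))
    } else {
      scalars += 1
      // Only non-root levels take part in the comparison
      if (level > 0) {
        const size = Buffer.byteLength(JSON.stringify(value), 'utf8')
        bytesPerLevel.set(level, (bytesPerLevel.get(level) || 0) + size)
      }
    }
  }
  visit(document, 0)

  // Structural documents are judged on height alone
  if (scalars === 0) return height >= 5 ? 'nested' : 'flat'

  // Find the level holding the most scalar bytes; ties go to the deeper level
  let largestLevel = 0
  let largestBytes = -1
  for (const [level, bytes] of bytesPerLevel) {
    if (bytes > largestBytes ||
      (bytes === largestBytes && level > largestLevel)) {
      largestLevel = level
      largestBytes = bytes
    }
  }

  return height * largestLevel >= 10 ? 'nested' : 'flat'
}

console.log(nesting({ foo: 2 })) // flat (height 1 times level 1 is 1)
console.log(nesting({ a: { b: { c: { d: 1 } } } })) // nested (4 times 4 is 16)
```

Weighting the height by the level that carries most of the data means that a deep document whose content sits near the root can still be classified as flat.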
This repository publishes an npm package which can be installed as follows:
```sh
npm install --save @sourcemeta/json-taxonomy
```
The module exposes a single function that takes any JSON value and returns the sequence of taxonomy qualifiers as an array of strings:
```javascript
const taxonomy = require('@sourcemeta/json-taxonomy')

// Any JSON value is accepted as input
const value = {
  foo: 2
}

console.log(taxonomy(value))
// [ 'tier 1', 'numeric', 'non-redundant', 'flat' ]
```
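The returned qualifiers can be mapped to the acronyms used in the taxonomy table. The helper below is a hypothetical sketch that is not provided by the package. It assumes the array always holds exactly four qualifiers in the order shown in the example output, and it rejects anything the table does not cover, such as structural content.

```javascript
// Hypothetical helper, not part of the documented package API.
const LETTERS = {
  content: { numeric: 'N', textual: 'T', boolean: 'B' },
  redundancy: { redundant: 'R', 'non-redundant': 'N' },
  structure: { flat: 'F', nested: 'N' }
}

function toAcronym (qualifiers) {
  const [tier, content, redundancy, structure] = qualifiers
  const letters = [
    LETTERS.content[content],
    LETTERS.redundancy[redundancy],
    LETTERS.structure[structure]
  ]
  if (!/^tier [123]$/.test(tier) || letters.includes(undefined)) {
    throw new Error(`No acronym for: ${qualifiers.join(', ')}`)
  }
  // 'tier 1' becomes 'Tier 1', followed by the three-letter code
  return `T${tier.slice(1)} ${letters.join('')}`
}

console.log(toAcronym(['tier 1', 'numeric', 'non-redundant', 'flat']))
// Tier 1 NNF
```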
The published npm package includes a simple command-line interface program that can be globally installed as follows:
```sh
npm install --global @sourcemeta/json-taxonomy
```
The CLI program takes the path to a JSON document as an argument and outputs the taxonomy to standard output:
```sh
json-taxonomy path/to/document.json
```
This project is released under the terms specified in the license. This project extends previous academic work by the same author at the University of Oxford.