sourcemeta-research / json-taxonomy

A formal taxonomy to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria.
https://sourcemeta.github.io/json-taxonomy/
Apache License 2.0

Taxonomy for JSON documents

This project presents a formal taxonomy to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria.


Open the online demo at https://sourcemeta.github.io/json-taxonomy/.

Why is this useful?

Software systems use JSON to model diverse, domain-specific data structures. Each of these data structures has characteristics that distinguish it from other data structures. For example, a data structure that models a person is fundamentally different from one that models sensor data. These characteristics describe the essence of the data structure, so two instances of the same data structure share the same or similar characteristics even though their values differ.

While we intuitively know that these characteristics exist, we lack a common terminology to describe them unambiguously. To address this problem, this taxonomy presents a formal vocabulary to describe, reason about and discuss JSON documents at a high level, based on the characteristics of the data structures they represent.

Taxonomy

Size | Content | Redundancy | Structure | Acronym
Tier 1 (minified < 100 bytes) | Numeric | Redundant | Flat | Tier 1 NRF
Tier 1 (minified < 100 bytes) | Numeric | Redundant | Nested | Tier 1 NRN
Tier 1 (minified < 100 bytes) | Numeric | Non-Redundant | Flat | Tier 1 NNF
Tier 1 (minified < 100 bytes) | Numeric | Non-Redundant | Nested | Tier 1 NNN
Tier 1 (minified < 100 bytes) | Textual | Redundant | Flat | Tier 1 TRF
Tier 1 (minified < 100 bytes) | Textual | Redundant | Nested | Tier 1 TRN
Tier 1 (minified < 100 bytes) | Textual | Non-Redundant | Flat | Tier 1 TNF
Tier 1 (minified < 100 bytes) | Textual | Non-Redundant | Nested | Tier 1 TNN
Tier 1 (minified < 100 bytes) | Boolean | Redundant | Flat | Tier 1 BRF
Tier 1 (minified < 100 bytes) | Boolean | Redundant | Nested | Tier 1 BRN
Tier 1 (minified < 100 bytes) | Boolean | Non-Redundant | Flat | Tier 1 BNF
Tier 1 (minified < 100 bytes) | Boolean | Non-Redundant | Nested | Tier 1 BNN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Numeric | Redundant | Flat | Tier 2 NRF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Numeric | Redundant | Nested | Tier 2 NRN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Numeric | Non-Redundant | Flat | Tier 2 NNF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Numeric | Non-Redundant | Nested | Tier 2 NNN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Textual | Redundant | Flat | Tier 2 TRF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Textual | Redundant | Nested | Tier 2 TRN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Textual | Non-Redundant | Flat | Tier 2 TNF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Textual | Non-Redundant | Nested | Tier 2 TNN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Boolean | Redundant | Flat | Tier 2 BRF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Boolean | Redundant | Nested | Tier 2 BRN
Tier 2 (minified ≥ 100 and < 1000 bytes) | Boolean | Non-Redundant | Flat | Tier 2 BNF
Tier 2 (minified ≥ 100 and < 1000 bytes) | Boolean | Non-Redundant | Nested | Tier 2 BNN
Tier 3 (minified ≥ 1000 bytes) | Numeric | Redundant | Flat | Tier 3 NRF
Tier 3 (minified ≥ 1000 bytes) | Numeric | Redundant | Nested | Tier 3 NRN
Tier 3 (minified ≥ 1000 bytes) | Numeric | Non-Redundant | Flat | Tier 3 NNF
Tier 3 (minified ≥ 1000 bytes) | Numeric | Non-Redundant | Nested | Tier 3 NNN
Tier 3 (minified ≥ 1000 bytes) | Textual | Redundant | Flat | Tier 3 TRF
Tier 3 (minified ≥ 1000 bytes) | Textual | Redundant | Nested | Tier 3 TRN
Tier 3 (minified ≥ 1000 bytes) | Textual | Non-Redundant | Flat | Tier 3 TNF
Tier 3 (minified ≥ 1000 bytes) | Textual | Non-Redundant | Nested | Tier 3 TNN
Tier 3 (minified ≥ 1000 bytes) | Boolean | Redundant | Flat | Tier 3 BRF
Tier 3 (minified ≥ 1000 bytes) | Boolean | Redundant | Nested | Tier 3 BRN
Tier 3 (minified ≥ 1000 bytes) | Boolean | Non-Redundant | Flat | Tier 3 BNF
Tier 3 (minified ≥ 1000 bytes) | Boolean | Non-Redundant | Nested | Tier 3 BNN
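
The acronym column is mechanical: the size tier followed by one letter each for content (N, T, B), redundancy (R for redundant, N for non-redundant) and structure (F for flat, N for nested). The following is a minimal sketch of that mapping, using a hypothetical helper named toAcronym that is not part of the published package. It takes qualifiers shaped like the array the npm module returns (for example [ 'tier 1', 'numeric', 'non-redundant', 'flat' ]) and assumes the remaining qualifiers are spelled 'textual', 'boolean', 'redundant' and 'nested'.

```javascript
// Hypothetical helper (not part of @sourcemeta/json-taxonomy).
// Assumption: qualifiers are lowercase strings such as 'textual',
// 'boolean', 'redundant' and 'nested', like the 'numeric',
// 'non-redundant' and 'flat' values shown in the usage example.
const CONTENT = { numeric: 'N', textual: 'T', boolean: 'B' }
const REDUNDANCY = { redundant: 'R', 'non-redundant': 'N' }
const STRUCTURE = { flat: 'F', nested: 'N' }

function toAcronym ([size, content, redundancy, structure]) {
  const letters = [
    CONTENT[content],
    REDUNDANCY[redundancy],
    STRUCTURE[structure]
  ]
  // Reject anything that does not match the table's categories
  if (!/^tier [123]$/.test(size) || letters.includes(undefined)) {
    throw new Error('Unrecognized taxonomy qualifiers')
  }
  // 'tier 1' -> 'Tier 1'
  const tier = size.charAt(0).toUpperCase() + size.slice(1)
  return `${tier} ${letters.join('')}`
}

console.log(toAcronym(['tier 1', 'numeric', 'non-redundant', 'flat']))
// Tier 1 NNF
```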

The taxonomy aims to classify JSON documents into a limited and useful set of categories that is easy to reason about rather than exhaustively considering every possible aspect of a data structure. The taxonomy categorizes JSON documents according to their size, content, redundancy and nesting characteristics.

Size
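
Going by the table, the size tier depends only on the document's byte length once minified: under 100 bytes is Tier 1, at least 100 but under 1000 bytes is Tier 2, and 1000 bytes or more is Tier 3. A minimal sketch of that rule follows, using a hypothetical sizeTier function. It assumes that "minified" means the output of JSON.stringify() with no indentation and that bytes are counted as UTF-8.

```javascript
// Hypothetical helper: size qualifier for a JSON value, using the
// thresholds from the taxonomy table.
// Assumption: "minified" = JSON.stringify() output without whitespace,
// measured in UTF-8 bytes.
function sizeTier (value) {
  const minified = JSON.stringify(value)
  const bytes = new TextEncoder().encode(minified).length
  if (bytes < 100) return 'tier 1'
  if (bytes < 1000) return 'tier 2'
  return 'tier 3'
}

console.log(sizeTier({ foo: 2 })) // tier 1 ({"foo":2} is 9 bytes)
console.log(sizeTier({ text: 'x'.repeat(200) })) // tier 2 (211 bytes)
```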

Content

A JSON document can be categorized as textual, numeric and boolean at the same time.
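
The content categories refer to the kinds of values a document holds: numbers, strings and booleans. This README does not state the exact rule that turns those values into a category, so the following sketch only performs a plausible first step: tallying scalar values by type with a hypothetical countValueTypes function. Counting object values but not object keys, and counting null separately, are assumptions of this sketch, not part of the taxonomy.

```javascript
// Hypothetical helper: walks a parsed JSON value and counts its scalar
// (leaf) values by type. It does NOT decide the content category, since
// the decision rule is not given here.
// Assumption: object keys are ignored; only values are counted.
function countValueTypes (value, counts = { numeric: 0, textual: 0, boolean: 0, null: 0 }) {
  if (Array.isArray(value)) {
    for (const item of value) countValueTypes(item, counts)
  } else if (value !== null && typeof value === 'object') {
    for (const item of Object.values(value)) countValueTypes(item, counts)
  } else if (typeof value === 'number') {
    counts.numeric++
  } else if (typeof value === 'string') {
    counts.textual++
  } else if (typeof value === 'boolean') {
    counts.boolean++
  } else if (value === null) {
    counts.null++
  }
  return counts
}

console.log(countValueTypes({ id: 7, tags: ['a', 'b'], active: true, extra: null }))
// { numeric: 1, textual: 2, boolean: 1, null: 1 }
```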

Redundancy

Nesting
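
The Structure column of the table separates flat documents from nested ones. The exact boundary is not stated in this README, so the following sketch only measures how deeply a value nests, using a hypothetical height function. A scalar has height 0 and each level of array or object nesting adds 1. Where the flat/nested cut-off lies is left open.

```javascript
// Hypothetical helper: height of a JSON value.
// Scalars (numbers, strings, booleans, null) have height 0; every
// enclosing array or object adds one level.
// The threshold that separates "flat" from "nested" is NOT defined here.
function height (value) {
  if (value === null || typeof value !== 'object') return 0
  const children = Array.isArray(value) ? value : Object.values(value)
  let max = 0
  for (const child of children) max = Math.max(max, height(child))
  return 1 + max
}

console.log(height(2)) // 0
console.log(height({ foo: 2 })) // 1
console.log(height({ foo: [{ bar: true }] })) // 3
```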

Usage (JavaScript)

This repository publishes an npm package which can be installed as follows:

npm install --save @sourcemeta/json-taxonomy

The module exposes a single function that takes any JSON value and returns the sequence of taxonomy qualifiers as an array of strings:

const taxonomy = require('@sourcemeta/json-taxonomy')

const value = {
  foo: 2
}

console.log(taxonomy(value))
// [ 'tier 1', 'numeric', 'non-redundant', 'flat' ]

Usage (CLI)

The published npm package includes a simple command-line program that can be installed globally as follows:

npm install --global @sourcemeta/json-taxonomy

The CLI program takes the path to a JSON document as an argument and outputs the taxonomy to standard output:

json-taxonomy path/to/document.json

License

This project is released under the terms specified in the license. It extends previous academic work by the same author at the University of Oxford.