planetarium / bencodex

Bencodex: Bencoding Extended
https://bencodex.org/

The version of this document is 1.3. See also the changelog.

There is a list of implementations. See also the LIBRARIES.tsv file.

Bencodex is a serialization format that extends BitTorrent's Bencoding. Since it is a superset of Bencoding, every valid Bencoding representation is a valid Bencodex representation of the same meaning (i.e., represents the same value). Bencodex adds the below data types to Bencoding:

Why not [insert your favorite format here]

The unique feature of Bencoding is forced normalization. According to Wikipedia's Bencode page:

For each possible (complex) value, there is only a single valid bencoding; i.e. there is a bijection between values and their encodings. This has the advantage that applications may compare bencoded values by comparing their encoded forms, eliminating the need to decode the values.

This makes things really simple when an application needs to determine whether encoded values are the same, in particular when cryptographic hashes or digital signatures are involved.
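
As a rough illustration of why this matters, here is a minimal sketch of plain Bencoding (integers, byte strings, lists, and dictionaries only) in Python. The `bencode` function is a hypothetical helper written for this example, not part of any library. It shows that two dictionaries built in a different order yield the same bytes, and therefore the same digest:

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, byte strings, lists, and dicts as plain Bencoding."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; plain Bencoding has no Boolean.
        raise TypeError("plain Bencoding has no Boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(k, bytes) for k in value):
            raise TypeError("plain Bencoding keys must be byte strings")
        # Keys are emitted in sorted order, so the insertion order of the
        # dict cannot leak into the encoding.
        items = sorted(value.items(), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"unsupported type: {type(value).__name__}")


a = {b"name": b"alice", b"age": 30}
b = {b"age": 30, b"name": b"alice"}  # same value, different insertion order

# Both encode to the same canonical bytes ...
assert bencode(a) == bencode(b) == b"d3:agei30e4:name5:alicee"

# ... so hashing the encodings is enough to compare the values.
digest_a = hashlib.sha256(bencode(a)).hexdigest()
digest_b = hashlib.sha256(bencode(b)).hexdigest()
assert digest_a == digest_b
```

With a format that allows several encodings of the same value, the two digests could differ even though the values are equal, and a signature made over one encoding would not verify against the other.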

There have been countless improvements in data serialization, such as rich data types, human readability, compact binary representation, zero-copy serialization, and even streaming, but canonical representation still rarely gets the attention it deserves.

Bencodex is not ambitious; it merely aims to combine Bencoding's strengths with the ordinary data types found in modern serialization formats.

Encoding

Note that notations for the semantics (i.e., the values that encodings represent) use Python's literals.

Test suite

The testsuite/ directory contains a set of Bencodex tests. Every test case is a triple of files: a .dat file containing arbitrary Bencodex data, a .yaml file containing the corresponding value in YAML, and a .json file that serves as an alternative to YAML and renders an AST of the Bencodex value.

For example, list.dat contains the below Bencodex data:

lu16:a Unicode string13:a byte stringi123ei-456etfndu1:au4:dictelu1:au4:listee

which encodes the value corresponding to list.yaml, that is:

- a Unicode string
- !!binary "YSBieXRlIHN0cmluZw=="  # b"a byte string"
- 123
- -456
- true
- false
- null
- a: dict
- [a, list]
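
To make the correspondence concrete, here is a minimal encoder sketch in Python that reproduces the list.dat bytes above from the equivalent Python value. The `encode` function is a hypothetical name used only for this example. The one-byte markers (`n`, `t`, `f`, the `u` prefix) are read off the sample data. Two details are assumptions, because the full encoding rules are not reproduced in this section: the length of a Unicode string is taken to be the byte length of its UTF-8 form, and byte-string keys are taken to sort before Unicode keys.

```python
def encode(value) -> bytes:
    """Encode a Python value as Bencodex (a sketch, not a reference encoder)."""
    if value is None:
        return b"n"
    # Check the Boolean singletons before int: bool is a subclass of int.
    if value is True:
        return b"t"
    if value is False:
        return b"f"
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        utf8 = value.encode("utf-8")
        # Assumption: the length prefix counts UTF-8 bytes, not code points.
        return b"u%d:%s" % (len(utf8), utf8)
    if isinstance(value, list):
        return b"l" + b"".join(encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items(), key=_key_order)
        return b"d" + b"".join(encode(k) + encode(v) for k, v in items) + b"e"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _key_order(item):
    """Sort key for dictionary entries (ordering is an assumption, see above)."""
    key = item[0]
    if isinstance(key, bytes):
        return (0, key)
    if isinstance(key, str):
        return (1, key.encode("utf-8"))
    raise TypeError("dictionary keys must be byte strings or Unicode strings")


value = [
    "a Unicode string",
    b"a byte string",
    123,
    -456,
    True,
    False,
    None,
    {"a": "dict"},
    ["a", "list"],
]

expected = (
    b"lu16:a Unicode string13:a byte stringi123ei-456e"
    b"tfndu1:au4:dictelu1:au4:listee"
)
assert encode(value) == expected
```

The sample only contains ASCII text and a single-key dictionary, so it exercises neither of the two assumptions; an implementation should check them against the encoding rules rather than rely on this sketch.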

The same value is also available as list.json, which renders an AST of the value structure:

{
  "type": "list",
  "values": [
    {
      "type": "text",
      "value": "a Unicode string"
    },
    {
      "base64": "YSBieXRlIHN0cmluZw==",
      "type": "binary"
    },
    {
      "decimal": "123",
      "type": "integer"
    },
    {
      "decimal": "-456",
      "type": "integer"
    },
    {
      "type": "boolean",
      "value": true
    },
    {
      "type": "boolean",
      "value": false
    },
    {
      "type": "null"
    },
    {
      "pairs": [
        {
          "key": {
            "type": "text",
            "value": "a"
          },
          "value": {
            "type": "text",
            "value": "dict"
          }
        }
      ],
      "type": "dictionary"
    },
    {
      "type": "list",
      "values": [
        {
          "type": "text",
          "value": "a"
        },
        {
          "type": "text",
          "value": "list"
        }
      ]
    }
  ]
}

Note that the schema of .json files is formally described in JSON Schema. See also utils/testsuite-schema.json.
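
For implementations that prefer the .json files over YAML, a test harness needs to turn the AST back into native values. The following Python sketch does this using only the node shapes that appear in the list.json sample above. The `from_ast` function is a hypothetical helper, and the authoritative definition of the format remains the JSON Schema file.

```python
import base64
import json


def from_ast(node):
    """Convert one AST node (a parsed JSON object) into a Python value."""
    kind = node["type"]
    if kind == "null":
        return None
    if kind == "boolean":
        return node["value"]
    if kind == "integer":
        # Integers are stored as decimal strings in the AST.
        return int(node["decimal"])
    if kind == "binary":
        return base64.b64decode(node["base64"])
    if kind == "text":
        return node["value"]
    if kind == "list":
        return [from_ast(child) for child in node["values"]]
    if kind == "dictionary":
        return {
            from_ast(pair["key"]): from_ast(pair["value"])
            for pair in node["pairs"]
        }
    raise ValueError(f"unknown node type: {kind!r}")


# A shortened excerpt in the same shape as list.json.
ast = json.loads("""
{
  "type": "list",
  "values": [
    {"type": "text", "value": "a Unicode string"},
    {"base64": "YSBieXRlIHN0cmluZw==", "type": "binary"},
    {"decimal": "-456", "type": "integer"},
    {"type": "null"},
    {"pairs": [
       {"key": {"type": "text", "value": "a"},
        "value": {"type": "text", "value": "dict"}}
     ],
     "type": "dictionary"}
  ]
}
""")

result = from_ast(ast)
assert result == ["a Unicode string", b"a byte string", -456, None, {"a": "dict"}]
```

Keeping integers as decimal strings presumably avoids the precision loss that JSON number parsers can introduce for large values; Python's `int` handles arbitrary sizes, so `int(node["decimal"])` is lossless here.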

An implementation should satisfy the below rules:


This document (README.md) and all content in this repository, including the test suite (testsuite/), are in the public domain.