spc476 / CBOR

The most comprehensive CBOR module in the Lua universe.
GNU Lesser General Public License v3.0
22 stars 3 forks source link
         The Concise Binary Object Represenation Lua Modules

The major module in this collection is 'org.conman.cbor', the most comprehensive CBOR module in the Lua universe, as it supports everything mentioned in RFC-8949 and the extensions published by the IANA. It's easy to use, but also allows enough control to get exactly what you want when encoding to CBOR. For example:

cbor = require "org.conman.cbor"

data =
{
  name   = "Sean Conner",
  userid = "spc476",
  login  =
  {
    active     = true,
    last_login = os.time(),
  }
}

enc = cbor.encode(data)

It's that easy. To decode is just as simple:

data = cbor.decode(enc)

For a bit more control over the encoding though, say, to include some semantic tagging, requires a bit more work, but is still quite doable. For example, if we change things slghtly:

mt =
{
  __tocbor = function(self)
    return cbor.TAG._epoch(os.time(self))
  end
}

function date()
  local now = os.date("*t")
  setmetatable(now,mt)
  return now
end

data = 
{
  name = "Sean Conner",
  userid = "spc476",
  login = 
  {
    active = true,
    last_login = date()
  }
}

enc = cbor.encode(data)

You can supply the '__tocbor' metamethod to a metatable to have the item encode itself as this example shows. This is one approach to have userdata encoded into CBOR.

Decoding tagged items is also quite easy:

tagged_items =
{
  _epoch = function(value)
    local t = os.date("*t",value)
    setmetatable(t,mt)
    return t
  end
}

-- first parameter is the data we're decoding
-- second paramter is the offset to start with (optional)
-- third parameter is a table of routines to process
-- tagged data (optional)

data = cbor.decode(enc,1,tagged_items)

The _epoch tagged type is passed to our function where we return the value from os.date(). The resulting data looks like:

data =
{
  name   = "Sean Conner",
  userid = "spc476",
  login  =
  {
    active     = true,
    last_login = 
    {
      year  = 2016,
      month = 4,
      day   = 4,
      hour  = 1,
      min   = 42,
      sec   = 53,
      wday  = 2,
      yday  = 95,
      isdst = true,
    } -- and we have a metatable that can encode this
  }
}

Of course, you could also do it like:

enc = cbor.TYPE.MAP(3)
   .. cbor.encode "name"   .. cbor.encode(data.name)
   .. cbor.encode "userid" .. cbor.encode(data.userid)
   .. cbor.encode "login"  .. cbor.TYPE.MAP(2)
    .. cbor.encode "active" 
    .. cbor.encode(data.login.active)
    .. cbor.encode "last_login" 
    .. cbor.TYPE._epoch(os.time(data.login.last_login))

if you really want to get down into the details of encoding CBOR.

Tagged types are not the only thing that can be post-processed though. If we wanted to ensure that all the field names were lower case, we can do that as well:

tagged_items =
{
  TEXT = function(value,iskey)
    if iskey then 
      value = value:lower()
    end
    return value
  end,

  _epoch = function(value)
    local t = os.date("*t",value)
    setmetatable(t,mt)
    return t
  end,
}

data = cbor.decode(enc,1,tagged_items)

The second parameter indicates if the given value is a key in a MAP.

This module can also deal with circular references with an additional parameter when encoding:

x = {}
x.x = x

enc = cbor.encode(x,{})

The addition of the empty table as the second parameter let's the module know that MAPs and ARRAYs might be referenced and to keep track as it's processing. The reason it's not done by default is that each MAP and ARRAY is tagged as "shareable", which adds some overhead to the output. If there are no circular references, such additions are just wasted. Just something to keep in mind.

But the above does work and can be passed to cbor.decode() without any additional parameters (decoding will deal with such references automatically).

Additionally, if the data you are sending might use repetative strings, say:

data =
{
  { filename = "cbor_c.so"    , package = "org.conman" },
  { filename = "cbor.lua"     , package = "org.conman" },
  { filename = "cbor_s.lua"   , package = "org.conman" },
  { filename = "cbormisc.lua" , package = "org.conman" },
}

You can save some space by using string references:

enc = cbor.encode(data,nil,{}) -- third parameter to use string refs

Normally, this would take 160 bytes to encode, but with string references, it saves 54 bytes. Not much in this example, but it could add up significantly.

And yes, you can request both shared references and string references in the same call.

The CBOR values null and undefined map to Lua nil, and all the baggage that comes with that. If you really need to track null and/or undefined values, you can define the following with a unique value:

cbor.null
cbor.undefined

And the value of those two fields will be checked for when encoding and the appropriate CBOR value used. Also, when decoding, the those values will be used when the CBOR null and undefined values are decoded. The safest value to use is an empty table for each one:

cbor.null      = {}
cbor.undefined = {}

That will ensure both have a unique value.

If the 'org.conman.cbor' module is too heavy for your application, you can try the 'org.conman.cbor_s' module. It's similar to the full blown 'org.conman.cbor' module, but without the full support of the various tagged data items. And it too, supports custom null and undefined values---just set

cbor_s.null
cbor_s.undefined

Our original example is the same:

cbor_s = require "org.conman.cbor_s"

data =
{
  name   = "Sean Conner",
  userid = "spc476",
  login  =
  {
    active     = true,
    last_login = os.time(),
  }
}

enc   = cbor_s.encode(data)
data2 = cbor_s.decode(enc)

But the tagged version is slightly different:

mt =
{
  __tocbor = function(self)

    -- ---------------------------------------------------
    -- because of limited support, we need to specify the
    -- actual number for the _epoch tag.
    -- ---------------------------------------------------

    return cbor.encode(os.time(self),1)
  end
}

function date()
  local now = os.date("*t")
  setmetatable(now,mt)
  return now
end

data = 
{
  name = "Sean Conner",
  userid = "spc476",
  login = 
  {
    active = true,
    last_login = date()
  }
}

enc = cbor.encode(data)

tagged_items = 
{
  [1] = function(value)     
    local t = os.date("*t",value)
    setmetatable(t,mt)
    return t
  end,
}

data2 = cbor.decode(enc,1,tagged_items)

The first major difference is the lack of tag names (you have to use the actual values). The second difference is the lack of support for the reference tags (you can still catch them, but due to a lack of support in the underlying engine you can't really do anything with them at this time; you the full blown 'org.conman.cbor' module if you really need circular references) and string references (same issue).

Dropping down yet another level is the 'org.conman.cbor_c' module. This is the basic core of the two previous modules and only supplies two functions, cbor_c.encode() and cbor_c.decode(). The modules 'org.conman.cbor_s' and 'org.conman.cbormisc' were developed to showcase the low level usage of the 'org.conman.cbor_c' module and can be the basis for a module for even more constrained devices.


*

Encoding a mixed Lua table (e.g. { 1 , 2 , 3 , foo = 1 , bar = 2 }) may not encode as expected. For best results, ensure that all Lua tables are either pure sequences (arrays, e.g. { 1 , 2 , 3 , 4 }) or are non-sequential in nature.

NOTE: The following array is problematic:

{ foo = 1 , bar = 2 , [1] = 1 }

because it is both a Lua sequence and contains non-sequence indecies, and will thus be encoded as an ARRAY. The following array:

{ foo = 1 , bar = 2 , [100] = 1 }

is NOT considered a sequence and will be encoded as a MAP.


*

Full blown CBOR support. All of RFC-8949 as well as the currently defined IANA extensions are supported with this module. This module also includes enhanced error detection and type checking. As a result, it's quite a bit larger than the org.conman.cbor_s module. This module also contains a large number of functions.

==============================================================

Usage: bool = cbor.isnumber(ctype) Desc: returns true of the given CBOR type is a number Input: type (enum/cbor) CBOR type Return: bool (boolean) true if number, false otherwise

==============================================================

Usage: bool = cbor.isinteger(ctype)
Desc: returns true if the given CBOR type is an integer Input: ctype (enum/cbor) CBOR type Return: bool (boolean) true if number, false othersise

==============================================================

Usage: bool = cbor.isfloat(ctype) Desc: returns true if the given CBOR type is a float Input: ctype (enum/cbor) CBOR type Return: bool (boolean) true if number, false otherwise

==============================================================

Usage: value,pos2,ctype = cbor.decode(packet[,pos][,conv][,ref][,iskey]) Desc: Decode CBOR encoded data Input: packet (binary) CBOR binary blob pos (integer/optional) starting point for decoding conv (table/optional) table of conversion routines ref (table/optional) reference table (see notes) iskey (boolean/optional) is a key in a MAP (see notes) Return: value (any) the decoded CBOR data pos2 (integer) offset past decoded data ctype (enum/cbor) CBOR type of value

Note: The conversion table should be constructed as: { UINT = function(v) return munge(v) end, _datetime = function(v) return munge(v) end, _url = function(v) return munge(v) end,, }

The keys are CBOR types (listed above).  These functions are
expected to convert the decoded CBOR type into a more appropriate
type for your code.  For instance, an _epoch can be converted into a
table.

Users of this function *should not* pass a reference table into this
routine---this is used internally to handle references.  You need to
know what you are doing to use this parameter.  You have been
warned.

The iskey is true if the value is being used as a key in a map, and
is passed to the conversion routine; this too, is an internal use
only variable and you need to know what you are doing to use this. 
You have been warned.

This function can throw an error.  The returned error object MAY BE
a table, in which case it has the format:

{
  msg = "Error text",
  pos = 13 -- position in binary object of error
}

This function can throw errors

==============================================================

Usage: value,pos2,ctype[,err] = cbor.pdecode(packet[,pos][,conv][,ref]) Desc: Protected call to cbor.decode(), which will return an error Input: packet (binary) CBOR binary blob pos (integer/optional) starting point for decoding conv (table/optional) table of conversion routines (see cbor.decode()) ref (table/optional) reference table (see cbor.decode()) Return: value (any) the decoded CBOR data, nil on error pos2 (integer) offset past decoded data; if error, position of error ctype (enum/cbor) CBOR type err (string/optional) error message (if any)

==============================================================

Usage: blob = cbor.encode(value[,sref][,stref]) Desc: Encode a Lua type into a CBOR type Input: value (any) sref (table/optional) shared reference table stref (table/optional) shared string reference table Return: blob (binary) CBOR encoded value

Note: This function can throw errors

==============================================================

Usage: blob[,err] = cbor.pencode(value[,sref][,stref]) Desc: Protected call to encode a CBOR type Input: value (any) sref (table/optional) shared reference table stref (table/optional) shared string reference table Return: blob (binary) CBOR encoded value, nil on error err (string/optional) error message

==============================================================

cbor.__ENCODE_MAP

A table of functions to map Lua values to CBOR encoded values. nil, boolean, number and string are handled directly (if a Lua string is valid UTF8, then it's encoded as a CBOR TEXT.

For the other four types, only tables are directly supported without metatable support. If a metatable does exist, if the method 'tocbor' is defined, that function is called and the results returned. If 'len' is defined, then it is mapped as a CBOR ARRAY. For Lua 5.2, if 'ipairs' is defined, then it too, is mapped as a CBOR ARRAY. If Lua 5.2 or higher and 'pairs' is defined, then it's mapped as a CBOR MAP.

Otherwise, an error is thrown.

--------------------------------------------------------------

Usage: blob = cbor.__ENCODE_MAPluatype Desc: Encode a Lua type into a CBOR type Input: value (any) a Lua value who's type matches luatype. sref (table/optional) shared reference table stref (table/optional) shared string reference table Return: blob (binary) CBOR encoded data

==============================================================

cbor.TYPE

Both encoding and decoding functions for CBOR base types are in this table. The functions 'UINT' (unsigned integer), 'NINT' (negative integer), 'BIN' (binary string), 'TEXT' (UTF-8 encoded text), 'ARRAY' (a Lua array or sequence) and 'MAP' are used to encode the given type. The numeric entries are for decoding CBOR data.

--------------------------------------------------------------

Usage: blob = cbor.TYPE'name' Desc: Encode a CBOR base type Input: n (integer string table) Lua type (see notes) sref (table/optional) shared reference table stref (table/optional) shared string reference table Return: blob (binary) CBOR encoded value

Note: UINT and NINT take an integer.

BIN and TEXT take a string.  TEXT will check to see if
the text is well formed UTF8 and throw an error if the
text is not valid UTF8.

ARRAY and MAP take a table of an appropriate type. No
checking is done of the passed in table, so a table
of just name/value pairs passed in to ARRAY will return
an empty CBOR encoded array.

TAG and SIMPLE encoding are handled elsewhere.

If the data you have includes tables with cycles, you will need to
pass in an empty table for the sref parameter, otherwise, you run
the risk of blowing out the stack.

The parameter stref is used to pass strings as references in the 
resulting CBOR.  This will make the resulting CBOR data smaller, but
make sure the receiving end can deal with string references.

ARRAY and MAP references are separate from string referenes.  It is
also safe to use both with this module.

--------------------------------------------------------------

Usage: value2,pos2,ctype = cbor.TYPEn Desc: Decode a CBOR base type Input: packet (binary) binary blob of CBOR data pos (integer) byte offset in packet to start parsing from info (integer) CBOR info (0 .. 31) value (integer) CBOR decoded value conv (table) conversion table (passed to decode()) ref (table) used to generate references (TAG types only) Return: value2 (any) decoded CBOR value pos2 (integer) byte offset just past parsed data ctype (enum/cbor) CBOR deocded type

Note: tag* is returned for any non-supported TAG types. The actual format is 'tag' ---for example, 'tag_1234567890'. Supported TAG types will return the appropriate type name.

simple is returned for any non-supported SIMPLE types. 
Supported simple types will return the appropriate type
name.

The ref parameter is not marked optional here---it's used internally
to handle ARRAY/MAP and string references.  Make sure you always
pass in the same table (it should be empty initally) here for
consistent results.

==============================================================

    cbor.TAG

Encoding and decoding of CBOR TAG types are here. Like cbor.TYPE, the named entries are for encoding and the numbered entries are for decoding. Named types are:

    * _datetime datetime (TEXT)
    * _epoch    see cbor.isnumber()
    * _pbignum  positive bignum (BIN)
    * _nbignum  negative bignum (BIN)
    * _decimalfraction ARRAY(integer exp, integer mantissa)
    * _bigfloat ARRAY(float exp,integer mantissa)
    * _tobase64url  should be base64url encoded (BIN)
    * _tobase64 should be base64 encoded (BIN)
    * _tobase16 should be base16 encoded (BIN)
    * _cbor     CBOR encoded data (BIN)
    * _url      URL (TEXT)
    * _base64url    base64url encoded data (TEXT)
    * _base64   base64 encoded data (TEXT)
    * _regex    regex (TEXT)
    * _mime     MIME encoded messsage (TEXT)
    * _magic_cbor   itself (no data, used to self-describe CBOR data)

    * _nthstring    shared string
    * _perlobj  Perl serialized object
    * _serialobj    Generic serialized object
    * _shareable    sharable resource (ARRAY or MAP)
    * _sharedref    reference (UINT)
    * _rational Rational number
    * _uuid     UUID value (BIN)
    * _language Language-tagged string
    * _id       Identifier
    * _stringref    string reference
    * _bmime    Binary MIME message
    * _decimalfractionexp like _decimalfraction, non-int exponent
    * _bigfloatexp  like _bigfloat, non-int exponent
    * _indirection  Indirection
    * _rains    RAINS based message

--------------------------------------------------------------

Usage: blob = cbor.TAG'name' Desc: Encode a CBOR tagged value Input: value (any) any Lua type sref (table/optional) shared reference table stref (table/optional) shared string reference table Return: blob (binary) CBOR encoded tagged value

Note: Some tags only support a subset of Lua types.

--------------------------------------------------------------

Usage: value,pos2,ctype = cbor.TAGn Desc: Decode a CBOR tagged value Input: packet (binary) binary blob of CBOR tagged data pos (integer) byte offset into packet conv (table) conversion routines (passed to decode()) ref (table) reference table Return: value (any) decoded CBOR tagged value pos2 (integer) byte offset just past parsed data ctype (enum/cbor) CBOR type of value

==============================================================

    cbor.SIMPLE

Encoding and decoding of CBOR simple types are here. These are:

    * false     false value (Lua false)
    * true      true value  (Lua true)
    * null      NULL value  (Lua nil)
    * undefined undefined value (Lua nil)
    * half      half precicion   IEEE 754 float
    * single    single precision IEEE 754 float
    * double    double precision IEEE 754 float
    * __break   SEE NOTES

--------------------------------------------------------------

Usage: blob = cbor.SIMPLE'name' Desc: Encode a CBOR simple type Input: n (number/optional) floating point number to encode (see notes) Return: blob (binary) CBOR encoded simple type

Note: Some functions ignore the passed in parameter.

WARNING! The functions that do not ignore the parameter may
throw an error if floating point precision will be lost
during the encoding.  Please be aware of what you are doing
when calling SIMPLE.half(), SIMPLE.float() or
SIMPLE.double().

--------------------------------------------------------------

Usage: value2,pos,ctype = cbor.SIMPLEn Desc: Decode a CBOR simple type Input: pos (integer) byte offset in packet value (number/optional) floating point number Return: value2 (any) decoded value as Lua value pos (integer) original pos passed in (see notes) ctype (enum/cbor) CBOR type of value

Note: The pos parameter is passed in to avoid special cases in the code and to conform to all other decoding routines.


*

This module provides a simple (or small, take your pick) implementation of CBOR, which should be fine for most uses, as long as such uses don't really require the need of TAGS (although there is some minimal support for such) and as long as you stick to simple Lua types like nil, booleans, numbers, strings and tables of only simple Lua types and that have no cycles. This function defines four functions. No type checking is done when encoding tagged values.

==============================================================

Usage: blob = cbor.encode(value[,tag]) Desc: Encode a Lua type into a CBOR type Input: value (any) tag (number/optional) CBOR tag value Return: blob (binary) CBOR encoded value

Note: This function can throw errors

==============================================================

Usage: blob[,err] = cbor_s.pencode(value[,tag]) Desc: Protected call to encode into CBOR Input: value (any) tag (number/optional) CBOR tag value Return: blob (binary) CBOR encoded value err (string/optional) error message

==============================================================

Usage: value,pos2,ctype = cbor.decode(packet[,pos][,conv]) Desc: Decode CBOR encoded data Input: packet (binary) CBOR binary blob pos (integer/optional) starting point for decoding conv (table/optional) table of tagged conversion routines Return: value (any) the decoded CBOR data pos2 (integer) offset past decoded data ctype (enum/cbor) CBOR type of value

Note: The conversion table should be constructed as:

    {
      [ 0] = function(v) return munge(v) end,
      [32] = function(v) return munge(v) end,,
    }

    The keys are CBOR types (as integers).  These functions are
    expected to convert the decoded CBOR type into a more
    appropriate type for your code.  For instance, [1] (epoch)
    can be converted into a table.

    This function can throw errors.

==============================================================

Usage: value,pos2,ctype[,err] = cbor.pdecode(packet[,pos][,conv]) Desc: Protected call to decode CBOR data Input: packet (binary) CBOR binary blob pos (integer/optional) starting point for decoding conv (table/optional) table of tagged conversion routines Return: value (any) the decoded CBOR data, nil on error pos2 (integer) offset past decoded data, 0 on error ctype (enum/cbor) CBOR type of value err (string/optional) error message, if any


*

This module provides the foundation of the CBOR modules and is written in C. This module deals with the lowest level details of encoding and decoding CBOR data. It will encode data with the minimal encoding size [1]. This module provides just two functions, and it helps to be familiar with RFC-8949 to use this module properly.

NOTE: Both functions can throw an error.

[1] Floating point values will by default be encoded with the minimal encoding size without losing precision. It is possible to use a larger encoding if wanted.

==============================================================

Usage: blob = cbor_c.encode(type,value[,value2]) Desc: Encode a CBOR value Input: type (integer) CBOR type value (number) value to encode (see note) value (number/optional) float to encode (see note) Return: blob (binary) CBOR encoded value

Note: value is optional for type of 0xE0. value2 is optional for type of 0xE0; otherwise it's ignored.

    To encode a break value: 
        blob = cbor_c.encode(0xE0)

    To encode a floating point value with a minimal encoding:
        blob = cbor_c.encode(0xE0,nil,1.23)

    To force a particular encoding of a float value:
        blob = cbor_c.encode(0xE0,27,math.huge)

    To encode an array of indeterminate length:
        blob = cbor_c.encode(0xA0)
            -- encode entries
        blob = blob .. cbor_c.encode(0xE0)

==============================================================

Usage: ctype,info,value,pos2 = cbor_c.decode(blob,pos) Desc: Decode a CBOR-encoded value Input: blob (binary) binary CBOR sludge pos (integer) position to start decoding from Return: ctype (integer) CBOR major type info (integer) sub-major type information value (integer number) decoded value pos2 (integer) position past decoded data

Note: Throws in invalid parameter


*

This module contains miscellaneous routines related to CBOR. Currently, two functions are defined, and these are not required for normal CBOR usage.

==============================================================

Usage: diag = cbormisc.diagnostic(packet[,pos]) Desc: Output CBOR encoded data in the CBOR diagnostic output format Input: packet (binary) CBOR encoded data pos (integer/optional) starting point for decoding Return: diag (string) CBOR data in CBOR diagnostic format

Note: This function can throw errors

==============================================================

Usage: diag[,err] = cbormisc.pdiagnostic(packet[,pos]) Desc: Protected call to cbormisc.diagnostic Input: packet (binary) CBOR encoded data pos (integer/optional) starting point for decoding Return: diag (string) CBOR data in CBOR diagnostic format err (string/optional) error message if any