shamblett / cbor

A CBOR implementation for Dart
MIT License

Nesting complex data #10

Closed DanielSoCra closed 3 years ago

DanielSoCra commented 4 years ago

When encoding complex objects, we have to use the indefinite versions of arrays and maps. For example, to encode this structure (imagine 123456789 being a larger BigInt):

[{1 : 123456789}]

we need to use indefinite Maps and Arrays:

import 'package:convert/convert.dart';
import 'package:typed_data/typed_data.dart';
import 'package:cbor/cbor.dart' as cbor;

void main() {
  final cbor.Cbor inst = cbor.Cbor();
  final cbor.Encoder encoder = inst.encoder;

  encoder.writeArrayImpl([], true, 1);

  encoder.writeMapImpl({1:null}, true, 1);
  encoder.writeInt(1);
  encoder.writeTag(2); // BigInt
  encoder.writeInt(BigInt.from(123456789).toInt());
  encoder.writeBreak();

  encoder.writeBreak();

  final Uint8Buffer buff = inst.output.getData();
  List<int> encodedBytes = buff.buffer.asUint8List();

  inst.decodeFromList(encodedBytes);

  List<dynamic> decodedData = inst.getDecodedData();
  String hexDump = hex.encode(encodedBytes);
  print(hexDump); // 9fbf01f601c21a075bcd15ffff
  print(decodedData); // [[{1: 123456789}]]
}

The output from decodedData is correct, however the cbor encoding looks like this:

[Screenshot, 2020-01-06 12:37:06]

So the way it works right now is that we have to define at least one key-value pair in a Dart map when calling encoder.writeMap(), because we can't pass null. This is not suitable when the map contains only advanced data types.

What we can do is call encoder.writeMapImpl({MAP_KEY: null}, true, LENGTH); and then overwrite the MAP_KEY: null pair with the actual data by calling

  encoder.writeInt(MAP_KEY);
  encoder.writeTag(2); // BigInt
  encoder.writeInt(BigInt.from(123456789).toInt());

This does get decoded correctly, even on cbor.me.

However, the cbor output still contains the duplicated key-value pair 1: null, or 01 F6 as hex.

This, together with the fact that we have to use indefinite arrays and maps, increases the size by 2 bytes in this case, and the unused 1: null takes another 2 bytes. That is 13 bytes instead of 9, a total increase of 44 % in this specific case.

Is there currently any way we can do this better?
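
For reference, the size difference described above can be checked by writing both encodings out byte by byte. This is only a minimal sketch: it uses plain dart:core and not the cbor package, the first byte list is copied from the hex dump in the example, and the names (indefiniteWithPlaceholder, definite, toHex, overheadPercent) are made up for illustration.

```dart
// Bytes produced by the workaround above: indefinite array and map plus the
// placeholder pair 1: null (copied from the hex dump in the example).
const List<int> indefiniteWithPlaceholder = [
  0x9f, // array, indefinite length
  0xbf, // map, indefinite length
  0x01, 0xf6, // placeholder pair 1: null
  0x01, // key 1
  0xc2, // tag 2
  0x1a, 0x07, 0x5b, 0xcd, 0x15, // unsigned 32-bit integer 123456789
  0xff, // break, closes the map
  0xff, // break, closes the array
];

// The same data with definite lengths and without the placeholder pair.
const List<int> definite = [
  0x81, // array with 1 item
  0xa1, // map with 1 pair
  0x01, // key 1
  0xc2, // tag 2
  0x1a, 0x07, 0x5b, 0xcd, 0x15, // unsigned 32-bit integer 123456789
];

// Lower-case hex dump of a byte list.
String toHex(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join();

// Size overhead of [actual] over [ideal] in whole percent (rounded down).
int overheadPercent(List<int> actual, List<int> ideal) =>
    ((actual.length - ideal.length) * 100) ~/ ideal.length;

void main() {
  print(toHex(indefiniteWithPlaceholder)); // 9fbf01f601c21a075bcd15ffff
  print(toHex(definite)); // 81a101c21a075bcd15
  print('${indefiniteWithPlaceholder.length} vs ${definite.length} bytes'); // 13 vs 9 bytes
  print('overhead: ${overheadPercent(indefiniteWithPlaceholder, definite)} %'); // overhead: 44 %
}
```

The 4 extra bytes are the two break bytes and the two bytes of the placeholder pair.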

nailgilaziev commented 4 years ago

Yeah, I'm interested in this too. I encode data like this: a map with values of different types, {1:11,2:22,3:[{1:123,4:true,9:"text"}]}, or a map containing an array with items of different types, {1:11,2:22,3:[123,true,"text"]}

How to do this efficiently?

shamblett commented 4 years ago

OK, for the first poster: why can't you do this?

 encoder.writeArray([{1:123456789}]);

This encodes to

81a1011a075bcd15

and decodes correctly in cbor.me

Also, second poster, can you not do this

encoder.writeMap({1:11,2:22,3:[{1:123,4:true,9:"text"}]});

which also works?

DanielSoCra commented 4 years ago

Hi, thank you for your work!

The issue arises when working with tags, such as the one for BigInt.

shamblett commented 4 years ago

Yes, OK. I think we need a tag builder package of some sort to assist with tag encoding. We also need some way of assembling a map/list inline, e.g.

open map
add key
add value
add keyValue
close map

so you can do this without using indefinites. Does this sound OK? Do you have anything else in mind?

DanielSoCra commented 4 years ago

Yes, that is absolutely what I was thinking of. The existing way of doing things can be left untouched.

It would be great if we could still use definite arrays / maps with the open / close principle.

The length of the map / array can be calculated from the actual length of the map / array and encoded after the array / map is closed.
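
To make the open / close idea concrete, here is a rough sketch of how the length could be counted while items are added and the definite header written only on close. Everything in it is hypothetical and for illustration only: DefiniteMapSketch, DefiniteArraySketch, encodeHead, encodeUint, encodeTagged and hexOf are not part of the cbor package, the code works on raw bytes with plain dart:core, and it only handles arguments up to 32 bits.

```dart
/// Encodes a CBOR head: major type (0..7) plus an unsigned argument.
/// Only arguments that fit in 32 bits are handled in this sketch.
List<int> encodeHead(int majorType, int value) {
  final mt = majorType << 5;
  if (value < 24) return [mt | value];
  if (value < 0x100) return [mt | 24, value];
  if (value < 0x10000) return [mt | 25, value >> 8, value & 0xff];
  if (value < 0x100000000) {
    return [
      mt | 26,
      (value >> 24) & 0xff,
      (value >> 16) & 0xff,
      (value >> 8) & 0xff,
      value & 0xff,
    ];
  }
  throw ArgumentError('arguments above 32 bits are not handled in this sketch');
}

/// Unsigned integer (major type 0).
List<int> encodeUint(int value) => encodeHead(0, value);

/// Tag (major type 6) followed by an already encoded item.
List<int> encodeTagged(int tag, List<int> item) =>
    [...encodeHead(6, tag), ...item];

/// Hypothetical map builder: buffers encoded pairs and counts them, so the
/// definite-length header can be written when the map is closed.
class DefiniteMapSketch {
  final List<int> _body = [];
  int _pairs = 0;

  /// Adds one pair; key and value are already encoded CBOR items.
  void addPair(List<int> key, List<int> value) {
    _body
      ..addAll(key)
      ..addAll(value);
    _pairs++;
  }

  /// Header (major type 5 with the pair count) followed by the buffered pairs.
  List<int> close() => [...encodeHead(5, _pairs), ..._body];
}

/// Hypothetical array builder following the same principle.
class DefiniteArraySketch {
  final List<int> _body = [];
  int _items = 0;

  /// Adds one already encoded CBOR item.
  void add(List<int> item) {
    _body.addAll(item);
    _items++;
  }

  /// Header (major type 4 with the item count) followed by the buffered items.
  List<int> close() => [...encodeHead(4, _items), ..._body];
}

/// Builds [{1: tag 2 (123456789)}], mirroring the example from the first post.
List<int> buildExample() {
  final map = DefiniteMapSketch()
    ..addPair(encodeUint(1), encodeTagged(2, encodeUint(123456789)));
  final array = DefiniteArraySketch()..add(map.close());
  return array.close();
}

/// Lower-case hex dump of a byte list.
String hexOf(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join();

void main() {
  print(hexOf(buildExample())); // 81a101c21a075bcd15
}
```

Buffering the body costs some memory, but it means no break bytes and no placeholder pair are needed, so the example comes out at 9 bytes.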

shamblett commented 4 years ago

Yep OK I'll start putting together some updates along these lines and see how it pans out.

shamblett commented 4 years ago

OK, the package has been updated and republished at version 3.0.0. Please have a look at the examples for the new list builder and map builder encoder classes and read the API docs for these. Hopefully we can now better encode lists/maps with complex items without using indefinite sequences.