ipld / specs

Content-addressed, authenticated, immutable data structures

Data model - Integer as map key #341

Open JonasKruckenberg opened 3 years ago

JonasKruckenberg commented 3 years ago

While working on the DAG-COSE specification, this issue came up because in COSE the header is a CBOR map keyed with integers, not strings. So I'm at a crossroads where I have to decide whether to break with the COSE spec (meaning that DAG-COSE objects won't be valid COSE objects and cannot be decoded by existing tools) or to introduce a violation of the IPLD data model. How should I approach this? Should DAG-COSE implementations be required to convert integers to strings (basically aliasing the integers with strings)? I understand that there are languages that don't support maps indexed by integers, but maybe we can figure something out!

For context: COSE allows the header to be keyed with strings, but all default fields (those that every implementation has to understand and that are registered with IANA) are keyed by integer, where both positive and negative integers are allowed.

JonasKruckenberg commented 3 years ago

I'll try to join the call next week, since this is a real blocker for the codec and I'd like to get this sorted before the first Encryption WG meeting in January!

vmx commented 3 years ago

If you mean the weekly IPLD meeting, we've canceled the meetings until the end of the year. The first one will be on Monday 2020-01-04.

JonasKruckenberg commented 3 years ago

Oooh I see, duh. Well then, let's figure something out this way if you're available and maybe discuss it further next year :)

rvagg commented 3 years ago

Since DAG-COSE is its own codec, separate from DAG-CBOR where we have strict rules about which parts of CBOR we use, you should be free to define it as allowing integer keys in the encoding layer. It's when these things are presented in the Data Model that you have to ensure that maps have string keys.

For practical purposes this usually just means you have some struct whose field names may or may not correspond to the integers or strings that exist in the encoding. As long as you can clearly define how the bytes convert into memory in the data model and back again, it doesn't matter too much.
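A minimal sketch of that struct idea, not taken from any DAG-COSE draft: the `Header` class, the `_LABELS` table, and the choice of fields are hypothetical. It assumes the COSE labels `alg` = 1 and `kid` = 4, and uses plain Python dicts to stand in for the decoded CBOR map.

```python
from dataclasses import dataclass
from typing import Optional

# Field name in the data model -> integer label in the encoded map.
# Assumption: alg = 1 and kid = 4, as in the COSE common header parameters.
_LABELS = {"alg": 1, "kid": 4}


@dataclass
class Header:
    """Hypothetical data-model view of a header: named (string) fields only."""
    alg: Optional[int] = None
    kid: Optional[bytes] = None

    def to_encoded(self) -> dict:
        # Emit integer keys for every field that is set.
        return {
            label: getattr(self, name)
            for name, label in _LABELS.items()
            if getattr(self, name) is not None
        }

    @classmethod
    def from_encoded(cls, encoded: dict) -> "Header":
        # Refuse labels this sketch does not know how to name.
        unknown = set(encoded) - set(_LABELS.values())
        if unknown:
            raise ValueError(f"unsupported header labels: {unknown!r}")
        fields = {
            name: encoded[label]
            for name, label in _LABELS.items()
            if label in encoded
        }
        return cls(**fields)
```

With this shape, `Header(alg=-7, kid=b"key-1").to_encoded()` yields a map with integer keys 1 and 4, while everything the data model sees is addressed by field name.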

My main concern would be with this phrasing and how it's going to work with JavaScript:

> COSE allows the header to be keyed with strings, but all default fields ( those that every implementation has to understand & that are registered with IANA ) are keyed by integer, where positive AND negative integers are allowed.

Can the header have both integers and string representations of integers? Could it have { 1: 'foo', '1': 'bar' }? If that's the case and these header values can have arbitrary keys (can they?) then it could get dicey. If there's not going to be a conflict, then maybe you could make the codec have a rule that "in the data model, header keys are strings, including the string form of integers, but when encoded, any key that can be converted from a string to an integer (atoi() style) will be encoded as an integer in CBOR". The tricky space is where string and integer keys conflict and you might need to disambiguate between "2" and 2. But I don't know COSE, so I don't know how much this matters.
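The rule proposed above could look roughly like this. It is only a sketch: the function names are made up, plain dicts stand in for the CBOR map, and "convertible to an integer" is assumed to mean a canonical decimal form (no leading zeros, no `+`, no `-0`), which is stricter than C's `atoi()`.

```python
import re

# Assumption: only canonical decimal integers count as "integer-like" keys.
_INT_KEY = re.compile(r"0|-?[1-9][0-9]*")


def to_encoded_keys(header: dict) -> dict:
    """Data model (string keys) -> encoding layer (int keys where possible)."""
    return {
        (int(key) if _INT_KEY.fullmatch(key) else key): value
        for key, value in header.items()
    }


def to_data_model_keys(encoded: dict) -> dict:
    """Encoding layer -> data model; fails if 2 and "2" would collide."""
    out = {}
    for key, value in encoded.items():
        name = str(key)
        if name in out:
            raise ValueError(f"keys collide in the data model: {name!r}")
        out[name] = value
    return out
```

Under these assumptions an encoded map like `{1: 'foo', '1': 'bar'}` is rejected on decode, because both keys become `"1"` in the data model. A lone string key `"2"` in the encoding would decode fine but re-encode as the integer 2, so the round trip would not preserve the bytes unless such keys are also forbidden.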

JonasKruckenberg commented 3 years ago

Thanks for the response! I already figured something like that would be the way to go. To address your concern: yes, in theory both are allowed, but since all keys that can be mapped to integers are well known (there is an IANA registry), one could simply disallow the use of well-known integer keys in string form. Another solution would be to disallow string keys in the binary representation altogether. I'll have to see about that though. Anyway, thanks for the response!
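Both options could be expressed as a validation step on the decoded header map. The following is a sketch under stated assumptions: `WELL_KNOWN_LABELS` is an illustrative placeholder rather than the real IANA registry contents, and `check_header_keys` and its `allow_string_keys` flag are hypothetical names.

```python
# Assumption: an illustrative subset of registered integer labels.
# A real codec would need to track the actual IANA registry.
WELL_KNOWN_LABELS = frozenset({1, 2, 3, 4, 5, 6, 7})
_WELL_KNOWN_AS_STRINGS = frozenset(str(n) for n in WELL_KNOWN_LABELS)


def check_header_keys(encoded: dict, allow_string_keys: bool = True) -> None:
    """Raise if the encoded header uses keys that the codec would forbid."""
    for key in encoded:
        if isinstance(key, int) and not isinstance(key, bool):
            continue  # integer labels are always acceptable here
        if not isinstance(key, str):
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        if not allow_string_keys:
            # Option 2: no string keys in the binary representation at all.
            raise ValueError(f"string key not allowed: {key!r}")
        if key in _WELL_KNOWN_AS_STRINGS:
            # Option 1: a registered integer label must not appear as a string.
            raise ValueError(f"well-known label given in string form: {key!r}")
```

Calling `check_header_keys({1: "foo", "custom": "bar"})` passes, while `check_header_keys({"1": "foo"})` raises. Passing `allow_string_keys=False` corresponds to the stricter second option.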