ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

schema: Enum JSON representation questions #260

Open mikeal opened 4 years ago

mikeal commented 4 years ago

I finally got around to adding support of enums to my schema validator and I’m a little perplexed by the JSON representation.

Here’s the default:

type TestEnum enum {   
  | Yes                                                           
  | No                                                              
}
{ “kind”: “enum”,                          
  “members”: { “Yes”: null, “No”: null }, 
  “representation”: { “string”: {} }
}

Which is reasonable. But this is what it looks like when you use non-default values.

type TestEnum enum {
  | Yes ("yes")
  | No ("no")
}
{ “kind”: “enum”,                              
  “members”: { “Yes”: null, “No”: null },   
  “representation”: { “string”: { Yes: 'yes', No: 'no' } }
}

This is pretty odd. If the members are always null then why make it a map?

It would make the most sense for the members map to be an array or simply put the representation values in the members map instead of creating a duplicate map for the representation.

rvagg commented 4 years ago

This is one for @warpfork to go into some of the philosophy of why things go where. But the basics are that anything inside parens are representation options for the individual elements/fields/members, the parens are a shorthand for what should normally go in the } representation string { ... } block so you don't end up duplicating the members in there (the original did have this duplication iirc). So anything that is strictly about representation only goes into the representation section and the definition section isn't polluted by those details.

rvagg commented 4 years ago

It would make the most sense for the members map to be an array

that's fair, I think there might be a Go bias here for maps? it's certainly easier to look up membership in maps vs arrays

warpfork commented 4 years ago

Yeah, I can't actually defend this map here all that vehemently.

I think I wrote it this way because istm that maps with a single constant value (a unit, essentially) are a recognizable substitute for sets.

And it's easier/faster to check membership in maps, typically, yeah. (Although this is also an O(n)-vs-O(1) thing... where in many cases, the constant factor on the O(1) will be rather huge whilst the n is rather small, so, "faster" should be processed with several grains of salt here.)

A list would arguably be just as common and recognizable as a substitute for a set. It would mean the schema wouldn't be enforcing a uniqueness constraint anymore (whereas a schema that states this encodes as a map does enforce that)... but we could choose to accept that.

I guess I'd be open to changing this to a list if someone has strong feelings about it. My feelings simply aren't strong on this.


The reasoning @rvagg pointed out for keeping separation between {facts about the types} versus {details about the representation} are 100% true though (and those I do still have strong feelings about -- those representations showing up as unions in the code to handle this stuff is elegant/correct and I wouldn't have it any other way; and, in general, it's not always the case that the representation info ends up being a map with member names as a key, even though that's true in this example).

warpfork commented 4 years ago

Another angle we could regard this from might be: Set should be a Kind in the Data Model.

If we took that approach, though, we'd need to define how "set" gets encoded in codecs, and most codecs... don't have sets. We'd probably...

encode it as a map with some kind of unit value, I suspect ;)

Different codecs would be free to make different choices about this, which could be good. But needing to massage it into a codec that doesn't have a built-in set concept would still predominate our concerns in practice, I suspect..

mikeal commented 4 years ago

Languages typically have easier, and faster, conversions from lists to sets than maps to sets, so I’m not sure representing them as a map is better/easier.

I’m -1 on adding Set to the Data Model but I don’t see any reason why we couldn’t add a Schema type for Set. Then we could have both a list and map representation. This doesn’t solve the problem of what we use here for parsed schema for enums but it’s something to think about that is probably useful.