Define how to handle reserved keywords

jonaslagoni commented 3 years ago

In many languages there exist reserved, that data models cannot contain, so what do we do when we actually encounter them.

For example in TS, say you want a data model for:

{
  "$schema": "https://json-schema.org/draft/2020-12/idl-schema",
  "name": "SomeTitle",
  "type": "object",
  "properties": {
    "return": {
      "type": "string"
    }
  }
}

Normally you would probably have expected the output to be:

class SomeTitle {
   return: string;
}

However, return is a reserved keyword for TS which gives a syntax error. So how do we handle such cases?

I only see one option here, and that is to prepend something like reserved to the property name.

class SomeTitle {
   reservedReturn: string;
}

This will however make the de/serialization more complex, as they won't match one-to-one.

jdesrosiers commented 3 years ago

I agree. I don't see a way around this other than a prefix or postfix of some kind.

karenetheridge commented 3 years ago

Or simply generate an error if a reserved word is used. e.g. "Error: in TypeScript, return is not permissable as a function name; cannot process"

jonaslagoni commented 3 years ago

I agree. I don't see a way around this other than a prefix or postfix of some kind.

Right, postfix is also an option.

Or simply generate an error if a reserved word is used. e.g. "Error: in TypeScript, return is not permissable as a function name; cannot process"

It is definitely a possibility, I am however really not a fan of such behavior, which will restrict what schemas can be used, it restricts what users may use JSON Schema for. What if it is a requirement, something that can't be changed, that a property is called return, do we really want to say to them, well, you can't use the library then? 🤔

jdesrosiers commented 3 years ago

I don't think I have a problem with not specifying a specific solution to this. Implementations don't all have to do the same thing as long as they don't generate code that doesn't compile. Both throwing an error or fixing it should be ok.

gregsdennis commented 2 years ago

In C#, any reserved keyword can be prefixed with @ to make it usable as a variable or other identifier. I don't think this is generally a problem for C#, though, since the naming convention is PascalCase for public properties and all the keywords are lowercase and single-word.

As I've commented across several issues now, I think the specifics should be left to the implementation, but the spec can say whether it should be handled.

spacether commented 1 year ago

I see three paths forward here. What do you think of these solutions?

make the generated class subclass a mapping, then instance.get("return") will work
make the openapi document writer provide x-property-name property schema info before generation, where one could pass in 'get_return' as a accessor method for that value
use base36 (alphanumeric encoding) or something like it to store encodeded values for all keys and then use those names as method accessor names for those keys.

Note: I have done some analysis of invalid property keys in python here: https://github.com/openapi-json-schema-tools/openapi-document-analysis/blob/main/reports/properties_keys_python_report.md ~ 1% of property keys have invalid (in python) identifiers and most of those are invalid variable/method names, not language specific reserved names

gregsdennis commented 1 year ago

Note that this isn't just for OpenAPI. OpenAPI would be just one of multiple consumers of this vocabulary. I'd really prefer not to pull consumer conventions into a general specification.

spacether commented 1 year ago

I'd really prefer not to pull consumer conventions into a general specification

Could number two be a solution in any context not just openapi?

gregsdennis commented 1 year ago

I'm not sure I really like any of those options. The generated type should resemble as close to the schema as possible so that deserialization is simple (automatic would be preferred).

For instance (from the opening example) deserializing

{
  "return": "foo"
}

into this C# class

public class SomeTitle
{
    public string reservedReturn { get; set; }
}

wouldn't populate reservedReturn. (In C#, it'd be generated as Return anyway, which wouldn't have any conflict, but the point is still valid.) To get deserialization to work, it'd also have to generate an attribute that indicates the JSON property name.

public class SomeTitle
{
    [JsonProperty("return")]
    public string reservedReturn { get; set; }
}

spacether commented 1 year ago

Options 2 or 3 could be done in addition to option 1. That way we always keep the original key info (+ closely resemble the schema) in the payload and additional assessor methods are created to provide type safety in these corner cases. That way deserialization stays automatic and how to access this info is what is adjusted.

yordis commented 1 year ago

I avoid defining a spec and leave it to entities to decide what is more suitable for them.

At some point, you will end up having to be an expert in any existing (and future) programming language that somebody would like to use and probably end up creating a spec that didn't take into consideration their languages. Creating a frustrating situation for everyone.

:2cents:

spacether commented 1 year ago

Remember per this report: https://github.com/openapi-json-schema-tools/openapi-document-analysis/blob/main/reports/required_keys_python_report.md that most invalid identifiers are not reserved words, they are ones like xquery-version. @gregsdennis I see that you do not like any of the options that I proposed. What are solutions that you would prefer instead that work for these use cases? A custom getter and setter with a different name? What would decide what that alternate name is?

gregsdennis commented 1 year ago

I don't think the vocab can specify this. It'd have to do that for every language, which is impractical.

What the vocab can specify is that the implementation has a well-documented approach.

gregsdennis commented 1 year ago

I also don't agree with the approach that this issue assumes: trying to fit JSON into a language. We need to start with the language, and each language (implementation) would indicate what valid keys look like. It's then up to schema authors to align with the requirements of the languages they anticipate will be generated. I expect that there will be a common subset, e.g alphanumeric+underscore is pretty common.

spacether commented 1 year ago

So I agree that in an ideal world and going forward people should pick keys that work for their language contexts. This is why protobuf constrains keys. But I also think that there will be users who will be unable to change the server implementations and will want this to work on the client side for this use case.

@jonaslagoni can you update this issue to cover invalid key names too not just reserved words?

gregsdennis commented 1 year ago

I think I'd be open to this vocab specifying that implementations:

MUST enforce* a "minimum compatible character set" for properties by default
MAY provide alternative support, including a mechanism to allow users to define such support, via an opt-in configuration setting

* Because it's really on payload authors to get this right, implementations can't really do anything other than enforce a "correct" behavior.

All of that said, I think we're coming at this the wrong way. We need to start with the languages to determine what that "minimum compatible character set" is. I want to drive JSON payloads into compatibility rather than attempt to appease any janky key that may come in.

yordis commented 1 year ago

I am sorry for being pessimistic in the conversation. All I ask you is to understand that I am coming from a place of avoiding failures so we do not reset again and create another specification.

I think we shouldn't do anything with the "values" of things from the perspective of the specification, especially if such values are strongly dependent on application-level code.

MUST enforce* a "minimum compatible character set" for properties by default

I strongly agree with you in an ideal scenario: alphanumeric + dashes + underscore and call it a day. But then you will find people using dot such as "settings.threshold" so dots will be included. Then other people prefer to use URN so : as well, and others follow a JSON Point so #/ What about Chinese and other languages? ....

It is impractical. Eventually creating a glorified regex of defining "anything"

If you were talking about the value of title or (hopefully one day) schema_id or anything that exists because of the JSON Schema spec itself, that is something else, but these keys and values are application dependent. You would argue you could make them spec dependent, but that is exactly where I will ask you if you are extremely sure about that.

gregsdennis commented 1 year ago

alphanumeric + dashes + underscore

~I wouldn't use dashes.~ As part of JSON Path, I did a quick survey to see what languages support it, and it seems few support dashes.

This also demonstrates that it's not impractical to define such a thing: we did it in JSON Path.

Edit: Actually I think supporting dashes would be fine. I expect that most implementations would be able to adjust from kebab case to whatever is normal for their language fairly easily.

yordis commented 1 year ago

How would you handle the following key?

{
  "arn:aws:iam::123456789012:user/Development/product_1234": "<... value ...>"
}

gregsdennis commented 1 year ago

It depends. Is that a property name on a statically defined object, or merely a key in a map/dictionary? If it's a map/dictionary, it's fine: it's a string; anything goes. If it's a statically defined object, there's not enough (too much?) information to make a determination, and by default, I'd reject it.

gregsdennis commented 1 year ago

Also, is that defined in a schema somewhere? We're defining code generation from schemas here (and schema generation from code). What does the schema for that look like?

yordis commented 1 year ago

It could be a static value. I showed cased AWS because using URN to "identify" (URN) things.

This is a common thing I see dealing with IAM and Roles/Groups. They keys are static because the role/groups are not dynamic, so you want to have the specification with the given URN for a given Role/Group:

// static
{
 "urn:mysystem:iam-service:role:admin": "allowed",
 "urn:mysystem:iam-service:role:user": "denied",
}

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "Permissions",
  "properties": {
    "urn:mysystem:iam-service:role:admin": {
      "type": "string"
    },
    "urn:mysystem:iam-service:role:user": {
      "type": "string"
    }
  },
  "required": [
    "urn:mysystem:iam-service:role:admin",
    "urn:mysystem:iam-service:role:user"
  ]
}

GCP prefers to use /roles/accessapproval.viewer as the role Ids.

As I said before, this is just something I am familiar with, but they are plenty of situations like that one.

gregsdennis commented 1 year ago

I think that design is wrong. It's obvious that you could have different schemas for different sets of permissions. In a strongly typed language, you wouldn't want separate model for every combination of permissions URNs; you'd want a single model that can handle any set of permission URNs. (In a weakly typed language, it doesn't matter because you're not generating types; I'm not sure what you'd be generating.)

I'd opt for a more general Permissions dictionary model where the keys are those URNs.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://aws.com/permissions",
  "type": "object",
  "title": "Permissions",
  "propertyNames": {
    "type": "string"
    "format": "uri"
  },
  "additionalProperties": { "type": "string" }
}

Then your schema merely represents a specific subset of that model.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "allOf": [
    { "$ref": "http://aws.com/permissions" }
  ],
  "required": [
    "urn:mysystem:iam-service:role:admin",
    "urn:mysystem:iam-service:role:user"
  ]
}

I wouldn't generate a class from your schema; I'd generate the class from the base http://aws.com/permissions. The better model is general-use.

(Also, I wouldn't use draft 4.)

If you've been given the schema you showed, then I'd skip on generating from that and just manually write the code for the general model.

json-schema-org / vocab-idl

Define how to handle reserved keywords #18