Open jonaslagoni opened 3 years ago
I agree. I don't see a way around this other than a prefix or postfix of some kind.
Or simply generate an error if a reserved word is used. e.g. "Error: in TypeScript, return
is not permissable as a function name; cannot process"
I agree. I don't see a way around this other than a prefix or postfix of some kind.
Right, postfix is also an option.
Or simply generate an error if a reserved word is used. e.g. "Error: in TypeScript,
return
is not permissable as a function name; cannot process"
It is definitely a possibility, I am however really not a fan of such behavior, which will restrict what schemas can be used, it restricts what users may use JSON Schema for. What if it is a requirement, something that can't be changed, that a property is called return
, do we really want to say to them, well, you can't use the library then? 🤔
I don't think I have a problem with not specifying a specific solution to this. Implementations don't all have to do the same thing as long as they don't generate code that doesn't compile. Both throwing an error or fixing it should be ok.
In C#, any reserved keyword can be prefixed with @
to make it usable as a variable or other identifier. I don't think this is generally a problem for C#, though, since the naming convention is PascalCase for public properties and all the keywords are lowercase and single-word.
As I've commented across several issues now, I think the specifics should be left to the implementation, but the spec can say whether it should be handled.
I see three paths forward here. What do you think of these solutions?
'get_return'
as a accessor method for that valueNote: I have done some analysis of invalid property keys in python here: https://github.com/openapi-json-schema-tools/openapi-document-analysis/blob/main/reports/properties_keys_python_report.md ~ 1% of property keys have invalid (in python) identifiers and most of those are invalid variable/method names, not language specific reserved names
Note that this isn't just for OpenAPI. OpenAPI would be just one of multiple consumers of this vocabulary. I'd really prefer not to pull consumer conventions into a general specification.
I'd really prefer not to pull consumer conventions into a general specification
Could number two be a solution in any context not just openapi?
I'm not sure I really like any of those options. The generated type should resemble as close to the schema as possible so that deserialization is simple (automatic would be preferred).
For instance (from the opening example) deserializing
{
"return": "foo"
}
into this C# class
public class SomeTitle
{
public string reservedReturn { get; set; }
}
wouldn't populate reservedReturn
. (In C#, it'd be generated as Return
anyway, which wouldn't have any conflict, but the point is still valid.) To get deserialization to work, it'd also have to generate an attribute that indicates the JSON property name.
public class SomeTitle
{
[JsonProperty("return")]
public string reservedReturn { get; set; }
}
Options 2 or 3 could be done in addition to option 1. That way we always keep the original key info (+ closely resemble the schema) in the payload and additional assessor methods are created to provide type safety in these corner cases. That way deserialization stays automatic and how to access this info is what is adjusted.
I avoid defining a spec and leave it to entities to decide what is more suitable for them.
At some point, you will end up having to be an expert in any existing (and future) programming language that somebody would like to use and probably end up creating a spec that didn't take into consideration their languages. Creating a frustrating situation for everyone.
:2cents:
Remember per this report:
https://github.com/openapi-json-schema-tools/openapi-document-analysis/blob/main/reports/required_keys_python_report.md
that most invalid identifiers are not reserved words, they are ones like xquery-version
.
@gregsdennis I see that you do not like any of the options that I proposed. What are solutions that you would prefer instead that work for these use cases? A custom getter and setter with a different name? What would decide what that alternate name is?
I don't think the vocab can specify this. It'd have to do that for every language, which is impractical.
What the vocab can specify is that the implementation has a well-documented approach.
I also don't agree with the approach that this issue assumes: trying to fit JSON into a language. We need to start with the language, and each language (implementation) would indicate what valid keys look like. It's then up to schema authors to align with the requirements of the languages they anticipate will be generated. I expect that there will be a common subset, e.g alphanumeric+underscore is pretty common.
So I agree that in an ideal world and going forward people should pick keys that work for their language contexts. This is why protobuf constrains keys. But I also think that there will be users who will be unable to change the server implementations and will want this to work on the client side for this use case.
@jonaslagoni can you update this issue to cover invalid key names too not just reserved words?
I think I'd be open to this vocab specifying that implementations:
* Because it's really on payload authors to get this right, implementations can't really do anything other than enforce a "correct" behavior.
All of that said, I think we're coming at this the wrong way. We need to start with the languages to determine what that "minimum compatible character set" is. I want to drive JSON payloads into compatibility rather than attempt to appease any janky key that may come in.
I am sorry for being pessimistic in the conversation. All I ask you is to understand that I am coming from a place of avoiding failures so we do not reset again and create another specification.
I think we shouldn't do anything with the "values" of things from the perspective of the specification, especially if such values are strongly dependent on application-level code.
MUST enforce* a "minimum compatible character set" for properties by default
I strongly agree with you in an ideal scenario: alphanumeric + dashes + underscore and call it a day. But then you will find people using dot
such as "settings.threshold"
so dots will be included. Then other people prefer to use URN so :
as well, and others follow a JSON Point so #/
What about Chinese and other languages? ....
It is impractical. Eventually creating a glorified regex of defining "anything"
If you were talking about the value of title
or (hopefully one day) schema_id
or anything that exists because of the JSON Schema spec itself, that is something else, but these keys and values are application dependent. You would argue you could make them spec dependent, but that is exactly where I will ask you if you are extremely sure about that.
alphanumeric + dashes + underscore
~I wouldn't use dashes.~ As part of JSON Path, I did a quick survey to see what languages support it, and it seems few support dashes.
This also demonstrates that it's not impractical to define such a thing: we did it in JSON Path.
Edit: Actually I think supporting dashes would be fine. I expect that most implementations would be able to adjust from kebab case to whatever is normal for their language fairly easily.
How would you handle the following key?
{
"arn:aws:iam::123456789012:user/Development/product_1234": "<... value ...>"
}
It depends. Is that a property name on a statically defined object, or merely a key in a map/dictionary? If it's a map/dictionary, it's fine: it's a string; anything goes. If it's a statically defined object, there's not enough (too much?) information to make a determination, and by default, I'd reject it.
Also, is that defined in a schema somewhere? We're defining code generation from schemas here (and schema generation from code). What does the schema for that look like?
It could be a static value. I showed cased AWS because using URN to "identify" (URN) things.
This is a common thing I see dealing with IAM and Roles/Groups. They keys are static because the role/groups are not dynamic, so you want to have the specification with the given URN for a given Role/Group:
// static
{
"urn:mysystem:iam-service:role:admin": "allowed",
"urn:mysystem:iam-service:role:user": "denied",
}
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "Permissions",
"properties": {
"urn:mysystem:iam-service:role:admin": {
"type": "string"
},
"urn:mysystem:iam-service:role:user": {
"type": "string"
}
},
"required": [
"urn:mysystem:iam-service:role:admin",
"urn:mysystem:iam-service:role:user"
]
}
GCP prefers to use /roles/accessapproval.viewer
as the role Ids.
As I said before, this is just something I am familiar with, but they are plenty of situations like that one.
I think that design is wrong. It's obvious that you could have different schemas for different sets of permissions. In a strongly typed language, you wouldn't want separate model for every combination of permissions URNs; you'd want a single model that can handle any set of permission URNs. (In a weakly typed language, it doesn't matter because you're not generating types; I'm not sure what you'd be generating.)
I'd opt for a more general Permissions
dictionary model where the keys are those URNs.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "http://aws.com/permissions",
"type": "object",
"title": "Permissions",
"propertyNames": {
"type": "string"
"format": "uri"
},
"additionalProperties": { "type": "string" }
}
Then your schema merely represents a specific subset of that model.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"allOf": [
{ "$ref": "http://aws.com/permissions" }
],
"required": [
"urn:mysystem:iam-service:role:admin",
"urn:mysystem:iam-service:role:user"
]
}
I wouldn't generate a class from your schema; I'd generate the class from the base http://aws.com/permissions
. The better model is general-use.
(Also, I wouldn't use draft 4.)
If you've been given the schema you showed, then I'd skip on generating from that and just manually write the code for the general model.
In many languages there exist reserved, that data models cannot contain, so what do we do when we actually encounter them.
For example in TS, say you want a data model for:
Normally you would probably have expected the output to be:
However,
return
is a reserved keyword for TS which gives a syntax error. So how do we handle such cases?I only see one option here, and that is to prepend something like
reserved
to the property name.This will however make the de/serialization more complex, as they won't match one-to-one.