Closed by yaronf 3 years ago
I've started to look at a few other projects to check how they handle this kind of issue.
Lessons:
Difference with what we're doing: we'll need to make sure it is really simple for the client-side libraries, while the server-side implementations might optionally want to check against predefined test cases.
I think that polymorphism is useful for this protocol but should be used only when required.
Branching based on whether a value is a string or an object is one thing. Branching based on whether a value is a boolean or an integer is more problematic (especially in JavaScript, where a lot of code will check whether something is "truthy" rather than comparing against the boolean true).
Branching in the protocol is mostly based on string or object values. But there are indeed a few places where flags may be used, for instance (off the top of my head):
We should try to limit those multi variants to get a simpler request/response scheme, but the downside is that it also limits the flexibility around those corner cases.
I agree that the branches and behavior should always be clear based on the type. Designing around idiomatic truthiness can be problematic, as it's far from consistent across different languages: empty objects, empty strings, and empty arrays all come to mind as things that aren't always handled how you'd expect. I think a full survey of the variety of types used in the protocol today would be helpful, and the editors could facilitate that by marking all fields with their possible types in the descriptions.
Update following IETF 109. The following questions were asked (beyond our current evaluation of how to best implement polymorphism):
- What would we do instead? (my comment: no obvious alternative)
  - Mutual exclusivity of fields (my comment: not easy to implement)
  - Type fields for different objects (my comment: not easy to manage)
  - Container objects for single values (my comment: yes, possible)
- What's the complexity cost for the protocol? Who pays the complexity?
PR #124 enumerates which types are in use for each field in the protocol structure.
That PR #124 definitely helps clear things up.
I think we could use some guidance on the style of polymorphism used in this spec, however. There's already an appendix on polymorphism, but I found myself wanting a style guide to clarify things like:
There are very distinct cases that relate to references, and which highlight the polymorphic issues that we have to deal with.
Is it worth putting optionality in the requests? It depends on what we want:
- key reference (Section 7.1.1, key): "The means of dereferencing this value are out of scope for this specification" means it's based on some out-of-band mechanism. Again, that's a string-vs-struct case.
- interact ref: here it's used for the continuation flow, so it's just a parameter of the continue method.
- resource ref (Section 8.1, resource): in order to use predefined scopes, but the data here can be unstructured (e.g., dolphin metadata).
- grant ref (Section 2.7, existing_grant): here it's just a parameter meant as an immutable reference.
There are many different ways to do that, more or less generically.
In Go: for the most complex cases, one can use map[string]interface{}, but:
- No type checking. If the client sends an int instead of a string for the key, then unmarshalling into a map won't catch the error.
- Maps are vague and verbose. After unmarshalling the data, we are forced to use runtime checks to make sure the data we care about exists. If those checks aren't thorough, it can lead to a nil pointer dereference panic being thrown.
In Rust: the language supports enums (one can use a tag to help the deserializer distinguish between types), but support depends on the implementation (e.g., miniserde: "only simple enum variants without fields are supported"). Here's an example of the visitor pattern to handle both a reference and an opaque sub_id value, mapping to the same output structure (all type-checked):
```
Reference: { user: Subject { id: "123456789", format: None } }
Value:     { user: Subject { id: "123456789", format: Some("opaque") } }
```
Simplified code to give the intuition, with a minimal string-or-struct visitor so the snippet compiles (in practice the deserializer is more complex; you have an array of enums in between):

```rust
use std::fmt;

use serde::de::{self, Deserializer, MapAccess, Visitor};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct User {
    #[serde(deserialize_with = "visitor")]
    user: Subject,
}

#[derive(Debug, Deserialize)]
struct Subject {
    id: String,
    format: Option<String>,
}

// Accept either a bare string reference or a full subject object.
fn visitor<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Subject, D::Error> {
    struct SubjectVisitor;

    impl<'de> Visitor<'de> for SubjectVisitor {
        type Value = Subject;

        fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
            f.write_str("a string reference or a subject object")
        }

        // String form: treat the value as an opaque reference.
        fn visit_str<E: de::Error>(self, v: &str) -> Result<Subject, E> {
            Ok(Subject { id: v.to_owned(), format: None })
        }

        // Object form: defer to the derived Subject deserializer.
        fn visit_map<M: MapAccess<'de>>(self, map: M) -> Result<Subject, M::Error> {
            Deserialize::deserialize(de::value::MapAccessDeserializer::new(map))
        }
    }

    deserializer.deserialize_any(SubjectVisitor)
}

fn main() {
    let user_string = r#"{ "user": "123456789" }"#;
    let user: User = serde_json::from_str(user_string).unwrap();
    println!("{:?}", user);

    let user_struct = r#"{ "user": { "format": "opaque", "id": "123456789" } }"#;
    let user: User = serde_json::from_str(user_struct).unwrap();
    println!("{:?}", user);
}
```
@fimbault, regarding
> In Go: for the most complex cases, one can use map[string]interface{}, but: No type checking. If the client sends an int instead of a string for the key, then unmarshalling into a map won't catch the error. Maps are vague and verbose. After unmarshalling the data, we are forced to use runtime checks to make sure the data we care about exists. If those checks aren't thorough, it can lead to a nil pointer dereference panic being thrown.
You may use your own (custom) unmarshal function to do all required checks for the type of a value (string vs. interface, like in your user_string and user_struct). Furthermore, take care of the rest of things like https://github.com/golang/go/issues/14750.
My own Java implementation uses custom serializer/deserializer classes with a wrapper interface for any field that can take different forms. The two main categories we have are "string-or-object" and "array-or-object", and both of those are pretty easy to detect and handle.
https://github.com/bspk/oauth.xyz-java/tree/master/lib/src/main/java/io/bspk/oauth/xyz/json
In short yes, that solves the issue.
> Furthermore, take care of the rest of things like golang/go#14750.
Thank you @adeinega, I wasn't aware of this Go issue. It's really bad, and quite surprising to see in the Go library.
@jricher actually there's a bit more to do to match against the various formats in sub_ids (within the SubjectRequest), instead of a basic string (it would be easy if they all had the same content, but that's not the case).
That gives me something like this in my rust implementation:
```rust
// here we use a visitor pattern to map the string as an opaque struct
let byref = r#"
{ "user": "u123456789" }
"#;
```

Output:

```
{ user: Subject { sub_ids: [Opaque(Format { id: "u123456789" })] } }
```

```rust
// here we handle the array; the content depends on the format field
let byval = r#"
{
    "user": {
        "sub_ids": [
            {
                "format": "opaque",
                "id": "u123456789"
            },
            {
                "format": "email",
                "email": "u123456789@domain.com"
            }
        ]
    }
}
"#;
```

Output:

```
{ user: Subject { sub_ids: [Opaque(Format { id: "u123456789" }), Email(Format { email: "u123456789@domain.com" })] } }
```
We're now explicit on types (object, string, array), so I think we can close this issue now.
The protocol currently uses polymorphism extensively, and there is an ongoing discussion about the costs/benefits, especially as they apply to strongly typed languages (Go is a particularly tough case).
The cost/benefit calculation may be different for client-side vs. AS-side code, where presumably client-side code should be easier to develop.
Mailing list thread: https://mailarchive.ietf.org/arch/msg/txauth/zWcr_GxGZ3etGPQC85dgqXq7Li8/