Closed by yaronf 3 years ago
I've started to look at a few other projects to check how they handle this kind of issue.
Lessons:
Difference with what we're doing: we'll need to make sure it is really simple for the client-side libraries, while the server-side implementations might optionally want to check against predefined test cases.
I think that polymorphism is useful for this protocol but should be used only when required.
Branching based on whether a value is a string or an object is one thing. Branching based on whether a value is a boolean or an integer is more problematic (especially in JavaScript, where a lot of code will check whether something is "truthy" rather than comparing against the boolean true).
Branching in the protocol is mostly based on string or object values. But there are indeed a few places where flags may be used, for instance (off the top of my head):
We should try to limit those multi variants to get a simpler request/response scheme, but the downside is that it also limits the flexibility around those corner cases.
I agree that the branches and behavior should always be clear based on the type. Designing around idiomatic truthiness can be problematic, as it's far from consistent across different languages: empty objects, empty strings, and empty arrays all come to mind as things that aren't always handled how you'd expect. I think a full survey of the variety of types used in the protocol today would be helpful, and the editors could facilitate that by marking all fields with their possible types in the descriptions.
Update following IETF 109. The following questions were asked (beyond our current evaluation of how to best implement polymorphism):
- What would we do instead? (my comment: no obvious alternative)
  - Mutual exclusivity of fields (my comment: not easy to implement)
  - Type fields for different objects (my comment: not easy to manage)
  - Container objects for single values (my comment: yes, possible)
- What's the complexity cost for the protocol? Who pays the complexity?
PR #124 enumerates which types are in use for each field in the protocol structure.
That PR #124 definitely helps clear things up.
I think we could use some guidance on the style of polymorphism used in this spec, however. There's already an appendix on polymorphism, but I found myself wanting a style guide to clarify things like:
There are very distinct cases that relate to references, and which highlight the polymorphic issues that we have to deal with.
Is it worth putting optionality in the requests? It depends on what we want:
- key reference (Section 7.1.1, key): "The means of dereferencing this value are out of scope for this specification" means it's based on some out-of-band mechanism. Again, that's a string-vs-struct case.
- interact ref: here it's used for the continuation flow, so it's just a parameter of the continue method.
- resource ref (Section 8.1, resource): in order to use predefined scopes, but the data here can be unstructured (e.g., dolphin metadata).
- grant ref (Section 2.7, existing_grant): here it's just a parameter meant as an immutable reference.
There are many different ways to do that, more or less generically.
In Go: for the most complex cases, one can use map[string]interface{}, but:
- No type checking. If the client sends an int instead of a string for the key, then unmarshalling into a map won't catch the error.
- Maps are vague and verbose. After unmarshalling the data, we are forced to use runtime checks to make sure the data we care about exists. If those checks aren't thorough, it can lead to a nil pointer dereference panic being thrown.
In Rust: the language supports enums (one can use a tag to help the deserializer distinguish between types), but support depends on the implementation (e.g., miniserde: "only simple enum variants without fields are supported"). Here's an example of the visitor pattern to handle both a reference and an opaque sub_id value, mapping to the same output structure (all type-checked):
```
Reference: { user: Subject { id: "123456789", format: None } }
Value:     { user: Subject { id: "123456789", format: Some("opaque") } }
```
Simplified code to give the intuition, with a minimal string-or-struct visitor so the snippet compiles (in practice the deserializer is more complex; you have an array of enums in between):

```rust
use std::fmt;

use serde::de::{self, Deserializer, MapAccess, Visitor};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct User {
    #[serde(deserialize_with = "visitor")]
    user: Subject,
}

#[derive(Debug, Deserialize)]
struct Subject {
    id: String,
    format: Option<String>,
}

// Accept either a bare string reference or a full subject object.
fn visitor<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Subject, D::Error> {
    struct SubjectVisitor;

    impl<'de> Visitor<'de> for SubjectVisitor {
        type Value = Subject;

        fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
            f.write_str("a string reference or a subject object")
        }

        // String form: treat the value as an opaque reference.
        fn visit_str<E: de::Error>(self, v: &str) -> Result<Subject, E> {
            Ok(Subject { id: v.to_owned(), format: None })
        }

        // Object form: defer to the derived Subject deserializer.
        fn visit_map<M: MapAccess<'de>>(self, map: M) -> Result<Subject, M::Error> {
            Deserialize::deserialize(de::value::MapAccessDeserializer::new(map))
        }
    }

    deserializer.deserialize_any(SubjectVisitor)
}

fn main() {
    let user_string = r#"{ "user": "123456789" }"#;
    let user: User = serde_json::from_str(user_string).unwrap();
    println!("{:?}", user);

    let user_struct = r#"{ "user": { "format": "opaque", "id": "123456789" } }"#;
    let user: User = serde_json::from_str(user_struct).unwrap();
    println!("{:?}", user);
}
```
@fimbault, regarding
> In Go: for the most complex cases, one can use map[string]interface{}, but: No type checking. If the client sends an int instead of a string for the key, then unmarshalling into a map won't catch the error. Maps are vague and verbose. After unmarshalling the data, we are forced to use runtime checks to make sure the data we care about exists. If those checks aren't thorough, it can lead to a nil pointer dereference panic being thrown.
You may use your own (custom) unmarshal function to do all required checks for the type of a value (string vs. interface, like in your user_string and user_struct). Furthermore, take care of the rest of things like https://github.com/golang/go/issues/14750.
My own Java implementation uses custom serializer/deserializer classes with a wrapper interface for any field that can take different forms. The two main categories we have are "string-or-object" and "array-or-object", and both of those are pretty easy to detect and handle.
https://github.com/bspk/oauth.xyz-java/tree/master/lib/src/main/java/io/bspk/oauth/xyz/json
In short yes, that solves the issue.
> Furthermore, take care of the rest of things like golang/go#14750.
Thank you @adeinega, I wasn't aware of this Go issue. It's really bad, and quite surprising to see in the Go library.
@jricher actually there's a bit more to do to match against the various formats in sub_ids (within the SubjectRequest), instead of a basic string (it would be easy if they all had the same content, but that's not the case).
That gives me something like this in my rust implementation:
```rust
// here we use a visitor pattern to map the string as an opaque struct
let byref = r#"
{ "user": "u123456789" }
"#;
```

Output:

```
{ user: Subject { sub_ids: [Opaque(Format { id: "u123456789" })] } }
```

```rust
// here we handle the array; the content depends on the format field
let byval = r#"
{
    "user": {
        "sub_ids": [
            {
                "format": "opaque",
                "id": "u123456789"
            },
            {
                "format": "email",
                "email": "u123456789@domain.com"
            }
        ]
    }
}
"#;
```

Output:

```
{ user: Subject { sub_ids: [Opaque(Format { id: "u123456789" }), Email(Format { email: "u123456789@domain.com" })] } }
```
We're now explicit on types (object, string, array), so I think we can close this issue now.
The protocol currently uses polymorphism extensively, and there is an ongoing discussion about the costs/benefits, especially as they apply to strongly typed languages (Go is a particularly tough case).
The cost/benefit calculation may be different for client-side vs. AS-side code, where presumably client-side code should be easier to develop.
Mailing list thread: https://mailarchive.ietf.org/arch/msg/txauth/zWcr_GxGZ3etGPQC85dgqXq7Li8/