flatten models packages in Go to avoid unsupported circular references

papegaaij commented 1 year ago

For larger APIs, it is nice to put different parts of the model in different namespaces. For example, in our API, we've got over 40 types related to authentication and about 30 related to groups of users. In Java, we've put these classes in different subpackages, and I think this works well for other languages, such as C# as well. However, these types do have relations pointing to types in other subpackages, such as accounts being member of a group and the groups an account is member of. These dependencies often form cycles. This currently breaks code generation for Go, as Go does not allow circular dependencies between modules.

I'm not sure what the best way to fix this would be, but I guess the best solution would be to flatten the models modules in Go and generate one large module. I think cross-type relations are quite common in APIs and requiring the provider of the API to not use relations over namespaces does not seem viable to me. IMHO this should be implemented in the GoRefiner.

baywet commented 1 year ago

Thanks for reaching out on the topic.

The go refiner already has a step that removes properties in models that depends on sub-packages to avoid circular dependencies.

With the level of details you've provided, it sounds like your scenario should already be covered as working? Can you maybe provide a mermaid diagram of your scenario?

papegaaij commented 1 year ago

My problem is with properties from sub-packages, referencing other sub-packages. I've created a very simple spec with a.a referencing b.b and vice versa. Removing properties would fix the the compilation issues on Go, but that would require removing all properties that reference sibling-packages. In our case, that would result in an SDK that is not very usable.

I think the only viable route would be to place all model classes in a single package. This would however give rise to some other issues, most notably duplicate type names. Perhaps, a solution would be to rename the types to include the sub-package name in the type name. I believe this is a common practice for other generators.

I can start working on a PR if you agree with this approach.

baywet commented 1 year ago

No I don't think flattening namespaces would be a viable solution, it'd result in conflicts (we have the case in Microsoft Graph).

Indeed the step "looks down" but doesn't look at siblings, or children of siblings. What I'm a bit worried about with a "naïve approach" of projecting properties against every property is that:

it'll be slow.
we'll miss any case with more than 1 "hop" in the loop.
it's not going to be deterministic about which property gets removed (the code dom is not emitted in any specific order)

The better thing to do would be to implement a cycle detection algorithm which probably represents a significant undertaking at this point. Once we have a list of loops, we need to come up with an algorithm that'll consistently "cut" the loop at the same place across iterations or even if the graph changes slightly.

papegaaij commented 1 year ago

Naive flattening would indeed result in conflicts, but a different strategy would be to rename the classes during the flattening. For example, if I take some names from the Graph API:

microsoft.graph.group -> Group
microsoft.graph.security.group -> SecurityGroup

This should prevent most conflicts. Any remaining conflicts could then be solved using a simple postfix with an increasing number (this will probably be very rare).

Removing properties that cause cycles in the dependency-graph is possible, but as you already said, it will be hard to do in a stable way, and even then, you are removing properties defined for a reason in the API. In my simple case with a.a pointing to b.b and vice versa, how would you decide which property to remove and how would the developer using the SDK know this happened and still set or read these properties?

In our API, we've got quite a few of these references to sibling packages and many of them are required properties. We often create a superclass to be a reference to another entity. So you have a.SubA extends a.BaseA and b.SubB extends b.BaseB with a.SubA containing a reference to b.BaseB and b.SubB containing a reference to a.BaseA. Removing even a subset of these references would make the SDK useless for a developer. That would leave me with no other choice than to do a pre-processing step to update the OpenAPI spec for Go only to flatten the names to force Kiota to generate a single module anyway.

baywet commented 1 year ago

In your example, what do you expect to happen if microsoft.graph already contains SecurityGroup?

papegaaij commented 1 year ago

That's the big problem you always face when you are changing names, but I think a simple increasing number at the end should do the trick. So one of the types will be named SecurityGroup and the other one will be SecurityGroup0. This is not ideal, but IMHO far better than properties missing. Also, I think such collisions will not be very common.

baywet commented 1 year ago

Thanks for developing the thought to the end. The other major issue with moving the class is that it represents two breaking changes:

If I used kiota v1.3 to generate a client, a property got trimmed, but I might still have taken a dependency on the model. Now with Kiota v1.x the model moves and my code is broken.
If the description changes to add a property which results in a loop, now my model moves as I refresh the client.

Both of those go against our support policy and would require a major bump in the first case, and wouldn't have a solution in the second case.

Trimming properties, although not ideal, makes for a more stable generation result over time. What do you think?

papegaaij commented 1 year ago

My intention was to always entirely flatten the codespace of the models module. This would be stable with any regeneration of the client and be very predictable with changes to the OpenAPI spec. Golang being a stable language, this does indeed require a major release, which is unfortunate.

Trimming properties does not generate a stable API either. Adding a property somewhere, might cause some other property to be removed. The generator has no way of knowing which property existed in the old API. In my simple a.a + b.b example, if I start with only the reference from b.b to a.a, and add the reference from a.a to b.b later on, the first might get removed.

I'm not an experienced Golang developer, but my search so far seems to suggest that putting your entire model in one package is the preferred way of doing it in Golang. I've found several large SDKs which use this approach.

baywet commented 1 year ago

The current implementation is deterministic in the sense that it removes properties that "point down". But yes, I agree that coming up with something deterministic for a broader loops context over time will be challenging at best.

Here are two discussions I've been part of which are relevant to this issue:

The first one is at the API council at Microsoft while working on the namespacing API pattern guideline. Long story short we ended up recommending that type names across namespaces DO NOT need to be unique, because it'd defeat the purpose of namespacing. And we also recommended for client applications TO NOT flatten namespacing (although that's not part of the published guidance for some reason).

While working on Go generation in the early days, we faced performance issues with the Microsoft Graph Go SDK (generated from Kiota). And I reached out to people working on the Go compiler itself. Their feedback was along the lines of "don't make main packages (root + sub packages) that are so large", "subpackaging is supported, don't nest too deep", "break into multiple main packages if you can". Obviously, that's hard to conciliate with most REST APIs that usually make use of namespacing, have a large number of inter-related models spanning across packages.

I understand the Go community made early design choices to keep things as simple as possible. But my opinion is that it's reached a point where some of those choices are non-practical in a large number of scenarios. (the other example I have in mind being how generics work...). API producers have come to expect the client application will have a number of capabilities based on very basic language capabilities that are available today in the vast majority of programing languages out there.

Now, that additional context while useful doesn't solve your problem. But I don't think a solution where we flatten namespaces would work at this point. If we can come up with a deterministic way to detect loops and clip them deterministically it'd be a huge win. And it'd also go a long way to unblocking our ruby work.

papegaaij commented 1 year ago

I do agree that packages in Golang are very limited, maybe even to such a degree that they become unusable. This is perhaps why so many opt not to use them or only use a very few packages in their applications.

Our OpenAPI spec is based on a Java domain model, where we use packages to group classes in a natural way. This grouping has nothing to do with dependency relationships. A quick count gave about 250 inter-package references. Even if only 1 in 3 need to be removed to break circular references, this would render the entire SDK useless.

I do not see how removing properties actually solves anything. Rather than having an SDK that does not compile, I get an SDK that does compile, but that I cannot use or distribute, because many important properties are missing. Isn't this a problem with the Graph API as well?

baywet commented 1 year ago

In the case of Microsoft Graph, we only have a few properties looking downwards, hence the chosen solution to this point.

The hope was that package limitations would be worked on in the Go compiler and we could eventually remove the method I linked earlier. But reading answers like these lead me to think it's a design choice in the philosophy of the language: only separate things in packages when they are truly independent. An having cross packages dependencies kind of breaks that rule. Engaging with the Go community to understand whether there is any work being done/any pain being reported would help moving this thread forward.

Assuming we have a loop detection algorithm in place. We could deterministically duplicate the types from the other packages instead of trimming properties. This might get messy for large interdependencies as you outlined.

papegaaij commented 1 year ago

Yes, from what I've read so far, packages in Go are meant to separate truly independent parts of code. Most of the advice I've read also mentions a models package (a single package), but I haven't found any really good reading on large domain models.

Maybe nobody writes really large applications in Golang, but if I look at some of the applications we build at our company, these have domain models consisting of thousands of Java types. These types contain many interdependencies, as they are mapped onto a database using Hibernate ORM. Every foreign key constraint is a reference to another type, and the database literally has thousands of them. The RESTful APIs often contain a little less intertype references, because of the way the data is exposed, but many of them remain. I really can't image having to put all those thousands of types in a single package, but I do not see any other way.

Wrt copying the types, that would work in a purely objective point of view, but it would create a truly dramatic user experience for our API. Let's say I want to create my a.a with a reference to b.b. For this, I would normally fetch my b.b and put that in my new a.a instance. However, when the type gets copied, I need an a.b (or something like that), and the endpoint for b.b returns `b.b, which is incompatible. This will require a lot of copying and rebuilding objects.

For now, I'm working on a pre-processor that modifies the OpenAPI spec to rename all schema components to flatten the models space for Golang code generation. This will probably 'fix' the issue for us, but IMHO shouldn't really be needed.

baywet commented 1 year ago

Would it be possible for you to build a OpenAPI.net validation rule that triggers on loops? This way we could at least warn people and give them a clear explanation to why their models might be trimmed/not be able to build. I know this is less than ideal but it might be a nice temporary workaround that lets people act on their description until we either decide to flatten things for Go with a v2 or until Go expands their package support.

papegaaij commented 1 year ago

Wouldn't it be easier and clearer to emit a warning for every property that is dropped from the GoRefiner?

baywet commented 1 year ago

assuming we implement dropping properties for sibling namespaces references (not just upwards), that could work too.

papegaaij commented 1 year ago

I just inspected the generated code for my small a.a and b.b example, and noticed another major issue: polymorphism does not work across packages. The Microsoft Graph API must be suffering from the exact same problem.

This is the example spec I'm using:

openapi: 3.0.1
info:
  title: Example
  description: Example
  version: v1.0
servers:
  - url: https://example.org
paths:
  '/example':
    get:
      summary: Example
      operationId: Example
      responses:
        2XX:
          description: Example
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/entity'

components:
  schemas:
    entity:
      title: entity
      required:
        - '@type'
      type: object
      properties:
        '@type':
          type: string
      discriminator:
        propertyName: '@type'
        mapping:
          'a.a': '#/components/schemas/a.a'
          'b.b': '#/components/schemas/b.b'
    a.a:
      allOf:
        - $ref: '#/components/schemas/entity'
        - title: a
          required:
            - '@type'
          type: object
          properties:
            '@type':
              type: string
              default: 'a.a'
            b:
              $ref: '#/components/schemas/b.b'
    b.b:
      allOf:
        - $ref: '#/components/schemas/entity'
        - title: b
          required:
            - '@type'
          type: object
          properties:
            '@type':
              type: string
              default: 'b.b'
            a:
              $ref: '#/components/schemas/a.a'

Notice that the example operation returns an entity, this can either b an a.a or a b.b. However, in the generated code, entity ends up in the root of the models package and a.a and b.b both in their own sub-package. With the sub-types extending entity, a dependency on the root package is added to both sub-types. The factory method to create an entity from its discriminator lives in entity.go in the root. This file cannot have dependencies on the sub-packages, and these are removed. The result is a factory method that can only create entity and not a.a or b.b:

func CreateEntityFromDiscriminatorValue(parseNode i878a80d2330e89d26896388a3f487eef27b0a0e6c010c493bf80be1452208f91.ParseNode)(i878a80d2330e89d26896388a3f487eef27b0a0e6c010c493bf80be1452208f91.Parsable, error) {
    return NewEntity(), nil
}

The very same issue exists in the Graph API, because most types extend from microsoft.graph.entity, including a lot of types in sub-packages. The factory method for microsoft.graph.entity will never be able to construct these types.

baywet commented 1 year ago

You are correct, this is part of the limitation as well. However we don't have an instance of an endpoint or a property returning entity directly (in the metadata). It's at least returning some intermediate base type, or directly the derived type. And as long as the base type and the derived types don't cross package boundaries, the discriminated downcast works.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

papegaaij commented 1 year ago

I don't think we want this issue to be closed. This is a real issue with the current Go generation, that cannot be fixed before 2.0 and will require an explicit decision between either of these two options:

The models package will be flattened to one large package entirely.
Circular references between packages (either between siblings or from a sub package to one of its ancestors) will be explicitly not supported and all these references will be pruned.

baywet commented 1 year ago

That bot is driving by tags. And we use the tags to "force the conversation to a decision or to be closed" as you rightfully pointed out. Flattening packages is not something we can implement before a v2. Microsoft Graph doesn't really have this issue, you've already built some pre-processing for your case. So now the question becomes "do we have enough demand that requires fixing this issue before v2?" Unless you want to spend the time implementing a deterministic trimming of circular references, my suggestion here is to park this to v2 with the intention of flattening things out. And wait to see if we get additional demand that'd make it a more pressing matter. What do you think?

papegaaij commented 1 year ago

Yes, I agree. I think flattening is the go-way of doing it, but this can't be done before v2. Trimming is a lot of work and hard to impossible to get deterministic. Let's park this for v2.

microsoft / kiota

flatten models packages in Go to avoid unsupported circular references #2834