palantir / conjure

Strongly typed HTTP/JSON APIs for browsers and microservices
https://palantir.github.io/conjure/
Apache License 2.0
421 stars 66 forks source link

Circular package dependencies break certain languages #131

Open jdhenke opened 6 years ago

jdhenke commented 6 years ago

What happened?

I created a valid conjure definition that introduced a cycle of package dependencies:

types:
  definitions:
    objects:
      Type1:
        package: com.palantir.odd
        fields:
          type2: Type2
      Type2:
        package: com.palantir.even
        fields:
          type3: Type3
      Type3:
        package: com.palantir.odd
        fields: {}
services: {}

This is valid according to the spec, and the current conjure compiler successfully create an IR from this, however any straightforward bindings for a language like Go which forbids circular package dependencies would not be able to generate code that compiles for these definitions by using the packages as defined in conjure.

$ go build ./conjure/...
can't load package: import cycle not allowed
package internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/even
    imports internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/odd
    imports internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/even
import cycle not allowed
package internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/even
    imports internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/odd
    imports internal-github/jhenke/bug-circular-conjure-imports/conjure/com/palantir/even

What did you want to happen?

I want there to be a recommendation on how languages which do not support circular dependencies should handle this situation or a change in the conjure spec and compiler to forbid this behavior.

iamdanfox commented 6 years ago

This is a tricky question because for some conjure-generators this is not a problem at all, and for others it's quite tricky to handle. (We've seen something similar in python too.)

I'd prefer not to introduce restrictions to the conjure-language itself to accommodate specific generators. Otherwise over time I think the validation would accumulate all sorts of weird edge cases which would be difficult to justify as this repo doesn't run most generator code.

For this specific case, I think the options are:

jdhenke commented 6 years ago

Thanks for the thoughts! A few replies:

Going back to enforcing no package cycles in the conjure language spec itself, I understand the aversion to overindexing on a specific language’s constraints, however if this affects multiple languages and can’t be trivially solved in the compiler, perhaps this option should be reconsidered. And are package dependency cycles really ever necessary? Best practice? Perhaps this restriction would be beneficial in all cases.

On Mon, Nov 5, 2018 at 03:17 iamdanfox notifications@github.com wrote:

This is a tricky question because for some conjure-generators this is not a problem at all, and for others it's quite tricky to handle. (We've seen something similar in python too.)

I'd prefer not to introduce restrictions to the conjure-language itself to accommodate specific generators. Otherwise over time I think the validation would accumulate all sorts of weird edge cases which would be difficult to justify as this repo doesn't run most generator code.

For this specific case, I think the options are:

  • manually fix the API: the conjure-go generator remains a 'partial function' from IR -> conjure, in that there are some IR definitions which it simply can't handle. Your recourse is to ask the author of the API to adjust their definition so that conjure-go can handle it. (The danger here is that it might regress in the future)
  • change conjure-go to auto-fix this issue: it's pretty normal for the IR to contain definitions that can't be directly translated into a language code, e.g. if you name a conjure field boolean then in Java we'll generate a field called boolean to avoid the reserved name. In this specific case, conjure-go could come up with some kind of stable sorting of all the types and realise that Type2 would introduce a back-reference in your nice DAG to com.palantir.odd, and automatically resolve this to something like com.palantir.odd.
  • change conjure-go to ignore packages completely: the most brutal solution would be to make conjure-go simply ignore the package so that everything goes into one compilation unit. This is not really feasible because it then exposes you to problems if the IR contains multiple identically named types with the same package! See below:
  • Conjure could start enforcing name uniqueness across packages - if all names are unique, then you'd be safe to ignore packages and generate a single compilation unit without clobbering types. (#117 https://github.com/palantir/conjure/issues/117)
  • Automatically run some conjure-go validate command before IR publish - if we introduced a new standard generator command like this, then API authors could get fast feedback about whether they are writing APIs that generators will be able to handle.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/palantir/conjure/issues/131#issuecomment-435839748, or mute the thread https://github.com/notifications/unsubscribe-auth/ABWlwoNbwAyaksgJPuZzcxZTi1HQIwdkks5usB5TgaJpZM4YMMgi .

nmiyake commented 4 years ago

Flagging that we encountered this in another published Conjure API.

Are we pretty confident in our stance of not changing this at the language-level? If so, then I will likely look at implementing a workaround at the generator-level (determine cycles and break out things that cause cycles into different packages with programatically generated names -- probably append a counter to it or something just to force uniqueness). However, this would require a decent amount of work, so want to verify first (especially since I do think the cleanest approach would be to have this check/forbid this at the language-level).

thehgregor commented 4 years ago

Just to give a user's perspective on this:

Yes, I also think an API without mutual dependencies between packages would be cleaner, but for existing APIs the definition will be difficult to change. Maybe, a validator could still warn about this?

As a user, I would mainly just like to have a working solution. But I also like the structure of the generated code, and it makes it a lot easier to understand the API. So I think it would be nice to keep the package structure and only adapt it where this is necessary, a bit in the same way as this is done for reserved words (boolean -> boolean_).

Remains the issue of changing package names. I think this is acceptable as long as the result is predictable and stays the same if the same API version is generated several times. A new API version might always introduce some breaking changes and require some refactoring. The impact should be limited to a minimum (maybe append some hash value instead of a counter), but I also hope that cyclic references won't be too frequent in an average API.

So my approach would be what you propose: have a language-specific workaround for this on generator level when a cyclic dependency is detected. In addition it would be nice to have a validation warning to tell the owner of the API that his design is not ideal and might cause issues for certain languages.

nmiyake commented 1 year ago

@iamdanfox @carterkozak would like to revive discussion about this and tagging you folks as core Conjure members (if I'm missing others who should be looped in let me know/feel free to add).

This has resurfaced again, and while in the current instance it looks we'll be able to work around it again by coordinating with the creator of the definition and agreeing to eat a compile (but not wire) break to resolve it, the underlying issue remains unresolved/open. I wanted to get feedback on the following proposal:

Modify Conjure spec to forbid cyclical references (but expose/allow configurable override for backwards compatibility)

I propose that we modify the Conjure specification to forbid definitions that would result in a circular package reference. Although the Conjure specification is meant to be language-agnostic and does not explicitly dictate how the package name type name must be used in relation to the generated code, the package field is required for all types (either explicitly or via a default-package value), and name uniqueness is required within a package.

Based on the above, we specify/define the following:

Currently, Conjure imposes no requirements on package cycles. The proposal is to make it such that a Conjure definition that produces a package reference cycle will be deemed invalid. Note that this definition is fully definable/enforceable at the Conjure definition level and does not make any assumptions about the generated code or language.

Motivation

The motivation for this proposal is that it makes code generation easier for languages in which package import cycles are not permitted (for example, Go) while not imposing too much of a restriction on definition authors (although this should be validated against existing definitions). Having authors avoid or modify definitions to prevent cycles is likely orders of magnitude less work than allowing cycles but having to support generation for them in languages where this is difficult.

Most current instances in which we have observed cycles are largely incidental -- typically, it's due to a package containing the definition for both an alias type and a complex object, and the complex object references another package that has a different object that references the alias type again. In instances such as this, it's typically possible to avoid cycles by separating the aliases/basic objects from the complex ones.

If this restriction is imposed, there are certain definitions that would no longer be expressable in Conjure -- specifically, currently it is possible to have 2 objects in different packages that reference each other in their fields, but with this modification such a scenario would no longer be possible (any types that mutually reference each other must be in the same package). However, because Conjure does not have any notion of package-level visibility/access, moving the definition should not be a big deal.

Alternatives

If we don't want to impose this requirement on the spec, one possible alternative would be allow this to be a generator or IR option -- IR generators could adopt a --no-cycles option or something similar to that which detects cycles and warns about them or blocks the IR generation if the option is provided. This would allow for the requirement to be checked for/imposed separately from the spec. However, from a tactical perspective of deploying the change, this wouldn't change much, since if an organization is dealing with any generators that don't support cycles, this option/CI check/etc. would likely only be valuable if it were widely deployed and became enforced/the default.

Proposed rollout

If we adopt this proposal, I would propose a rollout like the following:

Counter-arguments

  1. Many languages (such as Java) allow cyclical package references, so this prohibition should not be dictated by the language-agnostic Contour. Furthermore, although Conjure defines a notion of packages and type names and has the requirement that a type name be unique within a package namespace, it has no opinion on how this input is translated to generated code, so it should be the responsibility of the generators to work around any issues (including doing things like generating different Conjure packages into the same language package/namespace if necessary).

The above is factually true, but I would argue that, given that there are languages where cycles are problematic and that the Conjure definitions adopt a notion of "namespace/package + name", it is not inconsistent to define and impose a non-cycle requirement in the Conjure definition, especially if it will simplify some classes of generated code.

carterkozak commented 1 year ago
  1. In isolation (ignoring any language support), are cyclical package references something that should be removed from conjure? This is tricky to answer, at face value I don't believe so, especially given the ways that some projects use packages grouping like types as opposed to grouping by service. That's not to say we must root our decision in the answer to this question -- some conjure features e.g. safelong are built based on specific language constraints -- but the trade-offs are important.
  2. What are the impacts to our existing users, both within Palantir and externally? We cannot move forward with any breaking proposal without understanding how much friction it will cause for our users.
  3. The change you describe is a large API break of the sort we have never made to Conjure. I would much rather enforce uniqueness (by setting a single package-name on a per-project basis) to simplify the layout of new projects. Other options include teaching generators to rename classes into a flat package deduplicating simple names. Both of these are compile breaks for impacted languages, however that would vastly reduce the impact surface of any sort of change.
iamdanfox commented 1 year ago

+1 to the idea of saying all types should be unique (ignoring package names) in conjure... then if the golang generator needed to, it could just emit everything to one package?

nmiyake commented 1 year ago

@iamdanfox I think that enforcing unique package names across all of a definition would be a much larger break -- the only real upside I see is that it would somewhat simplify the implementation for the check in the YAML -> IR step, but I don't think that's a huge deal.

And yes, at the extreme, the cycle issue can be worked around by always generating everything into a single package -- since "package + name" is guaranteed to be unique, if we generate everything into a single package using the information, we are guaranteed uniqueness without cycles: ComPalantirApolloDeploymentApiV2Objects_DeclaredConfigOverrides etc. The only true downside here is legibility, although that's a non-zero cost as well (although we could deal with this with heuristics on the generator side like only falling back to single-package output if a cycle exists, doing single-package output but using typename only by default UNLESS there are name conflicts, generating into a single package but also generating type aliases in the package structure, etc.).

In talking through this more I think we can probably live with dealing with this on the generator side... However, in that case, on the Conjure side it would be helpful to codify the current assumption that the value for "package" can only be used for disambiguation -- if, at any point in the future, semantics were added that modified the behavior of objects based on packages (for example, notions of access/visibility like "private" where only types in the same package would be visible), then this would break the assumptions described here.

Just to clarify, if we were to implement this at the extreme, the result would be:

  1. For existing definitions that have a reference cycle, after updating the generator, the IR would no longer be generated
  2. The fix would require manually changing the definition, typically by changing the package structure of the declared types
  3. Although it is generator-dependent, in most cases consumers that use the new definition would get a compile break for generated code, but not a wire break

A "softer" version would be to not change the language/IR specification (so cycles are still allowed), but to update the implementation of our own generators/IR compilers/CI to forbid/block definitions that result in an IR with cycles (even though it is technically allowed by the language standard -- this would be more analogous to having a linting rule that blocks bad practices, even though the language itself allows it).

I've written out all of the above for reference, but feel fairly comfortable with moving forward with the generator change... I'll document my current proposal in the next/separate comment.

carterkozak commented 1 year ago

That sounds like the most reasonable path forward to avoid causing friction for our users.

Expanding a bit below in response to your first sentence for some transparency into the way we think about changes to conjure:

the only real upside I see is that it would somewhat simplify the implementation for the check in the YAML -> IR step, but I don't think that's a huge deal.

The primary upside isn't so much the implementation -- I'm not worried about implementation complexity here -- but rather the complexity that conjure exposes to users. Currently, the package component is simply a prefix to the name field, where disallowing cycles would force developers understand more complex semantics. Conjure is meant to allow folks to quickly and safely build APIs without worrying about details of the rpc system, so there's tremendous value in keeping it as simple as possible. I think this aligns with the removal of the package concept from our definition language, however as you describe, that would be a frictionful change, and lower priority than our ongoing work.

nmiyake commented 1 year ago

Updated proposal: clarify/codify that the "package" value exists only for namespacing/disambiguation, and that the Conjure must not use the package to distinguish any semantic meaning for things like access control

Currently, all Conjure types must have a "package" and "name", and the "(package, name)" pair is treated as the fully qualified identifier and must be unique within a single Conjure IR (including an IR generated from multiple Conjure definitions).

The Conjure language requires that the "package" field must not be used by Conjure in any way that modifies the semantic behavior -- for example, the "package" value cannot be used to enforce any level of visibility/access control (such as private types that are only visible/accessible by other types in the same package).

As a practical consequence of the above, a Conjure definition that replaces all "package" values with the same value and changes the "name" value to be some transformed form of "package + name" should always be semantically equivalent.

Taken above, this provides a way for language-specific generators to avoid issues with import cycles -- at the extreme, a generator can always generate a unique identifier based on (package, name) and generate the result into a single namespace (package, root context, etc.), which would avoid cycles while ensuring that there are no semantic breaks.