palantir / conjure-go

Conjure generator for Go
Apache License 2.0
7 stars 15 forks source link

Generated code does not compile if Conjure definition references type in a package where last element is of the form "v[0-9]+" #405

Open nmiyake opened 1 year ago

nmiyake commented 1 year ago

What happened?

When I generated Conjure output files for a definition that defined a type in a package that ended in "v2" (com.palantir.test.v2) and an object in a different package that referenced that type, the generated code did not initially compile.

Running the generator again (after the initial non-compiling code was generated) added the proper import statement, after which the code compiled as expected.

The initial behavior is incorrect (running the generator for a valid definition should create valid code), and the fact that running the same Conjure generation operation again generates different code that fixes the issue is unintuitive (the generator is expected to be idempotent).


Example Conjure definition:

types:
  definitions:
    objects:
      ObjectOne:
        package: com.palantir.test.common
        fields:
          objectTwo:
            type: ObjectTwo
      ObjectTwo:
        package: com.palantir.test.v2
        fields:
          name:
            type: string

IR for the definition:

{
  "version" : 1,
  "errors" : [ ],
  "types" : [ {
    "type" : "object",
    "object" : {
      "typeName" : {
        "name" : "ObjectOne",
        "package" : "com.palantir.test.common"
      },
      "fields" : [ {
        "fieldName" : "objectTwo",
        "type" : {
          "type" : "reference",
          "reference" : {
            "name" : "ObjectTwo",
            "package" : "com.palantir.test.v2"
          }
        }
      } ]
    }
  }, {
    "type" : "object",
    "object" : {
      "typeName" : {
        "name" : "ObjectTwo",
        "package" : "com.palantir.test.v2"
      },
      "fields" : [ {
        "fieldName" : "name",
        "type" : {
          "type" : "primitive",
          "primitive" : "STRING"
        }
      } ]
    }
  } ],
  "services" : [ ],
  "extensions" : {}
}

Initial lines of output of first run (at common/structs.conjure.go):

// This file was generated by Conjure and should not be manually edited.

package common

import (
        "github.com/palantir/pkg/safejson"
        "github.com/palantir/pkg/safeyaml"
)

type ObjectOne struct {
        ObjectTwo v2.ObjectTwo `json:"objectTwo"`
}

Initial lines of output of second run (at common/structs.conjure.go):

// This file was generated by Conjure and should not be manually edited.

package common

import (
        v2 "github.com/palantir/conjure-go/v6/cmd/outdir/test/v2"
        "github.com/palantir/pkg/safejson"
        "github.com/palantir/pkg/safeyaml"
)

type ObjectOne struct {
        ObjectTwo v2.ObjectTwo `json:"objectTwo"`
}

What did you want to happen?

The output of the first run should produce compiling code (have the proper imports) and running the command multiple times should not change the generated code.

nmiyake commented 1 year ago

I've determined the cause is the following:

  1. The initial output does not include the proper import because it is removed by the call at https://github.com/palantir/conjure-go/blob/0b9a042a79ab83bbb0c1f6f993aee5fa965703be/conjure/outputs.go#L60
  2. Because the "FormatOnly" option is unset/false, the ptimports.Process call attempts to "fix" imports (adds imports for calls that need it and removes unused imports)
  3. The Golang imports code performs a static analysis heuristic to determine the package name (see https://github.com/palantir/go-ptimports/blob/93cb0022ccc2ca0a21b3ac3396b85349567cc40a/vendor/golang.org/x/tools/internal/imports/fix.go#L1144), and because of the way that module versioning works in Go, for imports where the final part of the path is of the form "v[0-9]+" (for example, "foo/bar/v2"), the package name is assumed to be the last component before the version (for "foo/bar/v2", that would be bar)
  4. After performing static analysis, it also attempts to load the named package. However, on the first round of generation, the package being referenced does not yet exist, so it can only use the static heuristic.
  5. For Conjure-generated packages, the package name is the exact string (so the package for foo/bar/v2 is v2), so in the generated code references are also of the form v2.Object
  6. Because the static analysis assumes the package name for the import foo/bar/v2 is bar, it looks for any accesses against the "bar" portion, and when it doesn't find it in the file, it marks the foo/bar/v2 import as unused and removes it

The result is that, on the first generation, there is no import for packages of the form "v[0-9]+", and the code doesn't compile.

However, when the generator is run again (with the non-compiling generated code still present), at step 4, searching for the package locally does yield a result (because the package has been generated), and then the import code finds the proper package name and does not remove the import. The result is that, when run a second time, the import is not removed and the code compiles.

The proper fix here is to set the FormatOnly property to "true" for ptimports.Process -- this ensures that, although the code will be formatted and styled consistently (including properly converting imports to grouped form), imports will not be added or removed. Since the code generator is responsible for generating the imports properly, this should be fine.

nmiyake commented 1 year ago

Follow-up on the above.

Setting "FormatOnly" to "true" does result in correct code being generated. However, when goimports or ptimports is run on the resulting code, it causes it to be modified.

goimports is written in a manner that will explicitly add a named alias to an import if the inferred name differs from the actual package name, even if such an alias is not required for compilation.

For example, for the import "github.com/foo/bar/v2", the inferred name is bar, but the actual package name is v2. Because they differ, goimports updates the import to be v2 "github.com/foo/bar/v2", even though this "v2" alias is not necessary.

The issue here is that, if the output of conjure-go does not include this extra "v2", then running "go-imports" or "pt-imports" changes the output, which is not desirable.

Taking all of this into account, the most effective approach is to run ptimports with FormatOnly=true when the individual files are written, but to then run ptimports with FormatOnly=false on the generated files after they have ALL been written. This ensures that any generated packages will exist (and properly resolve as package imports) and that the written output will remain stable even if run through goimports or ptimports again.

nmiyake commented 1 year ago

Re-opening this issue because the current solution ran into problems when deployed. Documenting the issues here so that the next attempt at a solution incorporates this.

The solution in #406 works by updating the conjure.Generate call such that, after writing all of the files from conjure.GenerateOutputFiles, it runs an extra import operation that may change the output.

However, the "verify" functionality of godel-conjure-plugin (https://github.com/palantir/godel-conjure-plugin/blob/6a408a96156a9ca98f3c92d18cd52786306e715c/conjureplugin/verify.go#L32) generates the expected files in-memory using just conjure.GenerateOutputFiles and performs a diff based on those checksums, so if the Generate function introduces any changes, they are not captured.

Basically, the usage in godel-conjure-plugin assumes that the output of individual Conjure files by conjure.Generate and conjure.GenerateOutputFiles must match. This happened to be true before the change above, but is no longer true after.

The most ideal solution would be to formalize this contract and fix the behavior of GenerateOutputFiles so that it creates the desired output. This puts us back to determining a solution for the imports without writing to disk first.

The other solution would be to declare that it cannot be assumed that the output of GenerateOutputFiles and Generate will match -- in this case, the implementation in godel-conjure-plugin would need to be changed to deal with this (for example, it may need to read all files into memory first, then perform "real" generation, read all the content of the written files, then restore the original files or something else like that).