nytimes / openapi2proto

A tool for generating Protobuf v3 schemas and gRPC service definitions from OpenAPI specifications
Apache License 2.0

RFC: change how protobuf is generated #79

Closed lestrrat closed 6 years ago

lestrrat commented 6 years ago

In #75 and #78, I found discrepancies in the names generated by the tool for types that should actually point to the same protobuf message.

The root cause is that there is no central authority in the code that can resolve OpenAPI names to protobuf types -- the code is basically converting OpenAPI names to type names on the fly, and if we happen not to use the same converter function in every place, we're doomed.

I can certainly keep sending PRs, but in the end this leaves a gaping possibility for subtle failures, and I don't want to build my infrastructure on this tool unless, at the very least, debugging it becomes easier.


That's the lead-up to my suggestion: having spent enough time fixing bugs in the past few days, I would really like to see this tool refactored from the ground up so it's easier to change things when the need arises in the future.

I'd like to propose breaking this tool up into three components:

  1. openapi loader
  2. openapi-to-protobuf converter
  3. protobuf encoder

The loader will read an OpenAPI spec from a source and apply any "fixups". The converter will take the OpenAPI data structure and convert it to a protobuf structure. Finally, the encoder will traverse the protobuf structure and print out the protobuf definitions.

The secret sauce will be in the converter: this is where we will keep a type system of sorts, so we can fix the problem of resolving an OpenAPI type name to its protobuf equivalent.
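To make that concrete, the top-level flow could look roughly like the sketch below; every package and function name here is a placeholder, not a proposed final API:

```go
package main

import (
	"log"
	"os"

	// hypothetical packages sketched in this issue; import paths are placeholders
	"github.com/NYTimes/openapi2proto/convert"
	"github.com/NYTimes/openapi2proto/openapi"
	"github.com/NYTimes/openapi2proto/protobuf"
)

func main() {
	// 1. Loader: read the OpenAPI spec (JSON or YAML) into the openapi
	//    structure, applying any fixups along the way.
	spec, err := openapi.LoadFile("spec.yaml")
	if err != nil {
		log.Fatal(err)
	}

	// 2. Converter: the single place that resolves OpenAPI names to
	//    protobuf types; builds a protobuf.Package.
	pkg, err := convert.Convert(spec)
	if err != nil {
		log.Fatal(err)
	}

	// 3. Encoder: traverse the protobuf.Package and print the .proto text.
	if err := protobuf.NewEncoder(os.Stdout).Encode(pkg); err != nil {
		log.Fatal(err)
	}
}
```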

Thoughts?

jprobinson commented 6 years ago

I definitely like the idea of adding a bit more structure to the flow of this tool and what you propose sounds reasonable.

This also adds a hook for one feature I've been wanting to add: checks for breaking changes to proto tags or existing field types.

Any idea on what state the data would be in after the loader step?

lestrrat commented 6 years ago

I expect the loader to simply do what you did in LoadDefinitions, which is to try to load different (possibly simplified) schemas into the "openapi" structure.

80% of the code would be spent on traversing and analyzing this structure.

lestrrat commented 6 years ago

@jprobinson I hacked something up just to get something concrete to talk about.

https://github.com/lestrrat/openapi2proto/tree/issue-79

If you check out the above tree and run `go run cmd/openapi2proto/main.go -spec fixtures/accountv1-0.json`, you will get some form of output. It's not done and I'm sure there are loose ends everywhere, but I'm sharing this now so I can show you what I'm thinking about.

Let me know yay/nay for this direction.


openapi

The openapi subdirectory contains a partial implementation of the parts of the OpenAPI spec that we care about, mostly taken from your APIDefinitions type and friends. There are subtle differences, but the structures are mainly untouched. The idea is that we only use this structure to slurp data out of the source; within this package we don't care about compilation into protobuf at all.

protobuf

The protobuf subdirectory is similar to the openapi directory, but its API is designed so the data can be built programmatically. There's no way to slurp data from JSON or YAML (as of now), but you can write code that builds the pieces required to create a protobuf definition.

It also has an Encoder object that traverses this structure and generates the textual representation of the definition.
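As a rough illustration of what building a definition by hand might look like (the builder and encoder names below are placeholders; the actual API in the branch may differ):

```go
package main

import (
	"log"
	"os"

	// hypothetical package from the issue-79 branch; API still in flux
	"github.com/NYTimes/openapi2proto/protobuf"
)

func main() {
	// Build a package and a message programmatically; no JSON/YAML
	// is involved at this layer.
	pkg := protobuf.NewPackage("account.v1")

	msg := protobuf.NewMessage("Account")
	msg.AddField(protobuf.NewField(protobuf.StringType, "id", 1))
	msg.AddField(protobuf.NewField(protobuf.StringType, "name", 2))
	pkg.AddType(msg)

	// The Encoder walks the structure and emits the textual definition.
	if err := protobuf.NewEncoder(os.Stdout).Encode(pkg); err != nil {
		log.Fatal(err)
	}
}
```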

convert (I might rename this to compiler)

This is where the soul of the operation lives. The Convert function takes an OpenAPI definition, traverses the structure, and creates a protobuf.Package object.
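The "central authority" idea could be sketched like this (illustrative only, not code from the branch): the converter keeps a single registry, so every reference to the same OpenAPI name resolves to the same protobuf type.

```go
package compiler

import (
	"fmt"

	"github.com/NYTimes/openapi2proto/protobuf" // hypothetical package path
)

// compileCtx is the single authority that maps OpenAPI names (e.g.
// "#/definitions/Account") to the protobuf types built for them.
type compileCtx struct {
	types map[string]protobuf.Type
}

// register records the protobuf type created for an OpenAPI name.
func (c *compileCtx) register(name string, t protobuf.Type) {
	c.types[name] = t
}

// resolve returns the previously registered type, so every reference to
// the same OpenAPI name ends up pointing at the same protobuf message.
func (c *compileCtx) resolve(name string) (protobuf.Type, error) {
	t, ok := c.types[name]
	if !ok {
		return nil, fmt.Errorf("type %q has not been registered", name)
	}
	return t, nil
}
```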

jprobinson commented 6 years ago

This looks like a great start, @lestrrat! I really like the break out of the packages. Clean lookin' API so far.

I'm leaning towards "compiler" over "convert" as well. Or maybe "transpiler"?

lestrrat commented 6 years ago

@jprobinson is there a reason why type names are stripped of their trailing "s"s (e.g. "Result" as opposed to "Results")? I think the current approach is rather naive and should be avoided if possible. It would work if the string is "results", but what if it were "classes"? "Classe" is just confusing. And I don't think you really want to encode a smart-enough plural-to-singular converter in this tool :)

https://github.com/NYTimes/openapi2proto/blob/847f9be22a71121eb62c8337544535e120156a29/openapi.go#L785
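To illustrate the problem with a tiny standalone example (not the tool's code):

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSingular mimics the current "strip a trailing s" behavior.
func naiveSingular(name string) string {
	return strings.TrimSuffix(name, "s")
}

func main() {
	fmt.Println(naiveSingular("results")) // "result"  -- looks fine
	fmt.Println(naiveSingular("classes")) // "classe"  -- just confusing
}
```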

lestrrat commented 6 years ago

@jprobinson This field seems to have the type ["array", "string"]. Should the resulting protobuf definition have the "repeated" modifier?

https://github.com/NYTimes/openapi2proto/blob/847f9be22a71121eb62c8337544535e120156a29/fixtures/most_popular.json#L462-L468

https://github.com/NYTimes/openapi2proto/blob/847f9be22a71121eb62c8337544535e120156a29/fixtures/most_popular.proto#L51

lestrrat commented 6 years ago

@jprobinson This has to be wrong. I don't think there's an array type?

[image]

jprobinson commented 6 years ago

> trailing "s"'s

This was a mistake to add in the initial impl, totally fine with nuking it.

["array", "string"]

Since there's no type for the array, I'm tempted to put this one in the "unsupported" realm for now. Should the tool just quit and error at that point? Or do we have it log the issue and omit the field?

Perhaps we can make an issue to support oneof in the future and attempt to reconcile this kind of stuff.

Keep up the great work! Thanks again, @lestrrat!

lestrrat commented 6 years ago

@jprobinson WRT https://github.com/NYTimes/openapi2proto/issues/79#issuecomment-377877354, I think it makes sense to make it repeated and assume it's either a single string or a list of stuff (hence repeated Any foo). I'm sure there are corner cases, but if we're using Any, I think it's, eh, safe.
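For illustration, in the converter this could end up as something like the following (names like AnyType and SetRepeated are placeholders, not the branch's actual API):

```go
package compiler

import (
	"github.com/NYTimes/openapi2proto/protobuf" // hypothetical package path
)

// fieldForMultiType is illustrative only: when a property declares
// multiple JSON types such as ["array", "string"], emit a repeated
// google.protobuf.Any field instead of guessing one concrete type.
func fieldForMultiType(name string, index int) *protobuf.Field {
	// assumed: the protobuf package exposes a handle for the Any
	// well-known type and a way to mark a field as repeated
	f := protobuf.NewField(protobuf.AnyType, name, index)
	f.SetRepeated(true)
	return f
}
```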

lestrrat commented 6 years ago

@jprobinson Can you check #80? I think this needs to happen regardless, so I'm going to change all message names derived from property names to have a "-Message" suffix attached to them.

lestrrat commented 6 years ago

@jprobinson alright, just to give you an update: I got about 90% of the stuff working, minus a flaky test and kubernetes.json, which is a whole different monster. Major features in my branch:

jprobinson commented 6 years ago

As I go through that k8s spec, I find more and more weirdness, so I'm fine with the translation not being 100% perfect; it just gives us a lot of odd scenarios we can use for testing.

I'm really liking all the changes in here but am wondering if we can keep the parent-prefix style for naming nested definitions. Any particular reason why that one doesn't fly?

lestrrat commented 6 years ago

The parent-prefix naming got long really, really fast. With some three-level nested messages it was already taking more than half of my terminal width. I don't know, maybe we will never write these names by hand, but I have a sneaking suspicion that it would actually be harder on the user this way.

Also, this is purely an internal implementation detail, but protocol buffer type names for these nested types already use the parent-prefix style. For example,

message Foo {
   message Bar { }
}

turns into Foo_Bar in the generated code, so the generated type already carries the parent name. If we also used the parent prefix in the message name itself, the internal name would become Foo_FooBar, and this keeps compounding at each level. Again, I'm not sure we would ever write code that uses these by hand, but if we did, it would become really, really painful.
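Concretely, the generated Go code for the snippet above looks roughly like this (schematic only, fields and boilerplate omitted):

```go
package pb // schematic of what protoc-gen-go emits, not meant to be compiled as-is

// message Foo { message Bar { } } generates:
type Foo struct{}

type Foo_Bar struct{}

// If the tool also baked the parent name into the nested message itself
// (message Foo { message FooBar { } }), the generated type would become:
type Foo_FooBar struct{}
```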

So that was my reasoning. :)

jprobinson commented 6 years ago

You've convinced me! I tend to avoid the nested definitions for these kinds of reasons, anyways.

Please feel free to make a PR whenever you're ready and I can start reviewing all the goodness tomorrow.

lestrrat commented 6 years ago

Well, I'll take one more pass at tackling the kubernetes thingy, and I have one flaky test that I need to fix (probably has to do with nested types), so I will review that today, hopefully. Then I need to delete the old and/or unused code. I'll send you a PR after that :)