microsoft / typespec

https://typespec.io/
MIT License
4.1k stars 194 forks source link

Help with handling name conflicts with emitter framework or compiler APIs #1876

Open bterlson opened 1 year ago

bterlson commented 1 year ago

In TypeSpec, it is fairly easy to create multiple declarations with the same name, they just need to be in different namespaces. At the same time, emitted code generally does not want to include the TypeSpec namespace in its declaration names, either because this is not possible or just undesirable due to the added verbosity/confusion). So emitters need to handle this case gracefully, for example as the OA3 emitter does by prepending the namespace when needed.

Unfortunately the solution to this problem is not particularly straight forward, with a number of approaches available, and all having nasty tradeoffs:

  1. Always emit namespaces under the top-level namespace, which adds verbosity in cases its not required, and makes organizing types differently a potential API breaking change.
  2. Detect name conflicts up front and disambiguate, which requires two passes (name collect, emit) with lots of potential duplicated logic between them like applying visibility, and also knowledge about all types that will be emitted before any emitted output is created (which is not necessarily the case when the emitter is being used by a library, e.g. if a validating REST service emitter wants to emit JSON schema for certain models it comes across while emitting).
  3. Detect name conflicts as they happen and disambiguate, which is hard to do deterministically and can require patching up previously emitted code.
  4. Throw errors and require disambiguation via things like $id, friendlyId, etc.

The OpenAPI emitter takes a combination of approaches 1 and 4.

In any case we shouldn't require all emitter authors to muddle through this and provide some sort of solution in either the compiler or the framework (or recommend approach #4 if that's what we like).

NielsCo commented 1 year ago

I'm currently working on a postgres emitter and this is how my current code handles these issues:

Namespaces get converted to schema. With nested namespaces being called there entire path with "_" between them.

namespace one.two.three becomes CREATE SCHEMA one_two_three

Models an Enum which get emitted are then part of the schema that was their namespace.

Regarding which namespaces to emit I currently have two modes my emitter can work in:

  1. Only emit models/enums with a custom decorator "@entity()" AND models/enums that get referenced by these models. (Having a model as a property will create a foreign key and also emit the referenced model, using an enum will create the enum and use it) All namespaces that have entities will get emitted as schema, all others will be ignored.
  2. Emit all models/enums except for the imported (from the namespaces "TypeSpec" and "OpenAPI" and all their children. I haven't spend much time on testing this mode as I don't think it is that desirable to emit all DTOs as Database-Entries as DTOs usually can have many "joins" in regard to their DB-representation. This mode is still WIP and I haven't written a lot of tests for it though. Currently this code also will not emit empty namespaces as schema as it uses the logic from 1. to get the namespaces that are used.

Regarding name collisions I already implemented logic that detects naming collisions. I had to implement that as the @entity(name) decorator can take a name parameter for naming the entity differently in the DB-representation. If there are collisions between names set in the decorator parameter and any other names (except anonymous names) my code throws an error. For the Edge-Case of a model-name colliding with a schema-name it just throws a warning. For anonymous names colliding with "regular" names, I only throw a warning.

An example of anonymous names are string union types. I convert them into enums and have to come up with some name for them. That name can then collide with a name that is in the actual TypeSpec-File. These collisions get handled with a counter in the output and only throw a warning.

For my emitter the naming problematic should already be taken care of, but it was not that trivial to implement. I can imagine that multiple domains/emitters will have to create names for things that are anonymous in TypeSpec as this is also the case with code-generation from openAPI-files and as TypeSpec is kind of similar to TypeScript which also allows more anonymous types than a lot of other languages IMO.