oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
224 stars 34 forks source link

Tight coupling of "HTTP servers, as interface" to "Implementations" is painful #5247

Open smklein opened 4 months ago

smklein commented 4 months ago

This may be an issue related to the dropshot repo too, as Omicron isn't the sole user of this pattern.

This issue describes a process which is documented in our README, but which is still painful: https://github.com/oxidecomputer/omicron?tab=readme-ov-file#generated-service-clients-and-updating

Background

When is this a problem

Mostly: When things don't compile, it can be exceptionally difficult to "regenerate the interface we want".

In the cleanest scenario, you'd make changes to your implementation AND interface together, then run the "openapi spec generation tests", then apply those changes to all clients which exist in the codebase. However, there are scenarios where this breaks:

  1. Circular dependencies. For example, Nexus exposes an internal API to sled agent, and sled agent exposes an API to Nexus. This isn't necessarily bad! They both have reasons to communicate with each other. However, updating interactions between the two servers often involves re-running the JSON generation several times, then dealing with breakages, and then re-building. At an interface level, this is fine, but it's painful that "the rest of the crate (nexus or sled agent)'s implementation must also compile first, before the openapi test can run". This isn't too bad for adding new endpoints, but for altering interfaces that exist, it often means filling up a codebase with todo!() and commented-out code to get to a compiling state, then undoing all those changes immediately afterwards.
  2. Merge conflicts. This forces you to deal with an openapi spec that is out-of-sync from main, but also potentially out-of-sync from your current tree. Again, this forces the same type of solution: find everyone using types that are generated by the openapi spec, and "force them to do the wrong thing, as long as it compiles".

A better world?

It would be really nice to decouple the implementation of these interfaces from the interface declarations themselves. For example, if the Sled Agent API was not actually part of the sled-agent crate, but rather, a separate "sled-agent-interface" crate, we could do the following:

The main advantage here would be "when you want to generate the new interface, do that first, before necessarily touching the implementation" and that would work. This should remove the need for "commented-out" code to get things to compile, since it would no longer be necessary to make the rest of the crate arbitrarily compile before generating the new interface.

sunshowers commented 4 months ago

We already have a small example of this pattern in installinator-artifactd, for reference.

edit: here's the trait

davepacheco commented 4 months ago

I'm not totally sure what you're proposing. To generate the OpenAPI spec, you'd need the entire dropshot server in sled-agent-interface, right? So is the idea that the types and endpoints would live in sled-agent-interface? Is the idea that SledAgent itself would impl some SledAgentApi trait, with one method for each endpoint, and the endpoint definitions would all be one liners that find the trait object and call into trait methods? I feel like there's cost (mostly in terms of developer complexity/experience) in the extra level of indirection that is worse for me than the problem being described here. But this is definitely subjective -- I hear that others are experiencing more pain with this than I am.

smklein commented 4 months ago

I'm admittedly being loose with what I'm proposing to avoid being too prescriptive -- I wanted the issue to mostly describe the problem space, not the solution space, beyond "decoupling would be nice".

Personally I'd really like the following:

Dropshot currently wraps an impl block of a concrete implementation. It could wrap a trait instead:

#[dropshot_server]
pub trait SledAgentService {
  #[endpoint { ... }]
  async fn xyz(ctx: RequestContext<Self>, query_params, path_params);
}

This could exist in a sled-agent-interface crate. In sled-agent proper, we could implement this trait with the following:

impl SledAgentService for SledAgent {
  async fn xyz(ctx: RequestContext<Self>, query_params, path_params) {
    ...
  }
}

Granted, it's also possible to do this separation "below the dropshot layer" by putting a Box<dyn Trait> within the RequestContext, and plugging whichever implementation (real or stubbed) we want to use.

But this would:

andrewjstone commented 4 months ago

I'm pretty strongly in favor of this decoupling regardless of implementation. I think the problems Sean has run into are real, and it is often frustrating to me to get these OpenAPI specs to generate even with just my own local changes. I don't think it's too much boilerplate either. We have much worse elsewhere. Some of it rhymes with weasel.

ahl commented 4 months ago

So, we'd define the interface first as a trait. Would we need some way to emit the OpenAPI document from that trait (indirectly, I'd expect)?

An alternative (more traditional) approach would be to define the interface e.g. in OpenAPI directly or in something like TypeSpec and then generate a trait (or stubs or both) from that.

Presumably we'd imagine this as an alternative mechanism to the current approach in Dropshot rather than as a replacement?

andrewjstone commented 4 months ago

So, we'd define the interface first as a trait. Would we need some way to emit the OpenAPI document from that trait (indirectly, I'd expect)?

An alternative (more traditional) approach would be to define the interface e.g. in OpenAPI directly or in something like TypeSpec and then generate a trait (or stubs or both) from that.

Presumably we'd imagine this as an alternative mechanism to the current approach in Dropshot rather than as a replacement?

I don't think anyone wants to change the autogeneration of specs from dropshot. That's the best part of dropshot. My complaint was mainly about when I create my own circular dependency, not how the spec generation happens. Although I know @ahl reasonably figured I was just being an ahole 😄

ahl commented 4 months ago

I don't think anyone wants to change the autogeneration of specs from dropshot. That's the best part of dropshot. My complaint was mainly about when I create my own circular dependency, not how the spec generation happens. Although I know @ahl reasonably figured I was just being an ahole 😄

To be clear: I think it's reasonable to consider an alternate, additional interface.

sunshowers commented 4 months ago

So, we'd define the interface first as a trait. Would we need some way to emit the OpenAPI document from that trait (indirectly, I'd expect)?

Yes. I think that should be relatively easy.

An alternative (more traditional) approach would be to define the interface e.g. in OpenAPI directly or in something like TypeSpec and then generate a trait (or stubs or both) from that.

I don't want to write OpenAPI directly, but TypeSpec looks quite interesting! My worry is the loss of fidelity inherent in generating Rust from OpenAPI -- we've run into issues with this within clients which have required lots of From impls and occasional replace directives within progenitor.

Presumably we'd imagine this as an alternative mechanism to the current approach in Dropshot rather than as a replacement?

Definitely an alternative, yeah. At the very least we'll want to convert one service over at a time. In the longer term, it would be interesting to see if the alternative is better enough that it should become the only way to write Dropshot servers.

ahl commented 4 months ago

I spoke with @smklein about this a bit. These seem like good goals for an alternate interface to Dropshot. I'm inclined to prioritize this behind two other dropshot-related concepts I've been considering: multi-version support and multi-API per server support. But it's helpful to consider this alongside these other bodies of work.

I've assigned this to myself, but if someone else comes along with cycles and interest by all means feel free to take it over--I'm not trying to exclude others from investigating.