Update `extensions` to be cloneable

LucioFranco commented 4 years ago

I propose that we move forward to allow Request<B: Clone> to impl Clone. The big blocker here is that extensions is not clonable. This could be fixed by making a AnyClone trait:

pub trait AnyClone: Any + Clone {}

The downside here is that all things you store in extensions is cloneable but I think this is a fair trade off since mostly its just information that could be thrown behind an arc or bytes.

cc @seanmonstar @carllerche @hawkw @davidbarsky

dekellum commented 4 years ago

Some thoughts from an interested user:

Given new requirement on extensions, a breaking change then?

You would apply the same to Response?

It sounds like you would impl Clone where B: Clone so its kind of optional for bodies, but over time there might be more code expecting this. An aspect that gives me pause is that hyper's default Body implements Stream (and is not Clone yet) and http_body::Body defines a Stream-like extension interface. How should implementations handle cloning a Stream-type after it has been polled? Do cloned Streams start from the beginning (if that's even possible) or from the point they are cloned?

With synchronous corollaries, we would typically not clone an Iterator, but clone the thing that can provide a new iterator, e.g. Vec or any impl IntoIterator. In other words, adding a Clone requirement suggests that maybe, in a breaking release, we should also change to requiring a Body implement IntoStream? (Note unfortunately there is already something called IntoStream, for an entirely different purpose, in futures 0.3.)

With clone in play, the HeaderMaps in Request and Response might benefit from Arc or Cow?

Would the next breaking release also be time to consider integrating http_body into this crate? Or is that considered out of scope? (As an observer I've hoped the second crate was a temporary solution.)

LucioFranco commented 4 years ago

So I think the real goal here should be to make Parts cloneable and then since bodies are generic let bodies implement clone if they can. For streams, it should be possible to detect if the request failed before the stream was extracted to retry it then. The big issue for me is making Request a bit easier to deal with and extensions seem to always be the big blocker here.

davidbarsky commented 4 years ago

I've shared my opinion on extensions previously and it still hasn't changed. Short version: I think should be removed. Long version:

There are three logical parts of an http::Request:
- The first are the Parts, which carry the protocol version, method, URI, and headers. These are trivially cloneable relative to the next two. I wouldn't even try to place that data in an Arc, as it might be cheaper to perform a memcpy on smaller bits of data than bumping a refcount on an Arc.
- Extensions, which is a typemap that might not contain cloneable data. @LucioFranco proposes that all types that are stored in a typemap should implement Clone, but I'm not sure that's the right approach. Instead, I'd like to remove extensions wholesale and replace their use-cases with task and thread locals for handling implicit context. I believe that thread/task locals (especially thread/task locals that propagate to child tasks using structured concurrency) solve the problem that http's extensions try to tackle in a more elegant and general-purpose way. I'm aware of some specialized use-cases where task/thread locals wouldn't be a good fit, but I don't believe costs should be borne by all users of http. Critically, I think cloning an http::Request is a far more common use-case than placing data into extensions.
- Body: for streaming bodies, it's not at clear how bodies should be cloned in a general-purpose way. We shouldn't even try to handle that here!

LucioFranco commented 4 years ago

@davidbarsky the issue is that task/thread-locals don't work with most channel backed offtask clients like hyper and tower-buffer. While ext do work with this setup. They couple data that is specific to the user within the request.

or streaming bodies, it's not at clear how bodies should be cloned in a general-purpose way. We shouldn't even try to handle that here!

Agreed, but there are special cases like gRPC where we have total control over this and could probably pull of some sneak stuff related to this. Having parts be clonable would be a huge help here.

LucioFranco commented 4 years ago

I wouldn't even try to place that data in an Arc, as it might be cheaper to perform a memcpy on smaller bits of data than bumping a refcount on an Arc.

I mean its your choice how to implement clone, if memcpy is cheap then do that if arc bump is cheap then do that?

davidbarsky commented 4 years ago

The issue is that task/thread-locals don't work with most channel backed offtask clients like hyper and tower-buffer. While ext do work with this setup. They couple data that is specific to the user within the request.

Sure. In those cases, I think the appropriate course of action is to implement an extension trait that performs a tracing-subscriber-style lookup for the request context. That being said, I believe that most uses of extensions are used to change control flow prior to the request reaching hyper::Client or tower-buffer.

Agreed, but there are special cases like gRPC where we have total control over this and could probably pull of some sneak stuff related to this. Having parts be clonable would be a huge help here.

My argument boils down that we don't need to settle for just making Parts cloneable; we can make the entire http::Request cloneable by moving extensions out of http::Request.

hawkw commented 4 years ago

I'm not convinced that task locals are an acceptable substitute for extensions. Linkerd makes makes use of extensions in a lot of places, and although I'll have to take a closer look, I think there are a lot of use-cases that can't be replaced with task-local storage.

FWIW, when Linkerd's retry code "clones" requests, we just clone the parts + body, and create new extensions. A lot of the data we store in extensions is specific to an individual instance of a request — e.g. we want to record latencies for retries separately.

LucioFranco commented 4 years ago

@hawkw that seems like it could easily be done in a clone where you basically clear the data :+1:

zuoxinyu commented 3 years ago

@hawkw that seems like it could easily be done in a clone where you basically clear the data 👍

Agreed, maybe roughly implement like it does in http-types.

rcoh commented 1 year ago

A few options here:

Add the Clone bound when adding items to Extensions today (but don't make the request cloneable). This would allow us to take advantage of this fact in the future without having to decide about request clonability. We could always remove the clone bound in the future without causing breaking changes.
```
   pub fn insert<T: Send + Sync + Clone + 'static>(&mut self, val: T) -> Option<T> {
```
Clear extensions on clone. Request extensions strike me as a "per dispatch" sort of thing anyway–it makes sense that they don't stick around.
Add an additional default generic E or Ext to request: struct Request<B, E = ()>— Then you can do something like pub type ExtendableRequest<B> = Request<B, Extensions>—if you don't use Extensions at all, your request can be clone if the body is clone (or at least Parts can be!). If you want to use extensions you can—you can even bring-your-own CloneableExtensions to have your own extension behavior.

3 would be my personal recommendation–very flexible going forward and allows users who have no desire to have request extensions at all (most customers?) to have a cloneable request without worrying about it. We can also do a minimal 3 where we make the default Extensions so the code as is remains identical but it allows future evolution and also enables folks to get a cloneable request by putting a Clone type in there themselves.

tesaguri commented 1 year ago

Add the Clone bound when adding items to Extensions today (but don't make the request cloneable).

I think we can implement Clone without making a breaking change at all by keeping the existing insert method as-is (whose value would be discarded when cloning the map) and (someday) add another version of insert that takes an impl Clone value (example implementation on Rust Playground).

seanmonstar commented 1 year ago

I've been planning to write up some thoughts, summarizing this thread next week, so we can arrive at a decision. One thing I think is missing is analyzing what extensions tend to be used for. That would help with deciding if they can be clone, if they can't what should they be instead, etc.

For example, in hyper the OnUpgrade extension is not cloneable at the moment. I suppose it could be if we made it an Arc Mutex inside, or something similar (it's currently a oneshot Receiver). Whereas ReasonPhrase is just a string, and could be easily cloned.

I imagine people likely stick loaded context into them, such as a User that was loaded from the DB, or even a DB handle. Are those clone?

There's likely good examples on linkerd2, and in the AWS SDK, and others. Such an analysis would help in deciding "should we".

seanmonstar commented 1 year ago

I'm sure there's others, but do @hawkw @rcoh @robjtede have any comments on whether extensions you use could or could not be Clone?

rcoh commented 1 year ago

We exclusively use Clone extensions. We are currently adding an HTTP wrapper layer and extensions are gated on being clone. Honestly, we could probably even avoid using extensions all together if we had to.

FSMaxB commented 1 year ago

I'm mainly using them with axum's Extension extractor: https://docs.rs/axum/latest/axum/struct.Extension.html In order for this to work (by using it as a tower layer), the value put into the extension needs to implement Clone in any case. I think that holds true for many other webframeworks as well.

Still, requiring Clone is a pretty big change. A crater run like the rust project can do it would be pretty useful to get a better picture. Maybe it's possible to look at some popular crates depending on http at least.

hawkw commented 1 year ago

I'm sure there's others, but do @hawkw @rcoh @robjtede have any comments on whether extensions you use could or could not be Clone?

The linkerd proxy doesn't currently use any request extensions that can't implement Clone , besides hyper's OnUpgrade. However, many of the extensions we use would not necessarily be correct to clone. For example, we use extensions for tracking the lifetime of a single instance of a request, which should be started over with a new extension if the request is retried. So, we would likely continue to implement retries by creating a new request with a fresh extension map, rather than cloning the request. I don't think requiring Clone would break anything for us, though.

rcoh commented 1 year ago

I think that making an E=Extensions default generic on Request and Parts here keeps the option for folks to have non clone extensions if they absolutely need them without adding any complexity for most customers. e.g. you could have ClearOnCloneExtensions to get the behavior @hawkw described above.

hawkw commented 1 year ago

I mean, we could also just "not write code that clones requests" to get that behavior. :)

seanmonstar commented 1 year ago

extensions we use would not necessarily be correct to clone.

That seems like something that can't quite be defined as Clone or not. You want that layer to not reuse the extension, but if some far inner service were to clone or do anything resembling an inner retry, that would be fine.

making an E=Extensions default generic on Request

A potential problem is that everything using http::Request<B> would need to remember to also be generic over E, or else they would essentially require the default impl.

hawkw commented 1 year ago

extensions we use would not necessarily be correct to clone.

That seems like something that can't quite be defined as Clone or not. You want that layer to not reuse the extension, but if some far inner service were to clone or do anything resembling an inner retry, that would be fine.

Yeah, that's right. I only brought this up to say "we probably would not actually use a Clone implementation for requests, but it wouldn't break stuff we're doing (besides hyper's OnUpgrade)".

LucioFranco commented 1 year ago

I think making extensions generic is going to make writing tower layers in tower-http much more painful than the benefit.

@hawkw can you implement that new request logic in terms of a non clonable wrapper? Like you can wrap your extension in a way that it reconstructs on clone rather than reusing?

adamchalmers commented 1 year ago

Context: I was a heavy user of Extensions at a Cloudflare where I wrote HTTP and gRPC backends. I no longer have any active projects that use Extensions as my current workplace uses Dropshot.

All the items we put into Extensions for requests are clone, so it should be fine.

On the other hand, some items we put into response extensions are !Clone, but as far as I understand that is not being proposed.

seanmonstar commented 1 year ago

The Clone would apply to both Request and Response.

rcoh commented 1 year ago

At least for us, the main reason we want clone is to easily recreate Parts during retries without gymnastics. For us anyway, a minimal change like adding .clone_and_clear_extensions() on Parts would pretty much solve it for us. I guess for tests and things folks want to actually clone entire requests?

robjtede commented 1 year ago

Actix Web doesn't use any of the higher-level constructs from this crate, including Extensions, and requiring Clone on the items in this container would not change that. Therefore, my view is not applicable to this particular issue.

adamchalmers commented 1 year ago

The Clone would apply to both Request and Response.

Ok. I can always Arc the response extension types.

GeeWee commented 1 year ago

+1 for aclone_and_clear_extensions or similar, as it would also help us create much easier retries.

seanmonstar commented 1 year ago

Alright, it looks like the consensus is to move forward and require Clone on extension types.

An aside

I was curious if we could use std::any::provide_any etc to be able to query a Box<dyn Any> for a dyn Clone, and then we could conditionally clone all extensions that were cloneable without requiring it. I'm not sure that would even work. But I just learned that provide_any is being de-scoped to just Error::request_ref. It's hinted that provide_any can be done outside of the standard library via the dyno crate, but I haven't tried to do what I said.

Oh well.

seanmonstar commented 1 year ago

PR is up: https://github.com/hyperium/http/pull/634

Will merge quickly, in case you want to chime in, since target launch is Nov 15.

hyperium / http

Update `extensions` to be cloneable #395

An aside