roadrunner-server / roadrunner

🤯 High-performance PHP application server, process manager written in Go and powered with plugins
https://docs.roadrunner.dev
MIT License

[💡FEATURE REQUEST]: Add HTTP Cache Middleware [IMPLEMENTATION TRACKING TICKET] #898

Closed alexander-schranz closed 1 year ago

alexander-schranz commented 3 years ago

Is your feature request related to a problem? Please describe.

RoadRunner already handles SSL, as a replacement for a load balancer, nginx, or a similar service. It would be nice to support some kind of HTTP cache as well, so that the job of tools like Varnish (which doesn't support SSL) could be handled by RoadRunner itself.

Describe the solution you'd like

HTTP cache handling could be added as a kind of middleware, I think. It could be implemented in separate steps, so not all types of caching need to be added at once.

Describe alternatives you've considered

Using Varnish. Sadly, RoadRunner can then no longer be used as the SSL provider, and an additional nginx is required in front of Varnish to handle SSL.

Additional Context

Multi-domain support should be kept in mind: example.org/test is different from example.com/test.

Things that should be discussed: a file-based or an in-memory cache. A hybrid solution would be great, to be as fast as possible with low memory usage.

Static file responses probably don't need to be stored in the cache additionally?

rustatian commented 3 years ago

It's a great proposal, thanks @alexander-schranz . I'll plan it after v2.5.0 is released (~18 of October).

alexander-schranz commented 3 years ago

@rustatian I updated the issue with some hopefully helpful links.

rustatian commented 3 years ago

@alexander-schranz Great, thank you 😃

rustatian commented 2 years ago

Hey @alexander-schranz, sorry for the delay, got sick with covid in November 🚑. I am going to support at least Redis, BoltDB, and in-memory (LRU, with configurable capacity) drivers for the cache in the first phase. The basic idea is to have a wrapper on the RR side that would work with any driver (Varnish, Redis, PostgreSQL (haha)) which you can easily configure via .rr.yaml.
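To make the driver idea concrete, here is a minimal sketch of a storage contract with an in-memory LRU driver of configurable capacity. The `Storage` interface and the `LRU` type are assumptions made up for this illustration; they are not the interface RoadRunner actually ships.

```go
package main

import "container/list"

// Storage is a hypothetical driver contract (an assumption, not RR's real API).
type Storage interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
	Delete(key string)
}

type lruItem struct {
	key   string
	value []byte
}

// LRU is an in-memory Storage that evicts the least recently used entry
// once more than capacity entries are stored. It is not goroutine-safe.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // a read counts as a use
	return el.Value.(*lruItem).value, true
}

func (c *LRU) Set(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruItem).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruItem{key: key, value: value})
	if c.order.Len() > c.capacity {
		// Evict the entry at the back of the list (least recently used).
		c.Delete(c.order.Back().Value.(*lruItem).key)
	}
}

func (c *LRU) Delete(key string) {
	if el, ok := c.items[key]; ok {
		c.order.Remove(el)
		delete(c.items, key)
	}
}

// Compile-time check that LRU satisfies the hypothetical Storage contract.
var _ Storage = (*LRU)(nil)
```

A Redis or BoltDB driver would implement the same three methods; a real in-memory driver would also need a mutex, since HTTP handlers run concurrently.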

alexander-schranz commented 2 years ago

@rustatian No problem, I hope you are fine and healthy again. Take all the time you need! For Varnish, I think nothing needs to be done: RoadRunner would just need to forward the headers from the application correctly, as it does today, and purge requests would go directly to Varnish without hitting RoadRunner in that scenario.

For the cache built directly into RoadRunner, which is what I wanted to target with the proposal, Redis sounds great as storage. I'm not familiar with BoltDB, but it seems to be something similar. A file-based cache would also be great to have, working out of the box without any additional service besides RoadRunner. In the end, I think the storage should just be abstracted so we can add more backends in the future; we only need to make sure that the abstraction correctly supports the scenarios above, based on the standard HTTP headers and common practices listed.

The storage is required to cache the content plus the response headers. We should also make sure to set/calculate the correct Age response header when a response is loaded from the cache, to report how old the entry is. I forgot that one and added it above for completeness. So we need the response content, the response headers, and the time when the entry was cached.

In the end, I just thought it would be a great addition to have a built-in HTTP cache directly in RoadRunner, so that applications which require such a cache can make use of the SSL feature. Still, make sure you are doing fine; no stress on this one, work on it when you feel like it, and don't forget to enjoy other things! For any questions, or if I can test or provide you something, just ping me here; happy to help if I can.

rustatian commented 2 years ago

I hope you are fine and healthy again.

Yeah, everything is good now, thanks 😃

A file-based cache would also be great to have, working out of the box without any additional service besides RoadRunner.

BoltDB is a file-based store, analogous to sqlite3 in that it is embedded and needs no separate server. So, no configuration on the user's end, just specify the driver and run. Much like the memory driver, but it persists between RR restarts :)

For any questions or if I can test or provide you something just ping me here, happy to help if I can.

Great, thank you very much.

If you don't mind, I'll mark this ticket as an implementation tracking ticket (thank you one more time for the good and detailed request) and it'll be open till the last level is implemented. For the RR 2.7 (middle of Jan) I'll implement Level 1, and will add 1-2 levels per RR version just to make sure that everything goes smoothly and w/o bugs. So you and the community would have the ability to seamlessly integrate the cache into the existing ecosystem.

rustatian commented 2 years ago

CI closed the issue by mistake 🦑

alexander-schranz commented 2 years ago

@rustatian Really nice to see that you had some time to implement a first version. 🎉

rustatian commented 2 years ago

You are very welcome 😃 It'll take some time to implement the whole RFC-7234 and add new drivers (Varnish, BoltDB, etc.), but we won't stop, I promise 🧑‍🏭

darkweak commented 2 years ago

Hello everyone 👋 I was thinking about another cache system implementation. I already wrote an HTTP cache system (called Souin) used by the Caddy cache-handler module and compatible with Træfik, Tyk, and many other reverse proxies/API gateways. It supports RFC 7234, can partially cache GraphQL requests, and can invalidate using xkeys/ykeys like Varnish. It also implements the Fastly purge using the Surrogate-Key header. It can store and invalidate a CDN (Cloudflare, Fastly, Akamai), and it supports the Cache-Status RFC directive. It implements two in-memory/filesystem storages (badger & nutsdb) and two distributed storages (olric & etcd) that are fully configurable. The keys can be tweaked (e.g. to serve the same cached CSS for multiple domains), and the Cache-Status name can be changed through the configuration.

What do you think about implementing it in the cache repository and making something like the cache-handler? The Souin repo could be the stable development repository and roadrunner-server/cache could be the ultra-stable, production-ready one. Or we can reimplement each feature directly in the roadrunner-server/cache repository.

Let me know your preference about that. ✌️

rustatian commented 2 years ago

Hey @darkweak, nice to meet you 👋🏻

What do you think about implementing it in the cache repository and making something like the cache-handler? The Souin repo could be the stable development repository and roadrunner-server/cache could be the ultra-stable, production-ready one. Or we can reimplement each feature directly in the roadrunner-server/cache repository.

We may have a Souin handler in the roadrunner-server/cache repository (to not repeat the code). You may delete everything from the Middleware (https://github.com/roadrunner-server/cache/blob/master/plugin.go#L94) 😃 and put your code in it.

You may propose a storage interface since we need to store the requests. Previously I used a straightforward one: https://github.com/roadrunner-server/api/blob/master/plugins/cache/interface.go#L7. It's not final; feel free to change it.

alexander-schranz commented 2 years ago

@darkweak Souin sounds very promising 👍 If I understand correctly, Souin itself can also be used without a reverse proxy, so Souin does the caching? Which of the things listed above does Souin support? If I understand correctly, Souin would then just be compiled into RoadRunner, and I wouldn't need an additional application running? Because that is what I'm targeting: out-of-the-box support for caching without an additional reverse proxy.

darkweak commented 2 years ago

@alexander-schranz Yes it can be used as a middleware (http.Handler). The more complex part will be the configuration parsing I think.

rustatian commented 2 years ago

@darkweak We use the same config type: yaml. So, you may add any configuration you need under the http.cache key 😃

EDIT: Here is the RR's cache configuration: https://github.com/roadrunner-server/roadrunner/blob/master/.rr.yaml#L557

rustatian commented 2 years ago

@darkweak If you need any support from my side, I'd be happy to help. You may also join our discord server and ping me directly 😄

rustatian commented 2 years ago

@darkweak @alexander-schranz Starting from the RR 2.11.0, the Souin cache will be a default cache plugin for the RR. @darkweak Could you please tell me what features from this feature request are supported by Souin? According to the docs, I guess Souin supports all described features except Edge Side Includes, am I right?

darkweak commented 2 years ago

@rustatian ATM it supports neither ESI nor the "Level 8: User Context Caching" section.

I planned to work on the ESI support but it takes time to have a robust and efficient system.

rustatian commented 2 years ago

Got u, thanks 👍🏻

alexander-schranz commented 2 years ago

@darkweak nice to hear that we could achieve most of the things via Souin. In the end, the two open items, even if not officially supported by Souin, could still be achieved if hooks are provided.

Edge Side Includes

This is a kind of "response" hook that runs directly before the content is sent back to the user (after the response has already been saved to the cache). So if Souin provides a response hook that allows manipulating the response content, ESI can be achieved: an additional "plugin" could hook into it, parse the response, and replace the ESI includes with the content.

User Context Based Caching

This is a kind of "request" hook. The most common example is a cache based on the User-Agent header: I want to serve different content to "mobile" and "desktop" browsers. When the request comes in, I parse User-Agent and normalize it into a custom header, X-User-Agent: mobile / X-User-Agent: desktop, and tell Souin that the cache key is not only the URL of the page but also X-User-Agent. So Souin would need to support creating a "request hook", and the caching context would need to be not only the URL but also a request header. The plugin can then add Vary: User-Agent via a response hook and return the content. For real "user context based" caching, the plugin can also get the "user context" for role-based caching from the application, but Souin would not need to take that into account, as that logic would not live there.

The hooks also don't need to live in Souin; they could live in RoadRunner, which may already be possible via its middleware architecture. So ESI could be its own middleware, which Souin doesn't need to support if you don't want it there. For user-context-based caching, Souin would need to support caching based not only on the URL but also on an additional request header, which can be configured.

alexander-schranz commented 2 years ago

Some context on ESI: I currently see two different implementations of ESI around.

The simple solution: Parse whole Response Content and Replace

This is what userland implementations mostly do today, such as the Symfony framework in PHP. Before it begins sending content to the browser, it parses the whole content and replaces all ESI includes, then sends everything together.
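A minimal sketch of this parse-and-replace approach is shown below. It only handles self-closing `<esi:include src="..." />` tags, and `expandESI` and its `fetch` callback are hypothetical names for this example, not an existing API.

```go
package main

import "regexp"

// esiInclude matches self-closing tags such as <esi:include src="/nav" />.
var esiInclude = regexp.MustCompile(`<esi:include\s+src="([^"]*)"\s*/>`)

// expandESI returns a copy of body in which every include tag is replaced by
// the fragment that fetch returns for its src. A failed fetch expands to
// nothing in this sketch.
func expandESI(body []byte, fetch func(src string) ([]byte, error)) []byte {
	return esiInclude.ReplaceAllFunc(body, func(tag []byte) []byte {
		src := esiInclude.FindSubmatch(tag)[1]
		fragment, err := fetch(string(src))
		if err != nil {
			return nil
		}
		return fragment
	})
}
```

The whole expanded body exists in memory before the first byte is sent, which is the trade-off of this simple variant.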

The performance solution: Streamed sending and replacing

If you use ESI via Varnish, you will see that Varnish starts sending the cached content directly and detects ESI tags while sending. So when an ESI tag appears, everything that preceded it in the response content has already been sent to the browser. Varnish then makes the request, waits for its response, sends that response to the browser as well, and continues with the rest of the cached item. This is very performant, as the whole content does not need to be kept in memory, but it is also more complex to implement.

darkweak commented 2 years ago

For user-context-based caching, Souin would need to support caching based not only on the URL but also on an additional request header, which can be configured.

There are the default_cache headers and urls headers directives in the configuration to add more properties to the key generation. But I'm not sure if it works well.

darkweak commented 2 years ago

The ESI tags are now handled in the latest version. I plan to implement the response streaming later.
Can you update the tasks and check the ESI support please? :)

rustatian commented 2 years ago

The ESI tags are now handled in the latest version. I plan to implement the response streaming later. Can you update the tasks and check the ESI support please? :)

Sure, thank you very much for your work ⚡

alexander-schranz commented 2 years ago

@darkweak nice to see this is still moving forward. I want to mention a new RFC which is an alternative to the Level 3 Custom TTL Header part. While the custom TTL header is still supported and used in Symfony applications, there is now a new RFC that aims to solve a common issue. I already opened a discussion in Symfony about its support: https://github.com/symfony/symfony/issues/47288

The source of the RFC is https://datatracker.ietf.org/doc/rfc9213/ and it is called Targeted HTTP Cache Control. With it, the application can define which specific cache-control header a given cache should take into account. E.g. an application could set:

In this case RoadRunner would only look at Roadrunner-Cache-Control and ignore the others. The difference from X-Reverse-Proxy-TTL is that it carries not just a number but all the common cache directives (max-age, must-revalidate, ...). The difference from the Cache-Control header is that max-age is the relevant directive, not s-maxage (which does not exist in targeted cache-control headers).

From what I have read, it is common for reverse proxies to honor CDN-Cache-Control plus a specific header of their own; in the case of Fastly it is Fastly-Cache-Control. In our case it could be CDN-Cache-Control, Roadrunner-Cache-Control, Souin-Cache-Control. When none of these headers is present, the Cache-Control header's s-maxage should, in my opinion, still be taken into account, but I think that is also described in the RFC.

So the goal of the RFC is that an application can sit behind multiple reverse proxy caches and control each of them with specific headers. CDN-Cache-Control is the one currently supported by all, but every provider has its own custom header specifically targeting it (Cloudflare-Cache-Control / Fastly-Cache-Control / Akamai-Cache-Control), so it would be nice to have a Roadrunner-Cache-Control or Souin-Cache-Control.

darkweak commented 2 years ago

To keep you updated: I'm working hard on the streaming response, but it's hard to make ESI streamable because a lot of computation is needed to handle and process the ESI tags asynchronously. The {cache_name}-Cache-Control support will be easy to implement in the Souin codebase because we already support a dynamic cache name depending on the configuration.

alexander-schranz commented 2 years ago

@darkweak nice to hear that you are working on it. Basic ESI support already sounds great; streaming is just a performance improvement, which would be great but isn't required to fulfill the support.

The custom cache control headers sound great. Do cache directives like stale-while-revalidate and stale-if-error also work there?

darkweak commented 2 years ago

@alexander-schranz Yep, it supports the stale-* directives :)

darkweak commented 2 years ago

The {name}-Cache-Control is now merged in master and I tagged the new version to include it! 🙂

rustatian commented 1 year ago

Hey guys 👋🏻 As far as I understand (correct me if I am wrong), level 8 is implemented via the default_cache headers and urls headers configuration options. And since this is the maximum that we can do for caching (keeping in mind that RR is not a web server), may I close this ticket as done? @alexander-schranz

rustatian commented 1 year ago

Closing this as done. Thank you very much @alexander-schranz @darkweak 👍🏻