tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Flexible cache keys #1391

Open krizhanovsky opened 4 years ago

krizhanovsky commented 4 years ago

Scope

Some applications need a flexible way to build cache keys, e.g. the cache key must include the X-Forwarded-Host header. The current scheme ignores the header, which may lead to a web cache poisoning attack (https://youtu.be/oBKoocE5id4?t=1965).
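To make the risk concrete, here is a minimal Python sketch of a toy cache, not Tempesta code. The `origin` function is a hypothetical upstream that reflects X-Forwarded-Host into the page, and both key functions are assumptions made for illustration only.

```python
from typing import Callable, Dict, Tuple

Request = Dict[str, str]  # lowercase header name -> value, plus ":path"
Key = Tuple[str, ...]


def origin(req: Request) -> str:
    """Hypothetical upstream that reflects X-Forwarded-Host into the page."""
    host = req.get("x-forwarded-host", req["host"])
    return f'<script src="https://{host}/app.js"></script>'


def key_uri_only(req: Request) -> Key:
    """Key that ignores X-Forwarded-Host (the vulnerable scheme)."""
    return (req["host"], req[":path"])


def key_with_xfh(req: Request) -> Key:
    """Key that also includes X-Forwarded-Host."""
    return (req["host"], req[":path"], req.get("x-forwarded-host", ""))


def serve(cache: dict, key_fn: Callable[[Request], Key], req: Request) -> str:
    """Return the cached response, asking the origin only on a miss."""
    key = key_fn(req)
    if key not in cache:
        cache[key] = origin(req)
    return cache[key]


attacker = {"host": "example.com", ":path": "/", "x-forwarded-host": "evil.test"}
victim = {"host": "example.com", ":path": "/"}

# The attacker's request goes first, then the victim's, against each scheme.
poisoned: dict = {}
serve(poisoned, key_uri_only, attacker)
victim_page_poisoned = serve(poisoned, key_uri_only, victim)

safe: dict = {}
serve(safe, key_with_xfh, attacker)
victim_page_safe = serve(safe, key_with_xfh, victim)
```

With the URI-only key the victim receives the entry stored by the attacker's request. With the header in the key the two requests map to different entries.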

Testing

A functional test that launches a web cache poisoning attack using X-Forwarded-Host and any other custom header. The test must clearly show that Tempesta handles the requests safely.

vankoven commented 4 years ago

A couple of days ago I bumped into cache key configuration on Fastly and Cloudflare. They allow the user to form a cache key from arbitrary headers:

The modifiable portions of a Cache Key include:

  • Scheme - HTTP or HTTPS requests
  • Geo - the two-letter country code for a visitor’s country of origin
  • Language - As specified in the Accept-Language header from the browser (trimmed to the first language)
  • Cookie test - whether a browser cookie is provided (indicated by a 0 or 1)
  • Cookie content - the full value of the cookie (available only under certain circumstances)
  • Header test - match found in the browser’s request header (indicated by a 0 or 1)
  • Header content - the full value of the specified header
  • Device type - a header value indicating the user’s device (mobile, tablet, or desktop)

The reason to do this is not only cache poisoning mitigation, but also A/B testing, different designs for different countries, and desktop/mobile versions. Here is a good article about altering cache behaviour using the Vary header. I believe it applies well to the flexible cache key use cases.
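As a rough illustration of how several of the components listed above could be combined into one key, here is a Python sketch. The function names, the default cookie and header names, and the device heuristic are all assumptions for the example. It does not reproduce how Fastly, Cloudflare or Tempesta compute their keys.

```python
from typing import Mapping, Tuple


def first_language(accept_language: str) -> str:
    """Return the first language tag, without its q-value, lowercased."""
    first = accept_language.split(",")[0]
    return first.split(";")[0].strip().lower()


def device_type(user_agent: str) -> str:
    """Very rough classification; real device detection is more involved."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobi" in ua:
        return "mobile"
    return "desktop"


def cache_key(
    scheme: str,
    host: str,
    path: str,
    headers: Mapping[str, str],        # lowercase header names
    cookie_name: str = "session",      # hypothetical cookie to test for
    header_name: str = "x-forwarded-host",
) -> Tuple[str, ...]:
    """Build a key from scheme, language, cookie test, header content, device."""
    cookies = headers.get("cookie", "")
    has_cookie = any(
        c.strip().split("=", 1)[0] == cookie_name for c in cookies.split(";")
    )
    return (
        scheme,
        host,
        path,
        first_language(headers.get("accept-language", "")),
        "1" if has_cookie else "0",    # cookie test, not cookie content
        headers.get(header_name, ""),  # header content
        device_type(headers.get("user-agent", "")),
    )
```

Using a cookie test rather than the cookie content keeps the number of cache entries small, while still separating logged-in and anonymous visitors.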

krizhanovsky commented 3 years ago

At the moment our site (as well as many other business sites) handles advertisement requests like https://tempesta-tech.com/development-services?utm_source=clutch.co&utm_medium=referral&utm_campaign=developers where the query string doesn't change the generated content, but still produces a separate cache entry for each distinct string. Moreover, Google Ads can generate personalized query strings to track users, so all users coming from Google Ads have to go to the upstream and cannot be served from the Tempesta cache.
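One way to avoid the duplicate entries is to drop tracking parameters before the key is computed. The Python sketch below assumes that parameters prefixed with `utm_`, plus an illustrative set of click identifiers, never affect the content. That list and the function name are assumptions, not an existing Tempesta option.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: these parameters never change the generated content.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def normalized_key(url: str) -> str:
    """Return host + path + the query string without tracking parameters."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(TRACKING_PREFIXES) and name not in TRACKING_PARAMS
    ]
    # Sorting makes the key independent of parameter order. This is only
    # safe if the application does not depend on that order.
    kept.sort()
    query = urlencode(kept)
    return parts.netloc + parts.path + ("?" + query if query else "")
```

With this normalization the advertisement URL above and the plain /development-services URL map to the same key, so both can be served from a single cache entry.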

We need a generic solution like Nginx proxy_cache_key, i.e. we need to introduce at least $uri, $args and variables named after headers, e.g. $host or $x_forwarded_host.
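A minimal Python sketch of how such a key template could be expanded is shown below. The variable naming follows the proposal in this comment ($uri, $args, and header names with dashes replaced by underscores). It is a hypothetical design, not the actual Nginx or Tempesta syntax, and `expand_key` is a name invented for the example.

```python
import re
from typing import Mapping

# A variable is "$" followed by a lowercase identifier.
_VAR = re.compile(r"\$([a-z_][a-z0-9_]*)")


def expand_key(template: str, uri: str, args: str, headers: Mapping[str, str]) -> str:
    """Expand $uri, $args and $<header_name> variables in a key template.

    `headers` is expected to use lowercase names; a missing header
    expands to an empty string.
    """
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name == "uri":
            return uri
        if name == "args":
            return args
        # $x_forwarded_host -> "x-forwarded-host"
        return headers.get(name.replace("_", "-"), "")

    return _VAR.sub(lookup, template)


key = expand_key(
    "$host$uri?$args|$x_forwarded_host",
    uri="/development-services",
    args="lang=en",
    headers={"host": "tempesta-tech.com", "x-forwarded-host": "evil.test"},
)
```

A template that leaves out `$args` would ignore the query string entirely, which is one way to handle the advertisement URLs described above.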

We need to develop a functional test for the options and also update https://github.com/tempesta-tech/tempesta/wiki/Caching-Responses

TBD: some WordPress plugins are able to ignore certain GET parameters for caching. So this issue depends on #1276 and we need to map the GET parameters to variables.