Open pmac opened 3 years ago
Parameter list:
- v (for Traffic Cop experiments)
- xv (to serve different templates / designs for a given page)
- entrypoint_experiment (for FxA experiments)
- entrypoint_variation (for FxA experiments)
- experiment (for stub attribution experiments)
- variation (for stub attribution experiments)
- geo (for testing geo-related content)
- installer_lang (for installer help page)
- channel (for installer help page)
- oldversion (for /whatsnew pages)
- scene (legacy param for firefox/new/ page redirect)
- f (for funnelcake query params)
- success (speaker request form submission)
- submitted (fraud report form submission)
- lang (is this used by our language negotiation code as a fallback to language header?)
- automation (this is used by functional tests, but may not be required here)
- cjevent (VPN affiliate attribution, only used client side)
- reason (Firefox update errors)

Note: I'm mostly discovering these params via searching for request.GET.get.
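For anyone repeating the search, here is a minimal sketch of how the request.GET.get lookup could be scripted. The helper name `find_query_params` and the sample view are hypothetical, and the regex only catches literal string arguments; params read via `request.GET["name"]` or `"name" in request.GET` would still need a manual check.

```python
import re

# Matches request.GET.get("name") or request.GET.get('name', default).
# Only literal string arguments are captured (assumption: most views use literals).
GET_PARAM_RE = re.compile(r"""request\.GET\.get\(\s*(['"])([^'"]+)\1""")


def find_query_params(source: str) -> set[str]:
    """Return the literal param names passed to request.GET.get in `source`."""
    return {match.group(2) for match in GET_PARAM_RE.finditer(source)}


# Hypothetical view code, used only to exercise the helper.
sample = '''
def whatsnew(request):
    old = request.GET.get("oldversion", "")
    variation = request.GET.get('v')
'''

print(sorted(find_query_params(sample)))  # ['oldversion', 'v']
```

Running this over each view module and taking the union of the results would give a first draft of the list, which still needs a human pass to decide which params actually change the rendered content.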
@pmac here's a WIP list, but I need to do a bit more checking still I think.
Quick question for validation: the params we're interested in are only the ones that we use to render different content, right? Also: do I need to worry about things like the newsletter preference center, or stub attribution endpoint? I'm assuming those aren't cached by the CDN already.
> @pmac here's a WIP list, but I need to do a bit more checking still I think.

Great start! Thanks!

> Quick question for validation: the params we're interested in are only the ones that we use to render different content, right?

Correct.

> Also: do I need to worry about things like the newsletter preference center, or stub attribution endpoint? I'm assuming those aren't cached by the CDN already.

I believe that is correct. I'll have to check the stub attribution one as in theory that could be cached for people who request a signature for the same set of attributes.
We don't need to respect the geo param though, right? We are specifically not respecting it for the prod domain.

> We don't need to respect the geo param though, right? We are specifically not respecting it for the prod domain.

We are for stage though, right? I thought that uses the CDN also.
True. I guess we should keep the stage and prod CDN configs as similar as possible. I was thinking we'd only do this query param thing on prod, since it's really meant to improve our cache hit ratio and we don't have any issue with that on stage.
Yeah, I also think from a testing perspective doing this on stage first before rolling it out makes sense.
@pmac ok, I think the list above is about complete. I wasn't 100% sure that some of the items in the list are affected, but I included them anyway to be on the safe side. Let me know what you think.
I also didn't include any params in the newsletter app, or the stub attribution endpoint. Maybe we can exclude those URLs if needed?
I did check stub attribution and it does allow itself to be cached on the assumption that the query params would be unique per cache entry. I think the solution there is to just turn the cache off for that view.
For the benefit of ticket archaeologists: the CDN in use when this ticket was written was Cloudflare. Since then, the CDN has been switched to AWS CloudFront.
@stevejalim note that this issue was never completed originally (hence it's still open). Right now I believe we still include all query params in the cache key, which has always been the case up until now as far as I'm aware.
@alexgibson just checked and yeah, you're right, thanks for the poke! We do still appear to be caching all query strings, even though we've swapped CDNs.
@pmac is this something we still need / want to do for GCP?
We're not quite on the GCP CDN yet, but once we are it probably wouldn't hurt to check in with @bkochendorfer to see if this would help, or if we even need this sort of help.
It looks like GCP Cloud CDN does support adding particular query keys to the cache key. As @pmac mentioned, we aren't using GCP CDN quite yet but will in the near future. It is currently configured the same way as AWS CloudFront, in that we are caching all query keys.
Can you all help me understand what the intent of this change would be? I see in the originating comment that we want to "improve", but I'm not sure if that is a performance concern or something else.
IIRC the idea is that there's only a subset of query params that we use to affect the page served from our servers. The rest are either for analytics (UTM params) or front-end things. So we could improve our cache-hit rate by only allowing those params which we need to affect the cache. If our hit-rate is good enough, it is probably not worth the effort to make this happen. But the hypothesis is that we often use UTM params in our campaigns that end up on www.m.o, and so if those requests were handled via cache more often that might be better for the backend.
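To make the hypothesis concrete, here is a rough sketch that counts how many distinct cache entries a handful of campaign URLs would create with and without filtering. The `cache_key` helper, the two-item `SERVER_SIDE_PARAMS` subset, and the sample URLs are all illustrative assumptions; a real CDN builds its key differently, but the counting argument is the same.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Illustrative subset of the params the server actually reads (assumption).
SERVER_SIDE_PARAMS = {"v", "xv"}


def cache_key(url, allowed=None):
    """Build a simplified cache key: path plus sorted query pairs.

    With allowed=None every param is part of the key (the current behaviour);
    otherwise only params named in `allowed` are kept.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if allowed is not None:
        pairs = [(k, v) for k, v in pairs if k in allowed]
    return parts.path + "?" + urlencode(sorted(pairs))


# Hypothetical campaign traffic landing on the same page.
sample_requests = [
    "/firefox/new/?utm_source=newsletter&utm_campaign=a",
    "/firefox/new/?utm_source=twitter&utm_campaign=b",
    "/firefox/new/?v=1&utm_source=twitter",
    "/firefox/new/",
]

all_params = {cache_key(u) for u in sample_requests}
filtered = {cache_key(u, SERVER_SIDE_PARAMS) for u in sample_requests}
print(len(all_params), len(filtered))  # 4 2
```

In this toy traffic, four requests produce four cache entries today, but only two once the UTM params are left out of the key, so two of the four could be served from cache instead of reaching the backend.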
@pmac Ok, understood. Right now our cache hit rate ranges from 95% to 99%, so fairly high. Happy to experiment with this if we think we can pull that up even higher. If we know the query params we do not want in the cache key, we can add those (UTM) to an exclusion list, which might make this easier.
If there's an exclusion list then just excluding the several utm_* params would be most of it I think.
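As a sketch of what the exclusion-list variant amounts to, the snippet below drops every param whose name starts with utm_ before the URL would be used as a cache key. The function name `strip_excluded_params`, the `EXCLUDED_PREFIXES` tuple, and the example.com URLs are assumptions for illustration; the real rule would live in the CDN config, not in Python.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Prefixes of param names to leave out of the cache key (assumption: UTM only).
EXCLUDED_PREFIXES = ("utm_",)


def strip_excluded_params(url):
    """Return `url` with every excluded query param removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(EXCLUDED_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_excluded_params(
    "https://www.example.com/firefox/new/?utm_source=x&xv=basic&utm_medium=email"
))
# https://www.example.com/firefox/new/?xv=basic
```

The trade-off compared with an inclusion list is that any unknown param still ends up in the cache key, so this is the safer option for correctness but catches less of the long tail of analytics params.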
Our CDN has the capability to include the query parameters for requests in the cache key or not. It also will allow us to specify which parameter names to include or exclude. We currently include all query params in the cache key, but to improve we should only include those which we use on the server side in our views. Let's get a list of these params together and update the CDN config to only include those in the list.