Closed lukasholzer closed 1 month ago
File | Before (Size / Gzip) | After (Size / Gzip) |
---|---|---|
dist/run/handlers/cache.cjs |
16.6 kB / 4.5 kB |
0.6%β16.7 kB / 0.8%β4.5 kB |
Total (Includes all files) | 5.3 MB / 959.1 kB |
0%β5.3 MB / 0%β959.1 kB |
Tarball size | 912.0 kB |
0%β912.1 kB |
π€ This report was automatically generated by pkg-size-action
I wonder whether we might need to expand the encoding? AFAIK,
encodeURL
will leave reserved characters unencoded, which means a URL likehttp://foo.com/resource,123
would emit a cache tag with a comma in it. Should we useencodeURIComponent
instead?
I don't think it's that simple - I was running some tests here and it seems to me that comma case need to be handled on different level than current change - current change only applies to page router, because app router already actually handles it fine (I did add test cases in last commit here for it)
The comma problem however applies right now to both Pages and App router in similar way, so just swapping currently added encodeURI
with encodeURIComponent
would be insufficient to fully support it.
Also encodeURIComponent
would replace /
with encoded character so this would be bigger departure from what we currently generate vs using encodeURI
- maybe we would only want to target comma additionally here to limit changes? Or we treat those completely as implementation detail that can change at any time, then using that would be fine (tho debugging might be a bit more annoying with so many extra characters being encoded in header we produce)
I was looking into comma case, and this is getting quite not straight forward with our usage of split(',')
in places like:
https://github.com/netlify/next-runtime/blob/0b74e135b213de07e9b31bf27f517becd125c212/src/run/handlers/cache.cts#L142
https://github.com/netlify/next-runtime/blob/0b74e135b213de07e9b31bf27f517becd125c212/src/run/handlers/cache.cts#L422
it basically looks like we can't really encode this specifically because we can't distinguish between between comma being a separator and being actual part of one of the items.
It does feel like we kind of have to apply same handling and actually produce 2 (or more) cache tag if actual tag contains comma and instead make sure the way we interact with tag manifest blobs and purge cdn API also need to split on it - this might mean accidentally purging more content than desired but because we don't handle entire pipeline at least for app router - we just can't make it work well in this scenario in which case it would make sense to arrive and similar handling for both app and pages router?
I was doing much more tests with comma and this getting super confusing and annoying ...
cache tags for app router are generated differently when prerendered and when rendering on demand later ...
in prerendered comma does get encoded:
{
path: '/product/δΊεγ¬γ³γγͺγ³γ°γγγ¦γγͺγ,test'
responseCacheTags: [
'_N_T_/layout',
'_N_T_/product/layout',
'_N_T_/product/[slug]/layout',
'_N_T_/product/[slug]/page',
// notice '%2Ctest' which is ',test'
'_N_T_/product/%E4%BA%8B%E5%89%8D%E3%83%AC%E3%83%B3%E3%83%80%E3%83%AA%E3%83%B3%E3%82%B0%2Ctest'
]
}
while non-prerendered gets this:
{
path: '/product/δΊεγ¬γ³γγͺγ³γ°γγγ¦γγͺγ,test',
responseCacheTags: [
'_N_T_/layout',
'_N_T_/product/layout',
'_N_T_/product/[slug]/layout',
'_N_T_/product/[slug]/page',
// in here comma was not encoded so our splitting on ',' made 'test' into separate cache tag
'_N_T_/product/%E4%BA%8B%E5%89%8D%E3%83%AC%E3%83%B3%E3%83%80%E3%83%AA%E3%83%B3%E3%82%B0%E3%81%95%E3%82%8C%E3%81%A6%E3%81%84%E3%81%AA%E3%81%84',
'test'
]
}
I'm thinking that maybe we should split on both ,
and %2c
to get ~consistent handling - even if it is not the exactly correct one, because it will always generate 2 cache tags if cache tag contain comma (encoded or not) and then some unrelated purge might purge those too, but because we are not getting encoded comma consistently - we can't really handle this in any other way?
I'm thinking that maybe we should split on both , and %2c to get ~consistent handling
Maybe we could make the splitting a bit smarter and continue to split on comma, but only if the comma is followed by _N_T_
?
Maybe we could make the splitting a bit smarter and continue to split on comma, but only if the comma is followed by
_N_T_
?
fetch cache tags are not prepended with _N_T_
and instead we get them as they are set by users - otherwise that would be awesome idea to be able to accurately distinguish between entries (that they all must start with _N_T_
)
fetch cache tags are not prepended with _NT
true, but the fetch cache tags are always prepended in the cacheValue.headers[NEXT_CACHE_TAGS_HEADER]
string, so we can start splitting by ,_N_T_
after the first _N_T_
fetch cache tags are always prepended
If that's true, then it probably would work, but this feels way more complex than it should be and also more fragile to next.js changes I think. I'll give it a try regardless just to see how messy setup would have to be for it
Ughhh. There is another layer here with comma in fetch cache tags ... sigh
next: { tags: ["tagstart,tagend"] },
and
next: { tags: ["tagstart","tagend"] },
Will produce same thing (both prerendered and generated on demand) - this would still work with current code in this PR (albeit it would always treat both of above as the last one in practice), but smart splitting idea (which I hoped would result in more exact purges) wouldn't help here - how should:
revalidateTag('tagstart')
revalidateTag('tagstart,tagend')
those work when you can't distinguish them? the only thing that does work is splitting on comma again - but then how do you handle
revalidatePath('/foo,baz')
?
because we don't know which one (revalidateTag
or revalidatePath
) user called and both result in calling CacheHandler.revalidateTag
and we would need different behavior based on wether it was path or a tag
Now I'm even more in favor of current approach that just works with both of those cases (it might purge more than needed if there is overlap with different cache tag after splitting on both comma and encoded comma, but this feels better than not being able to invalidate something at all)
I think you're right - the current solution is the best because the impact of any problems is limited - the worst scenario I can think of is that a URL has a comma in it and, after splitting, the first part matches a layout, so a whole bunch of unintended pages get purged, but that's an edge case and why would you use a URL scheme like that anyway?!
Description
Fixes https://linear.app/netlify/issue/FRB-1352/handle-non-ascii-characters-in-cache-tag-headers
Handles non ASCII characters in cache-tag headers by URI encoding them (does not encode / or valid characters just the non ASCII ones)
Documentation
Tests
Added e2e test case for cacheable page router pages with non-ascii paths
Relevant links (GitHub issues, etc.) or a picture of cute animal