Suggest updating the cache time to infinity... / make it never expire, apply the same thing to v2 context.
The issue was discussed in a meeting on 2023-08-15
I am not an Apache expert. Before updating the server, I would like to get some comments on the .htaccess file. The relevant parts would then be:
ExpiresActive On
ExpiresByType application/ld+json "access plus 1 months"
RewriteRule ^v1$ https://w3c.github.io/vc-data-model/contexts/credentials/v1 [E=json,P]
RewriteRule ^examples/v1$ https://w3c.github.io/vc-data-model/contexts/credentials/examples/v1 [E=json,P]
Header set Content-Type application/ld+json env=json
@OR13 I did not find a way to make the expiration time set to infinite. But I actually prefer to keep a regular refresh request, in the case there is a bug in the file that needs change. Even the one month access seems to be fairly large; @jeswr requested a single day...
@msporny I guess you have some experience via w3id...
cc @deniak your reaction is also crucial...
@jeswr wrote:
This can significantly slow down the time it takes to parse VCs as JSON-LD.
You should be permanently caching that context file and not loading it from the Web (unless you have a very good reason that you're not caching the file). We suggested that implementers do this in v1 and v1.1, and we are STRONGLY advising that you do this from v2 and beyond. Search for the word "cache" in the latest data model specification for more information: https://www.w3.org/TR/vc-data-model-2.0/
... or, see the next-to-last paragraph in this section for specific guidance:
https://www.w3.org/TR/vc-data-model-2.0/#json-ld
NOTE: Don't permanently cache the v2 context until the v2.0 specification becomes a global standard (expected by end of Q2 2024).
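One way to follow the "cache permanently" advice above is to ship the context with the application and resolve it locally. The sketch below is only an illustration and not any library's API: `BUNDLED_CONTEXTS`, `load_context`, and the placeholder document body are hypothetical, and a real application would bundle the actual context file.

```python
# Hypothetical static loader: serves bundled contexts and never touches the
# network unless the caller explicitly passes a fetch function.
V1_URL = "https://www.w3.org/2018/credentials/v1"

# Placeholder body only; a real application would bundle the actual context file.
BUNDLED_CONTEXTS = {V1_URL: {"@context": {"@version": 1.1}}}


def load_context(url, fetch=None):
    """Return the bundled document for url, falling back to fetch(url) if given."""
    if url in BUNDLED_CONTEXTS:
        return BUNDLED_CONTEXTS[url]
    if fetch is None:
        raise LookupError(f"context not bundled and no fetcher supplied: {url}")
    return fetch(url)
```

With a loader like this, HTTP cache headers on the server only matter for clients that still fetch the context from the Web.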
@OR13 wrote:
Suggest updating the cache time to infinity... / make it never expire, apply the same thing to v2 context.
No, we don't want to set it to infinity for the reasons Ivan stated. One accidental admin change to the file and we could permanently knock a number of implementations offline. We need to design for human error, and eventual recovery (even for systems that are implemented in ways that we don't approve of) no matter how remote the possibility. A day, week, or month seems like a reasonable expiry time (depending on how conservative we want to be).
@iherman wrote:
@msporny I guess you have some experience via w3id...
w3id.org uses a "heuristically cacheable" approach (but does not provide a "Last-Modified" header by default): https://www.rfc-editor.org/rfc/rfc9111#section-4.2.2 ... and that's not a good model here. We want to be explicit w/ the expiry time, for both the v1 and v2 context.
Once the TR happens, doesn't setting the cache to anything other than infinity signal we expect the context to change?
I get the argument about malicious admins... But it would seem a better defense to set the cache time to infinity when you know it's correct, than it would be to encourage clients to load a context that expired, because the latter will actually lead to broken signatures in the case of an insider threat.
Once the TR happens, doesn't setting the cache to anything other than infinity signal we expect the context to change?
All recommendations, or adjacent files like the context file, may have errata, and W3C does republish recommendations with such errata handling from time to time, when needed. The same is true for a context file.
The issue was discussed in a meeting on 2024-01-24
Based on the Apache expire module settings, what seems doable is to add the following statement into the .htaccess file:
ExpiresByType application/ld+json "access plus 1 month"
(There is no statement to set the expiration for a specific file. Alas!)
However, the same .htaccess file controls other redirections, namely those that access the vocabulary files, currently redirected to https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.jsonld. In the long term, that is all right, but I do not know whether this expiration would create problems while the vocabulary is still in development. Also, if we put this one-month expiration on the jsonld, we should also do the same for the other vocabulary files (html, ttl, svg). Finally, we should also do the same for /ns/credential/.htaccess, which controls the v2 version of the context file and (still to be done) the vocabulary for the bitstrings. I am not sure if this is fine at this point, when we are still developing all of these.
Proposal: postpone this change until we publish our Recs. At that point the vocabulary files will have to be collected on W3C date space for finalization, and we can look at the policy altogether instead of making such punctual changes.
@msporny @davidlehn @brentzundel @TallTed WDYT?
The issue was discussed in a meeting on 2024-02-28
The issue was discussed in a meeting on 2024-03-06
The issue was discussed in a meeting on 2024-06-12
Putting aside what should happen here for a minute, what is happening?
- last-modified is likely set to a file update time?
- etag is present.
- cache-control has max-age set to 10mins.
- expires is jumping around for some reason. It will be hours in the past, then 10 mins in the future, then fade back.
- expires updates do not seem aligned between resources.
- expires is ignored (on anything modern) if both it and max-age are present. Apache mod_expires docs and tests indicate it will set expires and max-age.
With max-age value, etag and last-modified, well behaved user agents can handle caching by using max-age, sending if-none-match, and/or sending if-modified-since. I'm not sure what they do with the odd expires values.
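As a rough illustration of the revalidation step mentioned above, a client holding a stale copy can turn the stored validators into a conditional request. This is a minimal sketch: `revalidation_headers` is a hypothetical helper, and it assumes the cached header names were stored in lowercase.

```python
def revalidation_headers(cached_headers):
    """Build conditional request headers from a cached response's validators."""
    conditional = {}
    etag = cached_headers.get("etag")
    last_modified = cached_headers.get("last-modified")
    if etag:
        # The server can answer 304 Not Modified if the entity tag still matches.
        conditional["If-None-Match"] = etag
    if last_modified:
        # Fallback validator, based on the stored modification time.
        conditional["If-Modified-Since"] = last_modified
    return conditional
```

A 304 response to such a request means the cached body can be reused without downloading it again.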
What is controlling expires? Apache mod_expires would by default align expires and max-age. It looks like varnish is being used. Maybe that's adjusting the values?
I was looking at the headers with:
curl -sI https://www.w3.org/2018/credentials/v1 | grep -E 'date:|expires:|cache-control:|last-modified:'
curl -sI https://www.w3.org/ns/credentials/v2 | grep -E 'date:|expires:|cache-control:|last-modified:'
curl -sI https://www.w3.org/ns/credentials/examples/v2 | grep -E 'date:|expires:|cache-control:|last-modified:'
It results in output like:
date: Tue, 02 Jul 2024 04:32:47 GMT
last-modified: Wed, 26 Jun 2024 23:20:36 GMT
expires: Tue, 02 Jul 2024 04:31:20 GMT
cache-control: max-age=600
I'll not dump all the results over time here, but a few selected results for v1 context showing the expires value jumping around between past and future and now(ish), always with max-age=600:
date: Tue, 02 Jul 2024 03:17:55 GMT
expires: Tue, 02 Jul 2024 01:11:18 GMT
date: Tue, 02 Jul 2024 03:20:33 GMT
expires: Tue, 02 Jul 2024 03:30:04 GMT
date: Tue, 02 Jul 2024 04:06:33 GMT
expires: Tue, 02 Jul 2024 03:47:03 GMT
date: Tue, 02 Jul 2024 04:31:43 GMT
expires: Tue, 02 Jul 2024 04:31:20 GMT
date: Tue, 02 Jul 2024 04:32:47 GMT
expires: Tue, 02 Jul 2024 04:31:20 GMT
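To make the jumps easier to see, the samples above can be reduced to the offset between expires and date. The sketch below also shows the precedence rule from RFC 9111, where max-age wins over expires when both are present. The helper names are hypothetical and the cache-control parsing is deliberately simplistic.

```python
import re
from email.utils import parsedate_to_datetime


def expires_offset(date_value, expires_value):
    """Seconds from the Date header to the Expires header (negative = in the past)."""
    delta = parsedate_to_datetime(expires_value) - parsedate_to_datetime(date_value)
    return int(delta.total_seconds())


def freshness_lifetime(headers):
    """Freshness lifetime in seconds; max-age takes precedence over Expires."""
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return int(match.group(1))
    if "expires" in headers and "date" in headers:
        return expires_offset(headers["date"], headers["expires"])
    return None  # no explicit freshness information


# (date, expires) pairs copied from the selected results above.
SAMPLES = [
    ("Tue, 02 Jul 2024 03:17:55 GMT", "Tue, 02 Jul 2024 01:11:18 GMT"),
    ("Tue, 02 Jul 2024 03:20:33 GMT", "Tue, 02 Jul 2024 03:30:04 GMT"),
    ("Tue, 02 Jul 2024 04:06:33 GMT", "Tue, 02 Jul 2024 03:47:03 GMT"),
    ("Tue, 02 Jul 2024 04:31:43 GMT", "Tue, 02 Jul 2024 04:31:20 GMT"),
    ("Tue, 02 Jul 2024 04:32:47 GMT", "Tue, 02 Jul 2024 04:31:20 GMT"),
]
offsets = [expires_offset(d, e) for d, e in SAMPLES]
print(offsets)  # [-7597, 571, -1170, -23, -87]
```

Only the second sample has expires in the future. A client that honors max-age=600 would still treat every one of these responses as fresh for 10 minutes.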
Given the above odd expires behavior, maybe some investigation should happen to determine why it's not aligned with the max-age. Addressing that might at least help to ensure clients see consistent 10 min caching hints. Maybe even test (somewhere temporary) if explicitly setting ExpiresByType or ExpiresDefault helps the situation.
As far as what the expiration time should be, I'm in favor of the shorter times. Mistakes can happen and using infinite expiration or the immutable flag could be difficult to recover from in some cases. That is somewhat in conflict with the 2.0 spec text that says "The data available at https://www.w3.org/ns/credentials/v2 is a static document that is never updated, and SHOULD be downloaded once and cached." Even with that strong language, I'd think the headers could be less strict on the order of days to a month. And while still being developed, much shorter like minutes. I'm not sure what best practice is for these situations.
Also w3id was mentioned above. I just checked, and mod_expires isn't enabled and I don't think anyone sets cache headers directly. So it's using HTTP redirect codes and whatever the target servers do.
@deniak I would prefer to lean on you for https://github.com/w3c/vc-data-model/issues/1239#issuecomment-2201982611. My knowledge on Apache setup is limited at best... Thx.
The issue seems to come from github. https://www.w3.org/2018/credentials/v1 is actually a proxy to https://w3c.github.io/vc-data-model/contexts/credentials/v1.
Now, github pages rely on varnish to cache the pages for 10min, hence the cache-control: max-age=600. However, it's not really clear why the expires header is not always updated after we reach the invalidation time, e.g.
$ curl -I https://w3c.github.io/vc-data-model/contexts/credentials/v1
expires: Wed, 03 Jul 2024 05:19:15 GMT
cache-control: max-age=600
date: Wed, 03 Jul 2024 05:32:57 GMT
I don't know if this should be considered as a bug that needs to be reported to the GitHub support because the expires header is ignored if the cache-control: max-age is present.
Thanks @deniak !
@davidlehn @msporny this shows that the proper and final solution is, at this moment, not in our hands. At some point in the rec-track process we will have to move all context and vocabulary files onto the W3C space, and it may be wiser to finalize these issues at that moment rather than to fight with github. I guess that moment will come when we go to PR, ie, hopefully, in autumn '24. It is not that far...
rather than to fight with github
It seems worth raising this as an issue with them, which could well lead to an immediate acknowledgement of it as a problem, which could equally well lead to a (relatively) swift fix.
Of course, they may also say they won't fix it, or not even acknowledge it as a problem, in which case we're little worse off than we are now. Still, others may see our report, and chime in with "me, too!" or the like, which additional voices may lead to a revised response from the powers that be at GitHub.
(I won't volunteer to raise this issue, as I don't feel I know enough about these HTTP headers. Perhaps @deniak or @davidlehn would be willing to raise it, even if they're unable to invest much time or energy in subsequent discussion with GitHub.)
The issue was discussed in a meeting on 2024-07-17
@msporny from the meeting minutes:
Manu Sporny: here is one way to set expires on specific URLs: https://stackoverflow.com/questions/1600831/setting-expires-header-for-a-specific-uri.
But that refers to a question on setting expiration on a specific URL, but there is no answer. Actually, the first answer seems to suggest that this is not possible.
Also: you said on the call that we are talking about 'v1', which is stable. However, .htaccess still redirects v1 to github. Isn't it better if that context file is copied on W3C first?
I have set up a separate test directory in https://www.w3.org/People/Ivan/Tests/credentials/. The directory contains:
- index.var and vocabulary.var for content negotiations
- .htaccess file
The .htaccess file looks as follows:
RewriteEngine On
# RewriteBase /2018/credentials/
RewriteBase /People/Ivan/Tests/credentials/
AddType application/ld+json .jsonld
AddType text/turtle .ttl
ExpiresActive On
ExpiresByType application/ld+json "access plus 1 months"
ExpiresByType text/turtle "access plus 1 months"
RewriteRule ^v1$ v1.jsonld [E=json,P]
RewriteRule ^examples/v1$ https://w3c.github.io/vc-data-model/contexts/credentials/examples/v1 [E=json,P]
RewriteRule ^$ https://www.w3.org/2018/credentials/index [P]
RewriteRule ^credentials.html https://www.w3.org/2018/credentials/index.html [P]
RewriteRule ^credentials.ttl https://www.w3.org/2018/credentials/index.ttl [P]
RewriteRule ^credentials.jsonld https://www.w3.org/2018/credentials/index.jsonld [P]
RewriteRule ^credentials.svg https://www.w3.org/2018/credentials/index.svg [P]
RewriteRule ^index.html$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.html [P]
RewriteRule ^index.ttl$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.ttl [P]
RewriteRule ^index.jsonld$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.jsonld [P]
RewriteRule ^index.svg$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.svg [P]
RewriteRule ^vocabulary.html$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.html [P]
RewriteRule ^vocabulary.ttl$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.ttl [P]
RewriteRule ^vocabulary.jsonld$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.jsonld [P]
RewriteRule ^vocabulary.svg$ https://w3c.github.io/vc-data-model/vocab/credentials/v2/vocabulary.svg [P]
Header set Content-Type application/ld+json env=json
The relevant part for this issue is the v1 reference only.
I did a test:
> curl -I https://www.w3.org/People/Ivan/Tests/credentials/v1
HTTP/2 200
date: Thu, 18 Jul 2024 09:52:30 GMT
content-type: application/ld+json
content-length: 7687
content-location: v1.jsonld
vary: negotiate
tcn: choice
last-modified: Thu, 18 Jul 2024 09:51:29 GMT
etag: "1e07-61d8285dbea40;61d8285fab840
cache-control: max-age=2592000
expires: Sat, 17 Aug 2024 09:52:30 GMT
strict-transport-security: max-age=15552000; includeSubdomains; preload
content-security-policy: frame-ancestors 'self' https://cms.w3.org/ https://cms-dev.w3.org/; upgrade-insecure-requests
cf-cache-status: BYPASS
accept-ranges: bytes
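A quick sanity check on the test response above: unlike the GitHub Pages responses, expires and max-age now agree. The sketch below only re-reads the values printed in that output; the variable names are illustrative.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the curl output above.
date = parsedate_to_datetime("Thu, 18 Jul 2024 09:52:30 GMT")
expires = parsedate_to_datetime("Sat, 17 Aug 2024 09:52:30 GMT")
max_age = 2592000  # from cache-control: max-age=2592000

# expires should be exactly max-age seconds after date.
consistent = (expires - date) == timedelta(seconds=max_age)
days = max_age // 86400
print(consistent, days)  # True 30
```

So "1 months" is served as a 30-day lifetime, and both headers give clients the same hint.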
Are we o.k. with this? I can then change the "real" access file.
I would welcome comments/tests before doing so.
Are we o.k. with this? I can then change the "real" access file.
For now, yes, I think that's fine. Thank you for working on that @iherman!
We could quibble on what the cache time should be... e.g., "Should it be a year?" ... "What about for context files in development... a day?... an hour?" ... but that's a separate issue and we might want to have a WG discussion around best practices for context file development.
For now, with the application of your proposed fix @iherman, I think we can close this issue.
For now, with the application of your proposed fix @iherman, I think we can close this issue.
I would still prefer to get feedback from @davidlehn first. If he is o.k., I can then do all this on the official site.
We could quibble on what the cache time should be... e.g., "Should it be a year?" ... "What about for context files in development... a day?... an hour?" ... but that's a separate issue and we might want to have a WG discussion around best practices for context file development.
But that is the problem. As I said in https://github.com/w3c/vc-data-model/issues/1239#issuecomment-2235308437, it does not seem to be possible to set the expiration per file. This means this expiration becomes valid for all jsonld files in the directory, which includes the vocabulary.jsonld. As this is still not final, we should not give it a very long expiration date.
But yes, we should have a clear discussion on final URLs, storage of files, and expiration dates for all our files. At TPAC?
The syntax examples show singular words are allowed. So in the case of "1" month, how about:
-ExpiresByType application/ld+json "access plus 1 months"
+ExpiresByType application/ld+json "access plus 1 month"
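For reference, here is a small sketch of how an interval written in this syntax maps to seconds, accepting both singular and plural unit names. It is not Apache's parser: `interval_seconds` is a hypothetical helper, and the unit lengths (30-day month, 365-day year) are assumptions, although the 30-day month matches the max-age=2592000 seen in this thread.

```python
# Unit lengths in seconds. The 30-day month and 365-day year are assumptions;
# the month value is consistent with max-age=2592000 observed in this thread.
UNIT_SECONDS = {
    "year": 365 * 86400, "month": 30 * 86400, "week": 7 * 86400,
    "day": 86400, "hour": 3600, "minute": 60, "second": 1,
}


def interval_seconds(spec):
    """Convert e.g. 'access plus 1 month' to a number of seconds.

    Only the interval is computed; the base (access/now/modification)
    is validated but otherwise ignored.
    """
    words = spec.split()
    if not words or words[0] not in ("access", "now", "modification"):
        raise ValueError(f"unknown base in {spec!r}")
    rest = words[1:]
    if rest and rest[0] == "plus":  # the 'plus' keyword is optional
        rest = rest[1:]
    if not rest or len(rest) % 2:
        raise ValueError(f"expected '<num> <unit>' pairs in {spec!r}")
    total = 0
    for num, unit in zip(rest[0::2], rest[1::2]):
        total += int(num) * UNIT_SECONDS[unit.rstrip("s")]  # singular or plural
    return total


print(interval_seconds("access plus 1 month"))   # 2592000
print(interval_seconds("access plus 1 months"))  # 2592000
```

Under these assumptions the singular and plural spellings produce the same lifetime, so the change is cosmetic.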
I'm not sure what that stackoverflow link above was about. It looks like you can do per-file config. I had to learn and test this. The "mod_expires" docs say the directives "Context" has "directory", which apparently allows <Files>, <FilesMatch> and similar. So we can control behavior of expires headers (or other config) for the v1 context alone, or for other resources by name or extension or location. It seems to work in a local test. As a bogus example:
AddType application/ld+json .jsonld
AddType text/turtle .ttl
<FilesMatch ".+\.(jsonld)$">
ExpiresActive On
ExpiresByType application/ld+json "access plus 1 month"
</FilesMatch>
<FilesMatch ".+\.(ttl)$">
ExpiresActive On
ExpiresByType text/turtle "access plus 1 year"
</FilesMatch>
Thanks @davidlehn.
What I ended up doing, based on your experimentation, is this:
<FilesMatch "v1.jsonld">
ExpiresActive On
ExpiresDefault "access plus 1 month"
</FilesMatch>
and it looks like this worked for v1.jsonld exclusively.
@msporny @davidlehn
Here is what I have done:
- Copied v1 from the GitHub repository to the W3C server at /2018/credentials/v1.jsonld
- Copied examples/v1 from the GitHub repository to the W3C server at /2018/credentials/examples/v1.jsonld
- Updated .htaccess for both directories; the one in the credentials/examples/ is below (the one in /credentials/ is a bit longer because it includes additional statements that are not relevant for now):
RewriteEngine On
RewriteBase /2018/credentials/examples/
AddType application/ld+json .jsonld
AddType text/turtle .ttl
ExpiresActive On
<FilesMatch "v1.jsonld">
ExpiresActive On
ExpiresDefault "access plus 1 month"
</FilesMatch>
RewriteRule ^v1$ v1.jsonld [E=json,P]
Header set Content-Type application/ld+json env=json
As, thanks to @davidlehn, we could set the expiration date on a single file, I was wondering whether a 1-month expiration date is indeed fine, or we would prefer to make it longer. You tell me...
I made some tests, and it looks o.k., but I would welcome you guys to take a look. If everything is fine with you, we can close this issue (at last!).
As an aside, the v1 context is now 100% stable and is not dependent on GitHub anymore. The pattern to follow for v2 when the time comes.
I was wondering whether a 1-month expiration date is indeed fine, or we would prefer to make it longer. You tell me...
My preference for this sort of thing is usually no more than 24 hours, to allow for various unanticipated mistakes to be corrected within a day. I don't think having an expiration of significantly longer necessarily provides that much benefit, but I don't have any data either. In short, I certainly wouldn't advise going even longer.
The latest fetch shows that the expires header has been fixed (with a 1 month cache period):
$ curl -I https://www.w3.org/2018/credentials/v1
HTTP/2 200
date: Sun, 21 Jul 2024 14:45:31 GMT
content-type: application/ld+json
content-length: 7687
content-location: v1.jsonld
vary: negotiate
tcn: choice
last-modified: Fri, 19 Jul 2024 06:24:51 GMT
etag: "1e07-61d93c0b8d2c0;61d93f37f74bd
cache-control: max-age=2592000
expires: Tue, 20 Aug 2024 14:45:31 GMT
x-backend: www-mirrors
x-request-id: 8a6bf9a54a87c5c4
strict-transport-security: max-age=15552000; includeSubdomains; preload
content-security-policy: frame-ancestors 'self' https://cms.w3.org/ https://cms-dev.w3.org/; upgrade-insecure-requests
cf-cache-status: BYPASS
accept-ranges: bytes
set-cookie: __cf_bm=DrvvSarwa00dPec1.tlrMCH1OXkp7Y6VmUvEGPm5CW0-1721573131-1.0.1.1-YQDTNPN_8HfjISzez9PyjeIfRvRkSOA5vuJ3SOGcVMisr197miHkPqQOpTWrWW7aEwfQKJI0UEqXEd0SeGE89g; path=/; expires=Sun, 21-Jul-24 15:15:31 GMT; domain=.w3.org; HttpOnly; Secure; SameSite=None
server: cloudflare
cf-ray: 8a6bf9a54a87c5c4-IAD
alt-svc: h3=":443"; ma=86400
Closing.
[...] ExpiresActive On <FilesMatch "v1.jsonld"> ExpiresActive On ExpiresDefault "access plus 1 month" </FilesMatch> [...]
@iherman:
- ExpiresActive On might not be needed for the whole file? Though I think it does nothing unless ExpiresDefault or ExpiresByType is used.
- FilesMatch is a regex, so "v1.jsonld" will match that anywhere, like "foov1.jsonld". It might not matter at the moment but may be better to be explicit and use Files or restrict like "^v1.jsonld$".
Thanks @davidlehn, I have made those modifications! Can you check again to be absolutely sure?
Below is a screenshot of the response headers I received when looking up https://www.w3.org/2018/credentials/v1.
The expires header is earlier than the date header which means that the document is not being cached by my browser - and hence the document is taking several hundred ms on each request rather than only the first request taking that long.
This can significantly slow down the time it takes to parse VCs as JSON-LD.
Headers:
Timing:
Since this context is presumably quite stable I would request that the document be given a fairly long expiry after the date it is requested (at minimum 1 day).