vitalets / playwright-network-cache

Cache and mock network requests in Playwright
MIT License

Support ETags #4

Open jan-molak opened 2 weeks ago

jan-molak commented 2 weeks ago

Hi @vitalets! Thanks for your work on playwright-network-cache; it looks very promising already!

Have you considered making the caching mechanism aware of the ETag header? This would allow consumers to specify a longer TTL that could be reduced should the target content change before the TTL has expired.

I think this could be accomplished by making the CacheRouteHandler check whether the existing local cache has expired using the current algorithm and, if not, make a HEAD request to the original URL and check whether the ETag header has changed since the local cache was last populated and the local headers.json file was created (that file should also store the previous ETag value).

I think that conceptually it would be similar to the isUpdated check, but done before the "real" request is made.
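
To make the idea concrete, here is a minimal sketch of the comparison step only. It is not the library's code: `CachedMeta`, `HeadFn` and `hasETagChanged` are hypothetical names, and the sketch assumes the saved ETag is available from something like headers.json. The HEAD call is passed in as a function so the logic can be tried without a network.

```typescript
// Hypothetical shape of the metadata saved next to a cached response.
interface CachedMeta {
  etag?: string; // ETag recorded when the cache entry was written
}

// Minimal fetch-like function: only the response headers are needed.
type HeadFn = (url: string) => Promise<{ headers: { get(name: string): string | null } }>;

// Returns true when the server's current ETag differs from the saved one.
// If either side has no ETag, the entry cannot be validated and is treated as changed.
async function hasETagChanged(url: string, meta: CachedMeta, head: HeadFn): Promise<boolean> {
  const res = await head(url);
  const current = res.headers.get('etag');
  if (!current || !meta.etag) return true;
  return current !== meta.etag;
}
```

With a real network, `head` could be something like `(u) => fetch(u, { method: 'HEAD' })`. The sketch compares ETags as plain strings and does not treat weak validators (`W/"..."`) specially.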

vitalets commented 2 weeks ago

Hey @jan-molak! Agreed, that is an interesting idea. So, you mean the following scenario: I enable caching for GET /api/cats and set TTL = 1 hour. On the first request, the ETag is saved to headers.json. Then, all subsequent requests during 1 hour are served from the local cache (as it already does currently). After 1 hour, each request performs a preliminary HEAD request and, if the received ETag differs from the saved one, performs a real GET request and updates the cache.

I also thought about the cache-control header; it could be used as well to prolong the cache time. Actually, this is a re-implementation of browser caching, but why not =)

I'd suggest letting the user explicitly enable that behavior (to keep things straightforward by default):

await cacheRoute.GET('/api/cats', { 
  ttlMinutes: 60,
  respectETag: true, // <- keeps cache until ETag changes
});

jan-molak commented 2 weeks ago

Then, all subsequent requests during 1 hour are served from the local cache (as it already does currently). After 1 hour, each request performs a preliminary HEAD request and, if the received ETag differs from the saved one, performs a real GET request and updates the cache.

I was thinking that maybe we could use ETags to expire the cache sooner than the TTL would require. So, for example, you set the TTL to a "long time" such as a day or a week. If the option to respect ETags is enabled, then every request performs a preliminary HEAD call to see if the ETag has changed. If it has, a "real" request is made; if not, the response is retrieved from the cache.
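
Roughly, the flow I have in mind could be sketched like this. Everything here is hypothetical (`CacheEntry`, `Deps`, `resolveWithETag` are made-up names, not part of playwright-network-cache), and the network calls are injected so the decision logic stands on its own:

```typescript
// Hypothetical cache entry: the saved ETag, the body, and when it was stored.
interface CacheEntry {
  etag: string;
  body: string;
  savedAt: number; // milliseconds, same clock as Deps.now()
}

// Injected dependencies, so the flow can be exercised without a network.
interface Deps {
  headETag(url: string): Promise<string | null>; // cheap HEAD request
  getFresh(url: string): Promise<{ etag: string; body: string }>; // full GET
  now(): number;
}

async function resolveWithETag(
  url: string,
  entry: CacheEntry | undefined,
  ttlMs: number,
  deps: Deps,
): Promise<CacheEntry> {
  const refetch = async (): Promise<CacheEntry> => ({
    ...(await deps.getFresh(url)),
    savedAt: deps.now(),
  });

  // No entry yet, or the (long) TTL has run out: fetch unconditionally.
  if (!entry || deps.now() - entry.savedAt >= ttlMs) return refetch();

  // Within the TTL: a HEAD request decides whether the entry is still valid.
  const current = await deps.headETag(url);
  return current === entry.etag ? entry : refetch();
}
```

So an unchanged resource costs one HEAD per request, and a changed one costs a HEAD plus a GET.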

vitalets commented 2 weeks ago

I was thinking that maybe we could use ETags to expire the cache sooner than the TTL would require.

Hmm, that's the opposite approach. But making these preliminary requests in every test - wouldn't it slow them down? Even though it's HEAD, it's a network roundtrip anyway, and I suppose many APIs will take significant time to respond to HEAD as well.

jan-molak commented 2 weeks ago

wouldn't it slow them down?

I think it depends on the use case. In our case, we're building a simple Playwright Test and Serenity/JS-based website crawler. We are exploring using playwright-network-cache to cache API responses and static assets to avoid loading them unless changed. The APIs correctly handle ETags and HEAD requests, so while making a HEAD request incurs a network cost, it's still significantly faster than making a GET request since the response body can be large.

Of course, not all APIs correctly handle ETags and HEAD requests, so having an explicit respectETag setting, as you proposed, is a good approach.

vitalets commented 2 weeks ago

to cache API responses and static assets to avoid loading them unless changed

Maybe in that case TTL should not be set at all? You just use cached data until ETag changes?

I've compared it with HTTP caching in different situations. Your suggestion is more like cache-control: no-cache, which means data is cached but must be re-validated before each use. Having a TTL set is more like cache-control: max-age=3600: during that period the browser uses the cache without contacting the server.
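
To spell out that comparison, here is a rough sketch of how the two options could combine. The names are hypothetical (`respectETag` is only a proposal in this thread, and `decide` is made up for illustration), and the behavior with neither option set is an assumption of the sketch, not a statement about the library:

```typescript
// Hypothetical options; respectETag does not exist in the library yet.
interface CacheOptions {
  ttlMinutes?: number;
  respectETag?: boolean;
}

type Decision = 'serve-cache' | 'revalidate' | 'refetch';

// Decides what to do with an existing cache entry of the given age.
function decide(opts: CacheOptions, ageMinutes: number): Decision {
  if (opts.ttlMinutes !== undefined) {
    // Like max-age: no server contact while the entry is younger than the TTL.
    if (ageMinutes < opts.ttlMinutes) return 'serve-cache';
    // TTL expired: revalidate by ETag if enabled, otherwise fetch again.
    return opts.respectETag ? 'revalidate' : 'refetch';
  }
  // No TTL. With respectETag this is like no-cache: revalidate before each use.
  // Without it, this sketch assumes the entry never expires.
  return opts.respectETag ? 'revalidate' : 'serve-cache';
}
```

Here `'revalidate'` stands for the preliminary HEAD request, which then ends in either serving the cache or a real GET.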