karpetrosyan / hishel

An elegant HTTP Cache implementation for HTTPX and HTTP Core.
https://hishel.com
BSD 3-Clause "New" or "Revised" License
158 stars 19 forks source link

Fix revalidation storage (#239) #240

Closed hartym closed 2 months ago

hartym commented 2 months ago

Attempt to fix #239 (only async code, for now)

Before further work (aka cleanup, sync version, more tests maybe, maybe split tests ...), I'd like to validate a few things:

karpetrosyan commented 2 months ago

I created a new test file. Should I keep it so or should I merge it in transport tests ?

Let's add them into the tests/_async/test_transport.py and tests/_async/test_pool.py

I mocked the clock using the test library "freezegun" which I'm used to and that did not require to understand the MockedClock behaviour. I don't know if having this (which is really handy ...) as a test dependencey would be ok or not

Actually, the idea behind using a clock was to avoid relying on third-party libraries solely for mocking the underlying time. Instead, we opted for dependency injection, which, in my opinion, is more explicit. So, let's use the clock for now. The same applies to response mocking. Let's use the mechanism that is employed in other tests, utilizing the built-in MockTransport.

Should I split the different test cases ?

I guess we only need a single test, for this case.

Do you prefer very explicit test code with duplicates, or should I factor the stored response extraction logic in a local function ?

I prefer explicit test cases.

Comments

We also support HTTPCore connection pools as an alternative to high-level HTTPX transports. Therefore, the same changes should be made for hishel/_async/_pool.py and tests/_async/test_pool.py.

hartym commented 2 months ago

Noted, I'll apply the changes once I get a new time slot for this (probably this evening).

karpetrosyan commented 2 months ago

Also, you can refer to the contribution documentation. The maintenance process for hishel can be a bit confusing. We focus solely on writing asynchronous code, and the synchronous code is generated automatically. More information can be found in the documentation.

hartym commented 2 months ago

Also, you can refer to the contribution documentation. The maintenance process for hishel can be a bit confusing. We focus solely on writing asynchronous code, and the synchronous code is generated automatically. More information can be found in the documentation.

Thanks. I did it first once complaining silently about why the heck the check script does not give me line numbers, but got it now.

I only have the pool version remaining and I think we'll be good to go. The autoincrementing Clock was a bit confusing as I do not really know how many times the clock is called by my test code, so I implemented one following the same logic but with a manual increment switch.

karpetrosyan commented 2 months ago

Oh, I just realized that we need to remove the cache entry if the 304 response is not cacheable. However, we currently do not support an API for removing entries from the storage. There's an ongoing issue for introducing that API

I will work on it. Once done, we can add the removal part for cases where the 304 response contains something like no-store, and then get this merged.

hartym commented 2 months ago

I think it's done (for this bug, not the additional issue you mentioned). Ready for final review :)

hartym commented 2 months ago

(of course, when it's ok, you can squash all the noise I generated)

hartym commented 2 months ago

Squashed my commits to remove my "out of conventions" commits from the history, merged upstream release.

karpetrosyan commented 2 months ago

Do we really have unexpected behavior here? I'm not fully understanding what we are trying to fix.

hartym commented 2 months ago

Do we really have unexpected behavior here? I'm not fully understanding what we are trying to fix.

My PR is ready, as far as I know.

I do have the unexpected behaviour described by the issue before my patch (and new tests agree with it).

I don't have it anymore after the patch (and new tests pass).

For me, the PR is ready for your review and merge, let me know if there is a misunderstanding somewhere.

(note that I did not consider at all the things about cache storage invalidation, as it was not present before. I did not look into your patches yet)

hartym commented 2 months ago

Before patch:

GET /cache/2 -> 200 with cache-control: public, max-age=2
GET /cache/2 -> response from cache
... then 2 second later ...
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
(cache never used there anymore)

After patch:

GET /cache/2 -> 200 with cache-control: public, max-age=2
GET /cache/2 -> response from cache
... then 2 second later ...
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> response from cache
karpetrosyan commented 2 months ago

Before patch:

GET /cache/2 -> 200 with cache-control: public, max-age=2
GET /cache/2 -> response from cache
... then 2 second later ...
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
(cache never used there anymore)

After patch:

GET /cache/2 -> 200 with cache-control: public, max-age=2
GET /cache/2 -> response from cache
... then 2 second later ...
GET /cache/2 -> 200 with cache-control: public, max-age=2 (revalidation)
GET /cache/2 -> response from cache

Can we isolate this case? I have tried but seems, everything work fine.

hartym commented 2 months ago

It is isolated in the test cases added in the pull request. If you need, I can split the test but you said that's not necessary. The test fails before the patch.

https://github.com/karpetrosyan/hishel/blob/e447d22c9f91f96fe0ed189d1a2697dbc070627a/tests/_async/test_transport.py#L360

hartym commented 2 months ago

Sorry, the line failing is a bit below, it may look like it's just a timestamp mistake, but it's not : we got the timestamp from the third mocked response although the code expects that the current version is the second one : there should have been one initial request at this point, and one revalidation, but it sent 2 revalidation (+ initial, it gets the third response from mock)

https://github.com/karpetrosyan/hishel/blob/e447d22c9f91f96fe0ed189d1a2697dbc070627a/tests/_async/test_transport.py#L366

The RFC 9111 says about the 3rd handle_async_request call of this test :

A full response (i.e., one containing content) indicates that none of the stored responses nominated in the conditional request are suitable. Instead, the cache MUST use the full response to satisfy the request. The cache MAY store such a full response, subject to its constraints.

The MAY here refers to the section 3 constraints, meaning that if the revalidation request is cacheable, it should be cached.

The user-facing issue is that for requests going to this point, before patch, we always have old versions of the cache, meaning that after the initial expiration the revalidation request will be sent everytime, forever.

karpetrosyan commented 2 months ago

Let's look at https://github.com/karpetrosyan/hishel/issues/239#issuecomment-2179958037, seems like it should fail, but it works.

hartym commented 2 months ago

I missed this one, I need to got out now but will try that test as soon as I can.

hartym commented 2 months ago

Wrote isolated test case in https://github.com/karpetrosyan/hishel/issues/239#issuecomment-2188093351

karpetrosyan commented 2 months ago

Wrote isolated test case in #239 (comment)

Let's fix test case in this PR, and I think we can get this merged.

hartym commented 2 months ago

Test fixed.

hartym commented 2 months ago

Fixes based on your comments, I removed the double metadata creation in pool and transport that was hanging around (the store method does it, already). I hope I did not miss something there, let me know.

karpetrosyan commented 2 months ago

Thanks for the PR :pray: It's very neat!