readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License
8.04k stars 3.58k forks source link

.inv files are over-cached? #7590

Closed nedbat closed 4 years ago

nedbat commented 4 years ago

Details

Expected Result

We have a multi-book docset, with references between the books. We use an intersphinx_mapping to resolve these references:

intersphinx_mapping = {
    'opencoursestaff' : ('https://edx.readthedocs.io/projects/open-edx-building-and-running-a-course/en/latest/', None),
    'data' : ('https://edx.readthedocs.io/projects/devdata/en/latest/', None),
    'partnercoursestaff': ('https://edx.readthedocs.io/projects/edx-partner-course-staff/en/latest/', None),
    'insights' : ('https://edx.readthedocs.io/projects/edx-insights/en/latest/', None),
    'xblockapi' : ('https://edx.readthedocs.io/projects/xblock/en/latest/', None),
    'xblocktutorial' : ('https://edx.readthedocs.io/projects/xblock-tutorial/en/latest/', None),
    'installation' : ('https://edx.readthedocs.io/projects/edx-installing-configuring-and-running/en/latest/', None),
    'olx' : ('https://edx.readthedocs.io/projects/edx-open-learning-xml/en/latest/', None),
    'learners' : ('https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/', None),
    'openlearners' : ('https://edx.readthedocs.io/projects/open-edx-learner-guide/en/latest/', None),
    'opendevelopers' : ('https://edx.readthedocs.io/projects/edx-developer-guide/en/latest/', None),
    'opendataapi' : ('https://edx.readthedocs.io/projects/edx-data-analytics-api/en/latest/', None),
    'openreleasenotes' : ('https://edx.readthedocs.io/projects/open-edx-release-notes/en/latest/', None),
    'partnerreleasenotes': ('https://edx.readthedocs.io/projects/edx-release-notes/en/latest/', None),
    '2014releasenotes' : ('https://edx.readthedocs.io/projects/edx-2013-2014-release-notes/en/latest/', None),
    'retirement' : ('https://user-retirement-guide.readthedocs.io/en/latest/', None),
}

(The "learners" reference is to a particular branch so that my new references will be found as I work on the pull request.)

When I update one book, I find the next local build still can't resolve the new references. Using wget to pull the .inv file, it seems that the old content is still being served, but if I add a nonsense query parameter, I get the new content:

$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv | wc -c
    2923
$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv\?bust | wc -c
    3821
$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv | wc -c
    2923

I saw the problem yesterday, and then new references seemed to resolve this morning, so it seems that waiting long enough will bring the new content, as if a cache expires. I don't know how long that time is.

nedbat commented 4 years ago

It looks like the cache expired after 90 minutes.

humitos commented 4 years ago

Thanks for the report. The cache for a particular version should be purged immediately (couple of minutes, not 90 as you mentioned) after the build for that version succeed.

We will need to have more debugging information here to be able to find out the reason. Anything else that you can provide? Can you do some other tests here and check the HTTP headers: CF-Cache-Status and CF-RAY before and after a rebuild?

We have an alarm in our system that checks the latency since the build was triggered until the new content is published and online, and we are not noticing this 90 minutes cached delay that you are mentioning.

nedbat commented 4 years ago

Here is header information:

Requesting objects.inv after 90 minutes have passed since the build:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 14:54:23 GMT
Content-Type: application/octet-stream
Content-Length: 3821
Connection: keep-alive
Set-Cookie: __cfduid=dcb621f5f1a9b8070f8b445100fce9c521603378463; expires=Sat, 21-Nov-20 14:54:23 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 11:40:29 GMT
ETag: 0x8D8767F4725A1C7
x-ms-request-id: 045bc86b-f01e-00a0-737d-a8925d000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: HIT
Age: 2665
Expires: Thu, 22 Oct 2020 15:54:23 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f266da130000ec00f9804000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e640da3593fec00-BOS

Requesting objects.inv right after a build, the old content is served:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 15:06:06 GMT
Content-Type: application/octet-stream
Content-Length: 3821
Connection: keep-alive
Set-Cookie: __cfduid=d6c8d5fcfb5043867daaccb0bc9bd9bed1603379166; expires=Sat, 21-Nov-20 15:06:06 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 11:40:29 GMT
ETag: 0x8D8767F4725A1C7
x-ms-request-id: 045bc86b-f01e-00a0-737d-a8925d000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: HIT
Age: 3368
Expires: Thu, 22 Oct 2020 16:06:06 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f271968f0000ebf8d485c000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e641ed0e8b6ebf8-BOS

Requesting objects.inv with a query param to bust the cache:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 15:03:59 GMT
Content-Type: application/octet-stream
Content-Length: 4055
Connection: keep-alive
Set-Cookie: __cfduid=d7fea1b0997ac260f34bb8b325d9b83751603379039; expires=Sat, 21-Nov-20 15:03:59 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 15:00:42 GMT
ETag: 0x8D8769B3F4E8D34
x-ms-request-id: 09f67383-601e-00c8-5a84-a8cc0c000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: MISS
Expires: Thu, 22 Oct 2020 16:03:59 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f26fa5590000ebf04829a000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e641bb55fa1ebf0-BOS
nedbat commented 4 years ago

A more careful test showed that wget got updated content an hour after the build. That meshes well will the "Cache-Control: public, max-age=3600" header.

humitos commented 4 years ago

I think I found the issue and I'd say that's a bug in our task that purge the cache. It gets confused when generating tags cache for subprojects.

For your particular case, when hitting edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv we generate the HTTP header Cache-Tag: edx,edx-nedbat-fix-learner-links (source code), but then when the build finishes and we call the purge task, we are purging edx-guide-for-students-nedbat-fix-learner-links (source code in private repo).

The Cache-Tag that we are generating is wrong, because it uses the slug for the main project instead of the subproject.

humitos commented 4 years ago

@nedbat we just deployed the fix for this, could you confirm us that this is working as you expected now and the cache is being invalidated "immediately" (minutes) after the build is done? Thanks!

nedbat commented 4 years ago

I tested it, and it works! My branch got an updated objects.inv right away. Thanks!