metacpan / metacpan-web

Web interface for MetaCPAN
http://metacpan.org

Some packages disappear and reappear from the site and elasticsearch randomly #2275

Open AMDmi3 opened 4 years ago

AMDmi3 commented 4 years ago

I'm the maintainer of Repology, which uses package information from MetaCPAN, and I've been investigating a Qt package that blinks in the history Repology maintains: https://repology.org/project/perl:qt/history

It turned out that the package blinks on the MetaCPAN site as well: I set up logging of HTTP replies for https://metacpan.org/release/Qt and have captured a single case of the problem (showing only the requests adjacent to the moments when it disappears and reappears):

...
HTTP/2 200 
server: nginx
content-type: text/html; charset=utf-8
cache-control: max-age=3600
last-modified: Sat, 19 Mar 2011 19:31:28 GMT
x-runtime: 0.584421
via: 1.1 varnish
accept-ranges: bytes
date: Fri, 13 Mar 2020 22:10:34 GMT
via: 1.1 varnish
age: 1559
x-served-by: cache-lax8643-LAX, cache-ams21028-AMS
x-cache: MISS, HIT
x-cache-hits: 0, 2
x-timer: S1584137434.442652,VS0,VE0
vary: Accept-Encoding
content-length: 97820

HTTP/2 404 
server: nginx
content-type: text/html; charset=utf-8
cache-control: private
x-runtime: 0.112115
accept-ranges: bytes
age: 0
accept-ranges: bytes
via: 1.1 varnish
age: 0
accept-ranges: bytes
age: 0
accept-ranges: bytes
date: Fri, 13 Mar 2020 22:11:34 GMT
via: 1.1 varnish
age: 0
x-served-by: cache-lax8642-LAX, cache-ams21045-AMS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1584137494.376470,VS0,VE404
vary: Accept-Encoding
content-length: 13698
...
HTTP/2 404 
server: nginx
content-type: text/html; charset=utf-8
cache-control: private
x-runtime: 0.037561
accept-ranges: bytes
accept-ranges: bytes
via: 1.1 varnish
age: 0
accept-ranges: bytes
accept-ranges: bytes
date: Fri, 13 Mar 2020 23:00:34 GMT
via: 1.1 varnish
age: 0
x-served-by: cache-lax8646-LAX, cache-ams21049-AMS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1584140435.644706,VS0,VE337
vary: Accept-Encoding
content-length: 13698

HTTP/2 200 
server: nginx
content-type: text/html; charset=utf-8
cache-control: max-age=3600
last-modified: Sat, 19 Mar 2011 19:31:28 GMT
x-runtime: 1.284640
accept-ranges: bytes
via: 1.1 varnish
age: 0
accept-ranges: bytes
accept-ranges: bytes
date: Fri, 13 Mar 2020 23:01:36 GMT
via: 1.1 varnish
age: 0
x-served-by: cache-lax8625-LAX, cache-ams21077-AMS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1584140494.440345,VS0,VE1833
vary: Accept-Encoding
content-length: 97820
...
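
The filtering described above (keeping only the requests next to a disappearance or reappearance) could be done along these lines. This is a minimal Python sketch, not the logging setup actually used: the function name `adjacent_to_changes` and the (timestamp, status) tuple layout are assumptions, and the 22:40:34 entry is a made-up stand-in for the elided requests.

```python
def adjacent_to_changes(entries):
    """Keep only the entries that sit next to a change in HTTP status.

    `entries` is a list of (timestamp, status) tuples in request order
    (a hypothetical layout chosen for this sketch).
    """
    keep = set()
    for i in range(1, len(entries)):
        if entries[i][1] != entries[i - 1][1]:
            # Status flipped between request i-1 and request i: keep both.
            keep.update((i - 1, i))
    return [entries[i] for i in sorted(keep)]


log = [
    ("22:10:34", 200),
    ("22:11:34", 404),
    ("22:40:34", 404),  # placeholder for the elided requests
    ("23:00:34", 404),
    ("23:01:36", 200),
]
for timestamp, status in adjacent_to_changes(log):
    print(timestamp, status)
```
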
oalders commented 4 years ago

That's disturbing. I've cleared the cache for this URL:

$ curl -XPURGE https://metacpan.org/release/Qt
{"status": "ok", "id": "17360-1570081162-188528869"}

Does this change the cache misses?

AMDmi3 commented 4 years ago

I'll continue monitoring it. As you can see from the history link above, this happens sporadically, and several days may pass between occurrences.

However, I'm not sure it's related to HTTP caching, as Repology fetches data from Elasticsearch (https://fastapi.metacpan.org/v1/release/_search), and the package blinks there as well. It's not the only one; here are the top 20:

                effname                 | count 
----------------------------------------+-------
 perl:pcore-captcha                     |   188
 perl:mojolicious-plugin-staticshare    |    55
 perl:do                                |    52
 perl:qt                                |    49
 perl:afs                               |    40
 perl:net-fileshare                     |    38
 perl:data-object-immutable             |    32
 perl:data-object                       |    32
 perl:pcore                             |    31
 perl:catmandu-fix-datahub              |    29
 perl:dist-zilla-plugin-author-plicease |    22
 perl:pcore-cdn-static                  |    21
 perl:spvm                              |    21
 perl:template-xml                      |    21
 perl:moox-press                        |    16
 perl:data-edit-xml                     |    15
 perl:pdla-core                         |    15
 perl:data-table-text                   |    15
 perl:moox-pression                     |    15
 perl:net-domain-info                   |    15
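
For illustration, a per-package count like the one above could be derived from successive snapshots of the package list. This is a minimal sketch under the assumption that a "blink" is a name that was seen earlier, was missing from the previous snapshot, and then shows up again. `count_blinks` and the snapshot format are hypothetical and not Repology's actual code.

```python
from collections import Counter


def count_blinks(snapshots):
    """Count, per name, how often it reappears after having gone missing.

    `snapshots` is an iterable of collections of package names,
    ordered by fetch time (assumed format for this sketch).
    """
    seen = set()      # names observed in any earlier snapshot
    previous = set()  # names in the immediately preceding snapshot
    blinks = Counter()
    for snapshot in snapshots:
        current = set(snapshot)
        # Present now, known from before, but absent last time.
        for name in (current & seen) - previous:
            blinks[name] += 1
        seen |= current
        previous = current
    return blinks


snapshots = [
    {"perl:qt", "perl:do"},
    {"perl:do"},                          # qt disappears
    {"perl:qt", "perl:do"},               # qt reappears
    {"perl:qt"},                          # do disappears
    {"perl:qt", "perl:do", "perl:afs"},   # do reappears, afs is new
]
for name, count in count_blinks(snapshots).most_common():
    print(name, count)
```

A name that appears for the first time (such as `perl:afs` here) is not counted, since it never went missing.
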
oalders commented 4 years ago

Are you able to share a query which is returning inconsistent results?

AMDmi3 commented 4 years ago

Something like this:

POST https://fastapi.metacpan.org/v1/release/_search?scroll=1m
{
   "fields" : [
      "abstract",
      "author",
      "distribution",
      "download_url",
      "license",
      "maturity",
      "resources.homepage",
      "status",
      "version",
      "name"
   ],
   "filter" : {
      "or" : [
         {
            "term" : {
               "status" : "latest"
            }
         },
         {
            "term" : {
               "maturity" : "developer"
            }
         }
      ]
   },
   "size" : 5000
}

Then I iterate the scroll, then DELETE it.
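
The iterate-then-DELETE loop could look roughly like the Python sketch below. It is not Repology's actual fetcher. The continuation endpoint (`/_search/scroll`), the request body shapes for continuing and clearing the scroll, and the helper names `http_send` and `scroll_hits` are all assumptions that should be checked against the MetaCPAN API and Elasticsearch documentation.

```python
import json
import urllib.request

API = "https://fastapi.metacpan.org/v1"  # base URL taken from the query above


def http_send(method, url, body=None):
    """Send an optional JSON body and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def scroll_hits(query, send=http_send, keepalive="1m"):
    """Yield every hit for `query`, then release the scroll context."""
    page = send("POST", f"{API}/release/_search?scroll={keepalive}", query)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Assumed continuation endpoint and body shape.
            page = send("POST", f"{API}/_search/scroll",
                        {"scroll": keepalive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id is not None:
            # Assumed clear-scroll call; frees server-side resources early.
            send("DELETE", f"{API}/_search/scroll", {"scroll_id": [scroll_id]})
```

Passing `send` as a parameter keeps the paging logic separate from the transport, so the loop can be exercised with a stub instead of the live API, which is handy when chasing an intermittent problem like this one.
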

Again, I'm not sure the query matters here, as the package blinks BOTH in the query results and on the site.

oalders commented 4 years ago

The site uses the public API in much the same way, so a focus on the query brings us closer to the root of the problem.

oalders commented 4 years ago

Out of curiosity, why search without any query parameters aside from status and maturity?

AMDmi3 commented 4 years ago

For Repology I need information on the latest stable and development versions of all available modules. I no longer remember the details, but I guess "status" : "latest" returns the former, and "maturity" : "developer" returns all development versions, which I then process to get the latest one for each module.