rubygems / rubygems.org

The Ruby community's gem hosting service.
https://rubygems.org
MIT License

Compact index /versions takes a long time to update #1877

Closed: jhawthorn closed this issue 1 year ago

jhawthorn commented 5 years ago

I've noticed a very long delay the last few times I've published a gem.

I pushed a gem 45 minutes ago, but it still isn't showing up in the /versions compact index.

$ date --utc
Sun Dec 30 01:49:54 UTC 2018

$ curl https://rubygems.org/versions | tail
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.8M  100 20.8M    0     0  3767k      0  0:00:05  0:00:05 --:--:-- 3619k
webscraper_framework 0.1.733 8dc7114bd166472d0279f1af7eb3f21e
lazy_navigator 1.5.0 fca5d5380ac977451acaa69f8cc1356a
roebe 0.3.196 f2e3e83eb7539a5f760439d6654a3511
asciidoctor-texnical 0.0.2 d6f5cb3bae4fedfc24ad8288c882bc58
ruby_language_server 0.2.0 a9b0c974bfbbffff604e3dbdf1f1238f
roebe 0.3.197 da3c7faac8f2437c74f2eb2fab09f886
wavefront-cli 2.16.2 786861dda86e48cd06797d0623ef4271
pry-rails 0.3.9 c8f0e3dfa9fdde1c003a1f98c1cdb6d9
zengin_code 1.0.1.20181230 1152118736597f40c0a72be68d9c2927
rbt 0.5.182 906e1ec64b31de2439bdc297b393e934

The last gem listed is over an hour old. Based on https://twitter.com/rubygems, it's missing 4 gems published since then.

However, the info file looks good (though maybe the cache just wasn't warm here?):

$ curl https://rubygems.org/info/pub_grub
---
0.1.0 |checksum:5bca4f8f7de694cc7aca06a75c83635fac518aa09472c94f5041aa9886831f90
0.2.0 |checksum:7f640a86390da06fc81f10a97b3bcd86feeeb2a9e3d1f1d5dc054125d5d87b67
0.3.0 |checksum:090e3ce8ebfd3bc25ce463b2e260fbc0f5c29bc24a6ad12eb0880b343d25f20c
0.3.1 |checksum:51fa1f71d71adb3f8419eb507287278582441e1ba34318cb9723ab72bfad7c59
0.3.2 |checksum:76cec92ad66432c05b774b376a667e4204762bf6562d3cf76c786779264a346f
0.4.0 |checksum:e262a7c1d992946f931384e75c5dc490bf92d5eedf6149540b2b3a2a73b7f0a4
0.4.1 |checksum:df0fe78746f4ad956ce1ecff4969fb3eea43da77e24f6ae252e0de144e265c85

Is Fastly caching this too aggressively? It looks like there's code that's supposed to bust this cache.


It seems like it finally updated right after the hour. Maybe this happens in a cron job, or maybe that's when the cache expires?

$ date --utc
Sun Dec 30 02:01:05 UTC 2018

$ curl https://rubygems.org/versions | tail
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.8M  100 20.8M    0     0  3538k      0  0:00:06  0:00:06 --:--:-- 3460k
ruby_language_server 0.2.0 a9b0c974bfbbffff604e3dbdf1f1238f
roebe 0.3.197 da3c7faac8f2437c74f2eb2fab09f886
wavefront-cli 2.16.2 786861dda86e48cd06797d0623ef4271
pry-rails 0.3.9 c8f0e3dfa9fdde1c003a1f98c1cdb6d9
zengin_code 1.0.1.20181230 1152118736597f40c0a72be68d9c2927
rbt 0.5.182 906e1ec64b31de2439bdc297b393e934
pub_grub 0.4.1 dd3f1c6b5a3c1f3d09448a4ac78d6909
faastruby 0.3.6 c6a976154e5cc5e6bab8479373b29ea2
faastruby 0.3.7 db527b2dc82530486cb56264eaa6ace2
legion-settings 0.1.0 d41819b570c7b81685a56a0cfd8a22d5
dwradcliffe commented 5 years ago

Thanks for the detailed report. The purge should happen nearly instantly (it's async but not actually delayed). I have a hunch it's related to the domain. Can you try again and see if it shows up right away on https://api.rubygems.org/versions?

matthewd commented 5 years ago

Yeah, I ran into this previously: #1698

The cache-busting code notifies Fastly about index.rubygems.org (which, per a comment referenced above, Bundler uses for historical reasons), but not plain rubygems.org (which is, AIUI, the per-spec place a client should look if that's the URL it was originally given).

If it's possible, it seems it would prevent future confusion if the server invalidated both (all?) hostnames.

dwradcliffe commented 5 years ago

Yeah, we should refactor that code to purge a list of domains (the list is different for staging and production).
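
For illustration, a minimal sketch of purging the same path across a list of hostnames, assuming Fastly's URL purge (the PURGE HTTP method) is enabled for the service; the domain list and helper name below are assumptions, not the actual rubygems.org code:

require "net/http"
require "uri"

# Illustrative list; per the comment above, the real list differs between
# staging and production.
PURGE_DOMAINS = %w[rubygems.org api.rubygems.org index.rubygems.org].freeze

# Ask Fastly to drop its cached copy of `path` on every hostname, so no
# domain keeps serving a stale /versions file.
def purge_all_domains(path)
  PURGE_DOMAINS.each do |domain|
    uri = URI("https://#{domain}#{path}")
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      purge = Net::HTTPGenericRequest.new("PURGE", false, true, uri.request_uri)
      http.request(purge)
    end
  end
end

purge_all_domains("/versions")

Fastly's URL purge is keyed on the full URL, host included, which is why each hostname has to be purged separately.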

dwradcliffe commented 5 years ago

Oh look I started fixing this a long time ago: https://github.com/rubygems/rubygems.org/compare/fastly_multiple_domains

jhawthorn commented 5 years ago

Oh look I started fixing this a long time ago

Awesome!

I had a chance to do some testing and the index ended up outdated on all three domains.

However, index.rubygems.org, the one we expect to be purged, is out of date in a particular way: every time a package is pushed, the previously pushed package appears at https://index.rubygems.org/versions, but not the one that was just pushed. I saw this happen three times in a row (with both my gems and others').

It feels like the purge is happening before the data is in the database. Is it possible the job gets queued and run while a DB transaction is still open, so the new data is missed by the purge?
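
For reference, this is the usual enqueue-inside-a-transaction race in Rails apps: a job queued before COMMIT can run against a database snapshot that doesn't yet contain the new row. A minimal sketch of the pattern and the common fix; PurgeVersionsFileJob is a placeholder name, not the actual rubygems.org job:

class Version < ApplicationRecord
  # Racy: after_create runs inside the surrounding transaction, so a fast
  # worker can rebuild /versions before the new row has been committed.
  # after_create { PurgeVersionsFileJob.perform_later }

  # Common fix: enqueue only once the transaction has committed, so the
  # worker is guaranteed to see the freshly pushed version.
  after_commit(on: :create) { PurgeVersionsFileJob.perform_later }
end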

dwradcliffe commented 5 years ago

Is it possible the job gets queued and run while a DB transaction is still open?

Yes.

The other thing might be the nginx cache. I think it's caching for 60 seconds.

JacobEvelyn commented 5 years ago

This seems like it might be the same issue I'm running into (details in https://github.com/rubygems/rubygems/issues/2594). I recently published a new version of a gem, and gem search and gem outdated didn't show the new version for over 17 hours.

dwradcliffe commented 5 years ago

We should be purging all the right places now. Are you still seeing this?

jhawthorn commented 5 years ago

Interestingly, this looks better, but index.rubygems.org still isn't able to pick up the most recent package. It's getting its cache purged, but the update only includes previous packages, not the current push.

$ for url in https://rubygems.org/versions https://index.rubygems.org/versions https://api.rubygems.org/versions ; do echo "$url"; curl -s "$url" | tail -3; echo; done
https://rubygems.org/versions
mangdown 0.20.4 72aa507ad95ee746610353ab2157b285
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e
mangdown 0.20.6 e8df512dfaa8312f96642ddd7ccdc7b5

https://index.rubygems.org/versions
roebe -0.3.329 557995f5545d3ab8bb60ae1330edfc62
mangdown 0.20.4 72aa507ad95ee746610353ab2157b285
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e

https://api.rubygems.org/versions
mangdown 0.20.4 72aa507ad95ee746610353ab2157b285
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e
mangdown 0.20.6 e8df512dfaa8312f96642ddd7ccdc7b5

$ gem push pkg/hawthtest-0.1.2.gem
Pushing gem to https://rubygems.org...
You have enabled multi-factor authentication. Please enter OTP code.
Code:   <REDACTED>
Successfully registered gem: hawthtest (0.1.2)

$ for url in https://rubygems.org/versions https://index.rubygems.org/versions https://api.rubygems.org/versions ; do echo "$url"; curl -s "$url" | tail -3; echo; done
https://rubygems.org/versions
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e
mangdown 0.20.6 e8df512dfaa8312f96642ddd7ccdc7b5
hawthtest 0.1.2 df45a60b8e7be1fa7f69325ca51b6ac6

https://index.rubygems.org/versions
mangdown 0.20.4 72aa507ad95ee746610353ab2157b285
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e
mangdown 0.20.6 e8df512dfaa8312f96642ddd7ccdc7b5

https://api.rubygems.org/versions
mangdown 0.20.5 bc65056cada0feb12e22473eb64ac33e
mangdown 0.20.6 e8df512dfaa8312f96642ddd7ccdc7b5
hawthtest 0.1.2 df45a60b8e7be1fa7f69325ca51b6ac6
dwradcliffe commented 5 years ago

OK, the problem must be the nginx cache then. I'll need to look at this more to figure out a solution.

dwradcliffe commented 5 years ago

Seems like we are not effectively caching this endpoint in Fastly, which is why we need the nginx cache. (I tried removing the nginx cache and that was bad.) I'll continue to work on fixing the cache settings.
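
One way to see which cache layer answered a request is to inspect the headers Fastly adds to responses (X-Served-By, X-Cache, Age). A small Ruby sketch; nothing here is rubygems.org-specific beyond the URL:

require "net/http"
require "uri"

uri = URI("https://rubygems.org/versions")
response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  # A HEAD request is enough; only the caching headers are of interest.
  http.request_head(uri.request_uri)
end

# X-Cache reports HIT/MISS at the edge; Age is how long the returned
# object has been sitting in a cache along the way.
%w[x-served-by x-cache age].each do |name|
  puts "#{name}: #{response[name]}"
end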

sonalkr132 commented 3 years ago

Hi, sorry about the delay in getting this resolved. We have deployed a few changes to improve the situation (see the linked PRs above). Changes should now be reflected on the /versions endpoint in under a minute. Please let us know if you continue to see long delays (previously up to an hour). The reason for the remaining delay (even of one minute) is the same: the nginx cache in front of this endpoint. I have suggested an optimization of our Postgres config and am waiting to hear back from the team on it. We will try removing the nginx cache (again) after that config change is deployed.

cc @mensfeld: this is related to your report of Bundler saying a gem was not found.

mensfeld commented 3 years ago

@sonalkr132 thank you :pray: I will keep you posted about any potential issues.

simi commented 1 year ago

This should be fixed now; feel free to reopen if the problem comes back.