igrigorik / vimgolf

Real Vim ninjas count every keystroke - do you?
http://www.vimgolf.com/
MIT License
672 stars 65 forks source link

Include position in ranking for each challenge in user page #293

Closed filbranden closed 4 years ago

filbranden commented 4 years ago

Additionally to showing the best player score and the best score for the challenge, include the actual position of the best user score in the ranking for that particular challenge.

Since the query including the rankings is potentially too expensive, only perform the full query and include the rankings if ?ranking=1 is included in the URL, that way just browsing user pages normally will not suffer from the performance hit of the expensive query.

Benchmarking:

I created db:setup with challenges=300 users=10000, then I patched db/seeds.db to have most challenges have a random number of entries in the range 100..2,000, but 2% of them in the range 5,000..40,000, to provide a more realist test scenario. (Creating all challenges with 20,000 entries just made all queries too expensive from the start.)

With that setup, I measured the effects of loading a user page, for a user with entries in 36 challenges, which included all 6 challenges with over 5,000 entries. More specifically, the 6 top challenges ranged from 17,829 to 37,251 entries.

I then loaded this page 10 times, with and without the changes in this PR. I took the average and standard deviation of the 10 measurements for each situation.

This is a roughly 9x increase, which is perhaps quite expensive to make it acceptable for the general case.

Hettomei commented 4 years ago

Hi.

Thanks for the PR, it s a really nice feature.

I only know mongodb's basics, I have no clue what it does in details to extract the data :) but I m concerned about performance : have you tested your request with the same amount of data as http://www.vimgolf.com/challenges/55b18bbea9c2c30d04000001 ?

Using rake db:seed + some params (you need to read the file), you can create such amount of data.

filbranden commented 4 years ago

Yes, performance is certainly a concern.

I think it's also not so much about a challenge with too many entries, but with a user that is in too many challenges which have too many entries...

Yes I guess I could try to create a fairly large test mongodb to take a look at that, but perhaps it would be best to test it on the real data? Any chance you could do that yourself? Or maybe could you give me access to the actual database so I can test that myself?

(Also, an option is to just push this to production and if you notice it's too slow you can rollback the commit and push it to production again...)

filbranden commented 4 years ago

I have now benchmarked the change from this PR.

I created db:setup with challenges=300 users=10000, then I patched db/seeds.db to have most challenges have a random number of entries in the range 100..2,000, but 2% of them in the range 5,000..40,000, to provide a more realist test scenario. (Creating all challenges with 20,000 entries just made all queries too expensive from the start.)

With that setup, I measured the effects of loading a user page, for a user with entries in 36 challenges, which included all 6 challenges with over 5,000 entries. More specifically, the 6 top challenges ranged from 17,829 to 37,251 entries.

I then loaded this page 10 times, with and without the changes in this PR. I took the average and standard deviation of the 10 measurements for each situation.

This is a roughly 9x increase, which is perhaps quite expensive to make it acceptable, even though the feature itself is very nice...

I guess I'm OK with having this PR closed if you think this is untenable, but I thought I'd report back on the numbers here and have a discussion on whether there's anything we can do here to make this acceptable (for example, caching user pages? or perhaps storing the ranking in the database itself?)

Cheers! Filipe

filbranden commented 4 years ago

If this query is indeed too expensive... Perhaps we could create a separate page /:user/rankings to display all the user's rankings for all challenges?

I know I would love to be able to see all this information together. (Curious about how many #1s I have...) 😁

Hettomei commented 4 years ago

After rebasing master, It works. And it s great.

My low quality benchmark: It take 7sec to load my fake profile (with an average computer) "entered into 51 challenges, 20 000 entry per challenge, 100 user per challenge" And 2 sec on master branch.

I have absolutely no idea if heroku will support this, I don't know the metrics.

@igrigorik I m also curious to know if you accept the proposition of @filbranden :

(Also, an option is to just push this to production and if you notice it's too slow you can rollback the commit and push it to production again...)

igrigorik commented 4 years ago

9x regression would be a major hit, unfortunately I don't think that's something we can land as it'll easily turn into a DDoS vector for the site -- same problem as we had before, and I'd love not to reintroduce it.

If we want to have this functionality, we should probably explore some batch and async computation -- e.g. periodic cron job that populates a static rank table, etc. Unless, someone else has other magic to fix the 9x regression.

filbranden commented 4 years ago

Alright, so I updated this one again, to try to make the feature available while not causing an unacceptable hit on the user pages.

I did so by only displaying the ranking for each challenge and performing the expensive query if ?ranking=1 is passed to the user page URL.

(This feels somewhat like an "Easter Egg" in a way, which is not ideal... But I guess at least it allows us to push this feature to production and evaluate how it performs there, before making a decision of whether to export it more widely or even make the behavior the default.)

Let me know if you think this is acceptable for a first shot at exposing this information!

Cheers, Filipe

filbranden commented 4 years ago

@igrigorik @Hettomei Ping?

I'd love to see this merged and deployed to production if possible! :-D

Cheers, Filipe

igrigorik commented 4 years ago

@filbranden thanks for the nudge. Trying to grok the two different modes you're describing.. Do you have some screenshots that you can share? That might be the fastest way. :)

filbranden commented 4 years ago

So this is what the user page normally looks like:

image

It still looks the same, by default, with this PR merged.

But with this code merged in, you can add a ?ranking=1 to the URL and get this instead:

image

Let me highlight the differences:

image

The page with ?ranking=1 has a more expensive query... But that query will only be executed with the additional query parameter, which is not really exposed anywhere... So right now this is pushing a "hidden feature" of sorts, an easter egg... But this will allow us to have a good idea of how expensive this is on the production site and database, to decide whether including this information everywhere, by default, would be feasible.

igrigorik commented 4 years ago

Thanks, that's really helpful. What I was trying to wrap my head around is if/how it modifies existing pages shown by default, but based on above I think you clarified that.. it doesn't.

I'm not opposed to shipping this as a hidden feature, but the problem is that if it does become known to folks, and they start using it, it will almost definitely affect the overall operation of the site. I guess that's a risk we could take and then pull the plug... but, that'll be painful. Tough one!

filbranden commented 4 years ago

Part of this one too is that you don't execute the expensive query every time you land on a user page, or when you navigate in and out of the same user page and generate it somewhat often...

Even if users find out about the "magic" query, I wouldn't expect many of them will keep checking it repeatedly (after all, it's not like you're ranking is going to go up or down that quickly...) I'd be happy if I could do it for my user once to take a snapshot :smile:

Regarding performance considerations and possible optimization, I still think #298 (migrate to pgsql) is the way to go. I'm still interested in working on that, since I'm somewhat new to RoR and this is a great project to practice (while at the same time being one I really care about.)

igrigorik commented 4 years ago

@filbranden ok, let's land it -- worst case, we'll rollback. Thanks for your patience and the great work here.

filbranden commented 3 years ago

I tested the changes from here and I see they seem to work for users with very few (like 2) challenges: http://www.vimgolf.com/albert07971271?ranking=1

But they seem to fail (rather quickly) for users with many challenges:

We're sorry, but something went wrong.
We've been notified about this issue and we'll take a look at it shortly.

I'm guessing it's a Heroku limit on how long it can take to build the page... So I guess that one is not as easy to solve as I expected... Oh well.

Hettomei commented 3 years ago

Here is the log of failing page http://www.vimgolf.com/zmf_tim?ranking=1

error :

2020-11-21T21:10:38.097064+00:00 app[web.1]: ActionView::Template::Error (Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. (16819)):

full log :

2020-11-21T21:10:36.833488+00:00 app[web.1]: source=rack-timeout id=54fb9af5-fd92-4dbc-876c-cdc147b5789d wait=2ms timeout=15000ms state=ready
2020-11-21T21:10:36.834847+00:00 app[web.1]: Started GET "/" for 162.158.106.150 at 2020-11-21 21:10:36 +0000
2020-11-21T21:10:36.836495+00:00 app[web.1]: I, [2020-11-21T21:10:36.836416 #9]  INFO -- : Processing by MainController#index as */*
2020-11-21T21:10:37.526265+00:00 app[web.1]: Rendering main/index.erb within layouts/application
2020-11-21T21:10:37.623645+00:00 app[web.1]: source=rack-timeout id=ded903ed-63f9-4f73-9bf9-e7c5c1c835a8 wait=3ms timeout=15000ms state=ready
2020-11-21T21:10:37.624687+00:00 app[web.1]: Started GET "/zmf_tim?ranking=1" for 108.162.229.195 at 2020-11-21 21:10:37 +0000
2020-11-21T21:10:37.626382+00:00 app[web.1]: I, [2020-11-21T21:10:37.626274 #12]  INFO -- : Processing by UsersController#show as HTML
2020-11-21T21:10:37.626449+00:00 app[web.1]: I, [2020-11-21T21:10:37.626374 #12]  INFO -- :   Parameters: {"ranking"=>"1", "username"=>"zmf_tim"}
2020-11-21T21:10:37.676335+00:00 app[web.1]: Rendering users/show.erb within layouts/application
2020-11-21T21:10:38.204245+00:00 heroku[router]: at=info method=GET path="/" host=www.vimgolf.com request_id=54fb9af5-fd92-4dbc-876c-cdc147b5789d fwd="54.244.190.135,162.158.106.150" dyno=web.1 connect=0ms service=1371ms status=200 bytes=17983 protocol=http
2020-11-21T21:10:38.100519+00:00 heroku[router]: at=info method=GET path="/zmf_tim?ranking=1" host=www.vimgolf.com request_id=ded903ed-63f9-4f73-9bf9-e7c5c1c835a8 fwd="2a01:e0a:52a:15a0:6eb2:9b7f:d6b:fdb3,108.162.229.195" dyno=web.1 connect=1ms service=478ms status=500 bytes=956 protocol=http
2020-11-21T21:10:38.094303+00:00 app[web.1]: Rendered users/show.erb within layouts/application (417.8ms)
2020-11-21T21:10:38.094564+00:00 app[web.1]: I, [2020-11-21T21:10:38.094463 #12]  INFO -- : Completed 500 Internal Server Error in 468ms
2020-11-21T21:10:38.097006+00:00 app[web.1]:
2020-11-21T21:10:38.097064+00:00 app[web.1]: ActionView::Template::Error (Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. (16819)):
2020-11-21T21:10:38.097315+00:00 app[web.1]: 1: <div class="grid_7">
2020-11-21T21:10:38.097315+00:00 app[web.1]: 2:
2020-11-21T21:10:38.097316+00:00 app[web.1]: 3:   <h3><b>Played Challenges</b></h3>
2020-11-21T21:10:38.097317+00:00 app[web.1]: 4:     <% @show_profile.tried_challenges.each do |player_challenge| %>
2020-11-21T21:10:38.097317+00:00 app[web.1]: 5:         <div>
2020-11-21T21:10:38.097318+00:00 app[web.1]: 6:             <%= render partial: "shared/challenge", locals: { challenge: player_challenge } %>
2020-11-21T21:10:38.097319+00:00 app[web.1]: 7:             <ul>
2020-11-21T21:10:38.097368+00:00 app[web.1]:
2020-11-21T21:10:38.097417+00:00 app[web.1]: app/services/show_profile.rb:26:in `to_a'
2020-11-21T21:10:38.097418+00:00 app[web.1]: app/services/show_profile.rb:26:in `tried_challenges'
2020-11-21T21:10:38.097419+00:00 app[web.1]: app/views/users/show.erb:4:in `_app_views_users_show_erb__3095988784862914793_46222660'
2020-11-21T21:10:38.097772+00:00 app[web.1]: source=rack-timeout id=ded903ed-63f9-4f73-9bf9-e7c5c1c835a8 wait=3ms timeout=15000ms service=474ms state=completed
2020-11-21T21:10:38.182385+00:00 app[web.1]: Rendered collection of shared/_challenge.erb [50 times] (12.8ms)
2020-11-21T21:10:38.197337+00:00 app[web.1]: Rendered shared/_tweets.erb (0.1ms)
2020-11-21T21:10:38.197553+00:00 app[web.1]: Rendered main/index.erb within layouts/application (671.2ms)
2020-11-21T21:10:38.199618+00:00 app[web.1]: I, [2020-11-21T21:10:38.199502 #9]  INFO -- : Completed 200 OK in 1363ms (Views: 673.6ms)
2020-11-21T21:10:38.200610+00:00 app[web.1]: source=rack-timeout id=54fb9af5-fd92-4dbc-876c-cdc147b5789d wait=2ms timeout=15000ms service=1367ms state=completed
filbranden commented 3 years ago

@Hettomei about the Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. (16819) MongoDB error, I pushed PR #297 a while ago (it was merged back in August) that I expected would have resolved this issue...

You mentioned in the other thread that there might be a difference in which commits are on GitHub and on Heroku (more specifically, you mentioned a commit that's only on Heroku), but is it possible that the change from PR #297 is not present on the Heroku codebase?

(One aside: it would be nice to synchronize the two so they're identical, and perhaps look into some automation to get all merged PRs to flow to Heroku automatically, I wonder if there's some GitHub Actions plug-ins to do so, or maybe even Heroku supports some kind of automatic pulls on their side too?)

Hettomei commented 3 years ago

Hi

You mentioned in the other thread that there might be a difference in which commits are on GitHub and on Heroku

I merged master into heroku. So everything present here is present on the server. (And I double checked :) )

One aside: it would be nice to synchronize the two so they're identical

Agree, Theire was only one commit, oneline related to avatar picture. But I let igrigorik to manage his baby :)