zimmeee / fortunatepun

I aint no Fortunate Pun
0 stars, 0 forks

Profile page load and improve speed #9

Open zimmeee opened 10 years ago

zimmeee commented 10 years ago

Fixed by 02fa2b5621cdf35515777f661f2aae0528bc4024 with 85x speedup

zimmeee commented 10 years ago

Well, that was short-lived. I don't know how to debug this stuff. I threw some indices on the tables in a last-ditch effort, but it didn't help. Will need @rbruggner for this one. You can find the SQL in SQL/FeedGenerator.SQL.

zimmeee commented 10 years ago

Tried upgrading the instance type from D0 (128MB RAM) to D1 (512MB RAM), which reduced query time on my page from ~15s to ~5s. I still don't have any idea how to properly debug this.
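For what it's worth, a first debugging step is to ask MySQL for its query plan. A minimal sketch, assuming an open DB-API connection (`conn` here is hypothetical): prepend EXPLAIN to the slow query and inspect the result to see whether any of those new indices are actually used.

```python
# Sketch only: prepend EXPLAIN to the slow query and return the plan.
# `conn` is a hypothetical open DB-API connection (e.g. MySQLdb.connect(...)).
def explain(conn, sql):
    """Run EXPLAIN on `sql` and return the plan rows as dicts."""
    cur = conn.cursor()
    cur.execute("EXPLAIN " + sql)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

# On MySQL, check the `key` field of each row: NULL means the index
# you added is not being used for that table.
```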

wasauce commented 10 years ago

We could do caching and keep a warm cache. This could allow us to show a recent set of results immediately and then do an auto-refresh on the page after a few seconds to get an update. AJAX here we come!

I suspect it is two things:

  1. The query is probably not optimized — if I remember, there are some JOINs.
  2. The app is actually cold and has to be spun up every time we hit it — one solution to this is… more users and hence more traffic. Google puts apps to rest when they are not sustaining active use.
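The warm-cache idea could be as simple as the sketch below (all names here are hypothetical; on App Engine the obvious backing store would be memcache rather than an in-process dict):

```python
import time

# Hypothetical warm-cache sketch: serve the last computed feed instantly,
# recompute only when it has gone stale.
_cache = {}      # twitter_handle -> (timestamp, results)
MAX_AGE = 60     # seconds before a cached feed counts as stale

def get_feed(handle, compute_feed):
    """Serve cached results while fresh; rerun the slow query when stale."""
    entry = _cache.get(handle)
    now = time.time()
    if entry is not None and now - entry[0] < MAX_AGE:
        return entry[1]
    results = compute_feed(handle)  # the expensive FeedGenerator query
    _cache[handle] = (now, results)
    return results
```

The page can then render the cached results immediately and fire the AJAX refresh afterward.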

zimmeee commented 10 years ago

AJAX - Bill I love it when you talk dirty.

The query itself is definitely slow, though the higher-memory instance made a noticeable difference. Before we get into AJAX and query optimization, I think Bill is right about a release. How should we do it? Strippers? Viral video campaign?

rbruggner commented 10 years ago

Breaking down the query, the retrieval of records is actually quite quick; I think it's the aggregation statistics and concatenation that are slow. For instance, this runs quickly:

SELECT DISTINCT URLer.urlid, URL.expanded_url, COUNT(URLer.urlid) AS votes
FROM URLer, tokens, URL
WHERE tokens.twitter_id = URLer.twitter_id
  AND tokens.twitter_handle = 'zimmeee'
  AND URLer.tweet_time >= NOW() - INTERVAL 1 DAY
  AND URL.urlid = URLer.urlid
GROUP BY URLer.urlid
ORDER BY COUNT(URLer.urlid) DESC

Noah, I think you’ve been previously spoiled by databases that are optimized for compute as well as querying :) Unfortunately I don’t think the App Engine instances are good at that (or at least, not the free tier). However, can Python simply do the aggregation statistics quickly?

For instance, running this query is quick:

SELECT URL.expanded_url, URLer.twitter_handle, URLer.tweetid, URLer.twitter_id
FROM URLer, tokens, URL
WHERE tokens.twitter_id = URLer.twitter_id
  AND tokens.twitter_handle = 'zimmeee'
  AND URLer.tweet_time >= NOW() - INTERVAL 1 DAY
  AND URL.urlid = URLer.urlid
  AND URL.expanded_url IS NOT NULL
ORDER BY URL.expanded_url

And that gets you all the URLs and the Twitter handles that tweet them. I assume Python can quickly tabulate the URL occurrences (or am I still in R land)? It’s not as pretty as your query, but it’s less CPU-intensive for the database, and I assume Python will be quicker at aggregating the results.
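Tabulating the URL occurrences in Python is indeed close to a one-liner with `collections.Counter` (the sample rows below are made up; their tuple order mirrors the SELECT list of the simpler query):

```python
from collections import Counter

# Made-up rows in the shape the simpler query returns:
# (expanded_url, twitter_handle, tweetid, twitter_id)
rows = [
    ("http://example.com/a", "alice", 1, 101),
    ("http://example.com/a", "bob",   2, 102),
    ("http://example.com/b", "carol", 3, 103),
]

# Count one "vote" per row for each URL, like COUNT(...) GROUP BY in SQL.
votes = Counter(url for url, handle, tweetid, twitter_id in rows)

# most_common() sorts by count descending, like the ORDER BY ... DESC.
ranked = votes.most_common()
# -> [('http://example.com/a', 2), ('http://example.com/b', 1)]
```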

Actually, that last query just revealed a bug in the URL aggregation code. Der…

zimmeee commented 10 years ago

sweet baby jesus. you're the man, bruggner.

it'll actually be way easier to unpack that stuff if I DON'T do the aggregations in SQL. Still trying to shake my Greenplum massively-parallel-database habits.
