mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Carry out performance issues analysis and hold conference call to discuss options #86

Closed: TomSteinberg closed this issue 13 years ago

sebbacon commented 13 years ago

Some notes (pasted from a team email):

OK, no more time to spend on this today; away until Thurs I'm afraid.

Here's some next steps and observations.

The backup is definitely causing a massive load on the server. This appears mainly to be caused by the enormous size of the raw_emails table (33GB), and a little bit by the post_redirects table (8GB).

A quick fix would be to exclude the raw_emails table from the standard backups, and write a custom incremental backup script to cover that.
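
For illustration only, the quick fix could be as simple as the following (a sketch; pg_dump's --exclude-table flag skips the named table, the database name and output path are just examples, and the incremental script for raw_emails itself is left out here):

    # Quick-fix sketch: dump everything except the huge raw_emails table.
    # Database name and output path are examples, not the real configuration.
    system("pg_dump", "--exclude-table=raw_emails",
           "-Fc", "-f", "/var/backups/foi-without-raw-emails.dump", "foi")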

Longer term we should probably store raw_emails on the filesystem (as once stored they are read only). We should also look at cleaning the post_redirects table out periodically, as it doesn't really need to store old data.
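
A minimal sketch of the filesystem idea, assuming a RawEmail ActiveRecord model (the directory layout, column handling and method names here are assumptions, not existing code):

    # Sketch only: keep raw email bodies on disk and leave just the metadata
    # row in PostgreSQL. Paths and method names are assumptions.
    require 'fileutils'

    class RawEmail < ActiveRecord::Base
      STORAGE_ROOT = "/data/raw_emails"   # hypothetical storage root

      def filepath
        # shard by id so no single directory grows too large
        File.join(STORAGE_ROOT, (id / 1000).to_s, id.to_s)
      end

      # call this after the record has been saved and therefore has an id
      def store_data(text)
        FileUtils.mkdir_p(File.dirname(filepath))
        File.open(filepath, "wb") { |f| f.write(text) }
      end

      def data
        File.read(filepath)
      end
    end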

In addition to this, we should probably remove the whitespace-normalising regex that Alex found on info_request, and add a title+id index as per Matthew's suggestion; and we should try out clustering on incoming_message ids.
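
A rough sketch of what the index and clustering changes could look like as a migration (the exact tables, columns and index names would need checking against Matthew's suggestion; CLUSTER ... USING is PostgreSQL 8.4+ syntax):

    # Sketch of a migration for the suggested index and clustering; names are
    # assumptions and would need verifying before use.
    class AddPerformanceIndexes < ActiveRecord::Migration
      def self.up
        add_index :info_requests, [:title, :id], :name => "index_info_requests_on_title_and_id"

        # Physically reorder incoming_messages so the rows for one request sit
        # together on disk; assumes an index on info_request_id already exists
        # under this name (one-off; needs re-running to stay clustered).
        execute "CLUSTER incoming_messages USING index_incoming_messages_on_info_request_id"
      end

      def self.down
        remove_index :info_requests, :name => "index_info_requests_on_title_and_id"
      end
    end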

Finally, I am pretty sure that the alert_overdue_requests and track_mailer loops are culprits for hogging CPU -- which again might be mitigated by clustering (they load lots of rows from databases based on lists of ids).
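
As a general illustration (separate from the clustering idea, and not what the code currently does), one way to stop such loops holding huge result sets in memory is to walk the table in fixed-size batches, assuming Rails 2.3's find_in_batches is available:

    # Sketch: process candidate requests in batches of 200 rather than loading
    # every matching row at once; the model, column and state are illustrative.
    InfoRequest.find_in_batches(:batch_size => 200,
                                :conditions => ["described_state = ?", "waiting_response"]) do |batch|
      batch.each do |info_request|
        # build and send the overdue alert for this request here
      end
    end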

Happy to look at all this from Thurs.

TomSteinberg commented 13 years ago

Wiki page of issues is here https://github.com/sebbacon/alaveteli/wiki/Performance-issues

sebbacon commented 13 years ago

The current top cacheable requests that are not being cached by Varnish are:

sebbacon commented 13 years ago

Killing the out-of-process jobs (alert-overdue-requests, foi-alert-tracks and xapian-rebuild-index) results in far fewer blocking processes and lower CPU utilisation, but CPU usage is still typically over 95%.

sebbacon commented 13 years ago

There is some strange behaviour first noted by Matthew. If you run 'top' you will often see a long-running process that is consuming large amounts of CPU time, e.g.

15895 foi       20   0  316m 140m 7424 R   32  1.8   1:03.00 ruby 

If you strace such a process, you often get a similar pattern:

1) When you attach to the process, it's always in the middle of a very, very long read.
2) Some time after attaching, it starts doing writes as well.
3) Soon after this, the process starts stat()ing everything in /data/vhost/www.whatdotheyknow.com/alaveteli/vendor/plugins/whatdotheyknow-theme/lib/views and reading a few bytes of each.
4) Then it repeatedly stats config/general and config/general.yml and does a bit of writing to something that looks like it might be postgres.
5) It stores something in memcached.
6) It does some kind of checking of the current ruby process in /proc/ (? mod_ruby-related ?).
7) After a bit more database activity, it then seems to search the current Ruby path for a file called request.rb.
8) It opens the Xapian database.
9) It then reads in the contents of postlist.DB.
10) It does some more back-and-forth with what is presumably the mod_ruby pipe.
11) Finally, it starts stat()ing all the files in vendor/plugins/whatdotheyknow-theme/lib/views, but immediately segfaults.

The above looks to me a bit like a Rails startup sequence. I wonder if we are watching an "application spawner process" in action, as documented at http://www.modrails.com/documentation/Architectural%20overview.html, and therefore whether we should be looking at some of the Passenger configuration options, or even at a Mongrel cluster as a replacement.

alexjs commented 13 years ago

Thanks for breaking those down, Seb. I'd point out that / is cacheable if there's no session data and the correct cache headers are set. Talking to Robin, this has already been done by Louise on FMT (FixMyTransport), and we can theoretically re-use that approach, although I appreciate it'd require Some Dev Work (TM).

If we need a temporary fix, I put together some VCL a while ago which, despite breaking the admin interface (although that's probably easily resolvable), does seem to hit most of the requirements.

# If hitting WDTK, and /not/ hitting a /profile/ or /admin/ URL, then ignore any Set-Cookie headers
# and set our own expiry
if ((req.http.host ~ "^(www\.)?whatdotheyknow\.com") && !(req.url ~ "^/profile") && !(req.url ~ "^/admin")) {
    unset beresp.http.Set-Cookie;
    set beresp.ttl = 600s;
    return (deliver);
}
sebbacon commented 13 years ago

Thanks Alex.

Thoughts about the VCL:

For logged-in users, we could simply fix by inserting a path segment in the URL, perhaps...

alexjs commented 13 years ago

> why does it break the admin interface? you're checking there that the URL doesn't contain admin and the host name too...

You're right, it seems I had updated it past when I last emailed about it.

> it would, however, break the home page etc when logged in ("my requests", "log out"), and the "flash" messages (e.g. when you send a request and it says "your request is on its way")

Relatively easy to avoid by expanding it to check whether the incoming Cookie header has a 'wdtk' session - but well spotted.

sebbacon commented 13 years ago

So, it turns out Louise's code (made into a patch by Matthew at https://github.com/dracos/alaveteli/commit/b029418a8d567b4d97be845196e41639094e5c46) is indeed wanted, as otherwise Rails sets a session cookie for every visitor, even if it's empty.

In addition, Rails by default sets every response to Cache-Control: private (etc.) unless you tell it not to. So a fix along the lines of https://github.com/mysociety/fixmytransport/blob/master/app/controllers/application_controller.rb#L28 is also needed.
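
For illustration, the kind of application-wide fix meant here might look like the following (a sketch, not the FMT code; the filter name, the session[:user_id] check and the 10-minute TTL are all assumptions):

    # Sketch: mark anonymous responses as publicly cacheable so Varnish can
    # keep them; logged-in responses keep Rails' default private caching.
    class ApplicationController < ActionController::Base
      after_filter :set_cache_headers

      private

      def set_cache_headers
        if session[:user_id].nil?   # assumption about where login state lives
          response.headers["Cache-Control"] = "public, max-age=600"
        end
      end
    end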

However, setting Vary: Cookie as per Louise's code won't work because of Google Analytics cookies, which will make every user get their own cache. So we will also want to strip those, using something like https://www.varnish-cache.org/docs/trunk/tutorial/cookies.html

sebbacon commented 13 years ago

Brief notes from a conversation with Francis: he says that the Ruby interpreter is slow, and its regular expression engine in particular performs badly. He believes upgrading to Ruby 1.9 could give a 30% speedup.

alexjs commented 13 years ago

Hi Seb,

> However, setting Vary: Cookie as per Louise's code won't work because of Google Analytics cookies, which will make every user get their own cache. So we will also want to strip those, using something like https://www.varnish-cache.org/docs/trunk/tutorial/cookies.html

I think we've addressed this elsewhere in the VCL?

    # Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");

Or am I misparsing?

sebbacon commented 13 years ago

Changes made to code (awaiting deployment):

Other changes:

sebbacon commented 13 years ago

Additionally:

sebbacon commented 13 years ago

Now that some initial research and steps have been taken, closing this issue in favour of new, more specific issues: issue #92, issue #93, issue #94

sebbacon commented 12 years ago

Some of Francis' old notes about performance for the record:

Reduce the number of bogus post_redirects we store for visitors that aren't real people.
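
Purely as a hypothetical illustration of that idea (nothing like this exists in the codebase), the guard might be a user-agent check before a PostRedirect is created:

    # Hypothetical guard: skip creating a PostRedirect for obvious crawlers so
    # bot traffic stops bloating the post_redirects table.
    BOT_USER_AGENTS = /bot|crawler|spider|slurp/i

    def worth_saving_post_redirect?(request)
      request.user_agent.to_s !~ BOT_USER_AGENTS
    end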

Receiving email can be a resource drain, as it starts an app instance for each message; use a daemon instead.
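
A very rough sketch of the daemon idea (the spool directory, polling interval and the use of RequestMailer.receive as the entry point are all assumptions):

    # Sketch of a long-running mail daemon: load the Rails environment once,
    # then feed each spooled message to the mailer, instead of booting a fresh
    # app instance per email.
    require File.dirname(__FILE__) + '/../config/environment'

    SPOOL_DIR = "/var/spool/alaveteli/incoming"   # hypothetical spool directory

    loop do
      Dir.glob(File.join(SPOOL_DIR, "*")).sort.each do |path|
        RequestMailer.receive(File.read(path))   # parse and route in-process
        File.delete(path)
      end
      sleep 5
    end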

Regular expression library: change to a faster one. Oniguruma isn't enough. This shows the slowness:

    e = InfoRequestEvent.find(213700)
    text = e.incoming_message.get_main_body_text   # XXX alter to call internal, not cache
    IncomingMessage.remove_quoted_sections(text, "")
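
To put a number on that, the snippet can be wrapped in a timer (a sketch; only the Benchmark wrapper is new, the calls are the ones above):

    # Sketch: time the quoted-section stripping identified as slow above.
    require 'benchmark'

    e = InfoRequestEvent.find(213700)
    text = e.incoming_message.get_main_body_text   # XXX alter to call internal, not cache

    seconds = Benchmark.realtime do
      IncomingMessage.remove_quoted_sections(text, "")
    end
    puts "remove_quoted_sections took %.2fs" % seconds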

This is slow: http://www.whatdotheyknow.com/request/renumeration_committee