wet-boew / wet-boew-drupal

Drupal variant of the Web Experience Toolkit (WET)

Apache memory leak #286

Closed Pacoup closed 12 years ago

Pacoup commented 12 years ago

Having this issue where upon resizing the browser window to view the mobile site, the Apache process goes haywire and in just a few seconds consumes all the RAM available and crashes the whole server.

Working on your average LAMP:

Any idea?

sylus commented 12 years ago

Hmm not too sure about this. When you switch to the lowest media query that is where the WET javascript takes over. This is all done client side so couldn't be affecting the server... Interesting though! I will take a look at this tonight.

Pacoup commented 12 years ago

Admittedly, though it does load some additional resources, like the jQuery Mobile CSS; I'm thinking a mistyped resource URI or something of the sort could be causing an infinite loop in PHP.

Are you having the same issue on a different system? I think I'll try installing on Fedora, see what I get.

csedev commented 12 years ago

Experiencing the same problem. Crashing the server when adjusting browser size. We're on Debian.

sylus commented 12 years ago

Can confirm this as well, I can revert back four commits but ideally I'd like to know what is doing this.

sylus commented 12 years ago

Hmm, looks like recent commits fixed it. Test out http://wet.openplus.ca, which is built on the latest commit; everything works.

csedev commented 12 years ago

We're on the latest and greatest. Here's the error message when moving from mobile to desktop display:

Additional uncaught exception thrown while handling exception.

Original

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT source FROM {url_alias} WHERE alias = :alias AND language IN (:language, :language_none) ORDER BY language ASC, pid DESC; Array ( [:alias] => front/demo [:language] => en [:language_none] => und ) in drupal_lookup_path() (line 176 of /var/www/includes/path.inc).

Additional

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: INSERT INTO {watchdog} (uid, type, message, variables, severity, link, location, referer, hostname, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => php [:db_insert_placeholder_2] => %type: !message in %function (line %line of %file). [:db_insert_placeholder_3] => a:6:{s:5:"%type";s:12:"PDOException";s:8:"!message";s:290:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT source FROM {url_alias} WHERE alias = :alias AND language IN (:language, :language_none) ORDER BY language ASC, pid DESC; Array ( [:alias] => front/demo [:language] => en [:language_none] => und ) ";s:9:"%function";s:20:"drupal_lookup_path()";s:5:"%file";s:26:"/var/www/includes/path.inc";s:5:"%line";i:176;s:14:"severity_level";i:3;} [:db_insert_placeholder_4] => 3 [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => http://aandc/en [:db_insert_placeholder_7] => http://aandc/en [:db_insert_placeholder_8] => 192.168.195.1 [:db_insert_placeholder_9] => 1346096406 ) in dblog_watchdog() (line 154 of /var/www/modules/dblog/dblog.module).

csedev commented 12 years ago

Terminal shows out of memory: kill mysql process

sylus commented 12 years ago

Hmm this is going to be a tricky one to trace down. Is this only when moving from mobile to desktop and not the reverse? Something messed up is for sure happening. I still can't quite see how js can cause this to crash. Has to be a red herring.

Pacoup commented 12 years ago

csedev did you change your max_allowed_packet value to 16M in your my.cnf? Your DB error might be unrelated. I had something similar until I changed this value. Some distros like Ubuntu have it set to 16M by default.

Also, that DB error will pop up on even just a URL change, so a simple lookup for a different resource while switching view mode in the browser might cause the error to appear.

sylus commented 12 years ago

Yeah, I am not able to reproduce this error whatsoever. I will try on another server at home; perhaps Amazon EC2 has too much memory or something, so I never hit the limit.

Pacoup commented 12 years ago

Yeah, I just updated and on top of being unusually slow to switch, I get some weird layout issues like fuzzy text and missing images, but no more Apache crashes.

Something is definitely wrong, yet I get no errors in my apache logs.

sylus commented 12 years ago

You are logged in; this is sadly by design (till I can fix the problem). Can you log out and try? Basically a CSS file is added that overrides some styles, and it is not loaded when you are logged out. As this is a new feature, only 2 days old, I'm still working out the kinks.

Pacoup commented 12 years ago

Getting low but constant CPU usage from apache2, then kworker, then mysqld, then kworker again (averaging 10 to 15%) and still no completely loaded mobile format (using Google Chrome); some missing icons, again, but looks a bit better than when being logged in.

My VM has 1 GB of DDR3 and a 1.7 GHz Intel Core i5. And a few minutes in, as I'm typing this, the computer's still working but to no avail.

And ah, something new, I scaled back to desktop and after a while got the same error as csedev:

Additional uncaught exception thrown while handling exception.

Original

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT ml.*, m.*, ml.weight AS link_weight FROM {menu_links} ml LEFT OUTER JOIN {menu_router} m ON m.path = ml.router_path WHERE (ml.link_path IN (:db_condition_placeholder_0)) ; Array ( [:db_condition_placeholder_0] => front/demo ) in menu_link_get_preferred() (line 2482 of /srv/www/banting/public_html/includes/menu.inc).

Additional

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: INSERT INTO {watchdog} (uid, type, message, variables, severity, link, location, referer, hostname, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9); Array ( [:db_insert_placeholder_0] => 0 [:db_insert_placeholder_1] => php [:db_insert_placeholder_2] => %type: !message in %function (line %line of %file). [:db_insert_placeholder_3] => a:6:{s:5:"%type";s:12:"PDOException";s:8:"!message";s:308:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT ml.*, m.*, ml.weight AS link_weight FROM {menu_links} ml LEFT OUTER JOIN {menu_router} m ON m.path = ml.router_path WHERE (ml.link_path IN (:db_condition_placeholder_0)) ; Array ( [:db_condition_placeholder_0] => front/demo ) ";s:9:"%function";s:25:"menu_link_get_preferred()";s:5:"%file";s:46:"/srv/www/banting/public_html/includes/menu.inc";s:5:"%line";i:2482;s:14:"severity_level";i:3;} [:db_insert_placeholder_4] => 3 [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => http://172.31.92.45/en [:db_insert_placeholder_7] => http://172.31.92.45/en [:db_insert_placeholder_8] => 172.31.92.61 [:db_insert_placeholder_9] => 1346097943 ) in dblog_watchdog() (line 154 of /srv/www/banting/public_html/modules/dblog/dblog.module).

csedev commented 12 years ago

Increased packet size to 128M and do not have this problem. Next question would be why are we sending 16M+ packets to MySQL when switching between mobile and desktop?

sylus commented 12 years ago

This definitely has to be fixed but still unsure about the cause. The only thing I can guess is that WET js is reloading the entire page numerous times or something similar. :(

Would definitely love some assistance on this one.

sylus commented 12 years ago

Do you think you could spend some time on this @Pacoup? It would be greatly appreciated :)

eleclerc commented 12 years ago

Hi, there are two issues at stake here:

  1. Upon resizing to mobile view, WET is making lots of new HTTP requests, most of them aborting. I'll try to find time to investigate this.
  2. Basic Apache server config: Apache should not be able to totally crash a server. It needs to be configured to know when to stop forking processes, to avoid using all the available RAM on the system. Here are the settings in one of the Apache config files (the name changes depending on the OS; it might be apache2.conf, httpd.conf, server-tuning.conf, etc.):

    # highest possible MaxClients setting for the lifetime of the Apache process.
    # http://httpd.apache.org/docs/2.2/mod/mpm_common.html#serverlimit
    ServerLimit        25
    # maximum number of server processes allowed to start
    # http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxclients
    MaxClients         25

One needs to do some calculation to know what MaxClients and ServerLimit should be set to.

Example: my server has 2 GB of free RAM (4 GB minus a lot of other daemons), and every hit on the drupal-wet website needs 76 MB to run (read from top with data+stack displayed). 2000 MB / 76 MB = 26.3 processes before I run out of RAM. If I allow more than 26 processes, Apache will try to use more RAM than is available on the system; the machine will then swap heavily or the kernel's out-of-memory killer will start killing processes, which can take down the whole server.

This being said, number 1 definitely needs to be fixed, but number 2 is something everybody should do anyway to prevent being called during the weekend to fix a dead server :-)
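
The MaxClients arithmetic above can be sketched in a few lines of JavaScript. This is only an illustration and not part of any Apache tooling: `estimateMaxClients` is a hypothetical helper, and the 2000 MB and 76 MB figures are the ones quoted in the example above. Measure your own per-process memory before relying on the result.

```javascript
// Rough upper bound for MaxClients/ServerLimit with Apache prefork.
// freeRamMb:    RAM left for Apache once other daemons are accounted for (MB)
// perProcessMb: memory one Apache/PHP process needs to serve a request (MB)
// Both inputs are measurements you have to take on your own server.
function estimateMaxClients(freeRamMb, perProcessMb) {
    if (!(freeRamMb > 0) || !(perProcessMb > 0)) {
        throw new RangeError("both values must be positive numbers");
    }
    // Round down: a fractional process does not fit in RAM.
    return Math.floor(freeRamMb / perProcessMb);
}

console.log(estimateMaxClients(2000, 76)); // 26
```

Setting the limit a little below the computed value, as the 25 in the config above does, leaves some headroom for processes that grow beyond the measured size.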

LaurentGoderre commented 12 years ago

Is this only happening on resize or does it happen when you start with a small window?

csedev commented 12 years ago

On resize.

Laurent - why do we need to do HTTP requests when moving across media queries? Is there a reason to not simply use CSS?

On Tue, Aug 28, 2012 at 9:57 AM, Laurent Goderre <notifications@github.com> wrote:

Is this only happening on resize or does it happen when you start with a small window?


LaurentGoderre commented 12 years ago

The two views use different CSS so that the mobile view doesn't load large view styles and vice versa. It's usually not an issue so I'm wondering why Drupal is choking on it. Has anyone tried to reproduce it on IIS or another engine?

pjackson28 commented 12 years ago

What is happening is the page is being reloaded when moving from Desktop view to mobile view and vice-versa. The reason is that mobile view dynamically rewrites the page to suit jQuery Mobile (then AJAXes in the jQuery Mobile JS and CSS) which explains all the new http requests. It would be way more costly performance-wise (and likely impossible) to revert the changes done in mobile view which is why a fresh copy needs to be loaded. Same with desktop view where the changes made to the DOM by the plugins may not be easy to roll back and could create issues in the mobile view (not to mention the performance cost of rolling back).

Ultimately it is much easier and safer to just do a page refresh for the limited use case of resizing a browser small (or big) enough to trigger the view change.

Pacoup commented 12 years ago

Sorry @sylus, this one is up to you guys. My expertise does not lie in PHP or Drupal, and I don't think extensive HTML5 and accessibility knowledge will help fix server-side issues.

By the way, any targets for a stable version of wet-boew-drupal?

sylus commented 12 years ago

Alright knowing what WET is doing I have a few ideas on how to mitigate this problem.

Stable release is Sept. 17th.

sylus commented 12 years ago

So a few points and requests for clarification:

1) @eleclerc's point in number 2 is completely right. Apache and MySQL should not be able to crash the server. I can say my server only hovers around 5% when switching to mobile mode, so some config changes def need to happen. However, there is still a problem with all the new HTTP requests.

Right now I am working on the assumption that when a user in the desktop view drags down to the lowest media query, the mouse is causing problems: even for a ms or two it jumps between the lowest and second lowest media query, which would cause a lot of HTTP requests on the server. Based on this, I have a few tests for people to try out:

a) When resizing to anything but the lowest media query (mobile), how is the server performing?
b) Using a plugin (Firefox or Chrome) to move to the lowest media query (don't use manual methods, i.e. the mouse), how is the server performing?
c) Using a mobile phone (preferably an iPhone) to visit your server, how is the server performing?

Ideally the problem lies in the manual switch to mobile or desktop mode and the fact that while invisible to humans the media query for mobile is being called a litany of times.

sylus commented 12 years ago

Also keep in mind this is not a Drupal specific problem. It is a problem with a framework that relies on a db backend.

LaurentGoderre commented 12 years ago

Another option for testing the size is to set a small window size with a blank page, then switch between the maximized and non-maximized view.

LaurentGoderre commented 12 years ago

@sylus, are you sure about that? I suspect it has to do with Drupal caching since it relies on the DB. Hitting the DB for every HTTP request can easily crash the server.

sylus commented 12 years ago

Drupal can cache pretty much everything, but on an initial install of the distro, caching is indeed disabled until a developer turns it on (another test case). The point I am trying to make is that this problem will occur with any framework that has a db backend and does not use aggressive caching. Simply suggesting that caching be used is not a fix, as it only masks the problem.

LaurentGoderre commented 12 years ago

Actually I'm thinking the problem is the opposite, caching might cause this issue by filling the db call stack.

sylus commented 12 years ago

Caching is disabled on initial install to assist developers.

sylus commented 12 years ago

If you'd like @LaurentGoderre I can swing by tomorrow and we can try to double team this problem ^_^

LaurentGoderre commented 12 years ago

@sylus sure!

@Pacoup is caching still disabled on your instance?

Pacoup commented 12 years ago

God, I love it when the error is caused by a configuration mistake. I truly do:

  1. Memory leak? No. My default Apache config on Ubuntu, using MPM prefork, had a staggering setting of 150 for MaxClients. Because every Drupal request requires around 80 MB of RAM, and WET 3.0 makes a new HTTP request for every new window size while you hold the mouse and resize down to mobile or up to desktop, Apache would quickly ramp up the number of processes to keep up with the requests and obviously swamp the memory. 150 x 80 MB = 12 GB, on a VM with 1 GB of RAM; not too functional.
  2. Because of this staggering use of RAM, the OS would try to shut down Apache processes repeatedly, causing more stress, and also causing MySQL to fail to respond, hence the "MySQL server has gone away" error when resizing.
  3. Just like @eleclerc pointed out, fixing this configuration mistake indeed fixes the problem. My page now works ok.
  4. However, when switching from desktop to mobile and vice versa (not between the different views of desktop or the different views of mobile alone), the page suddenly starts making one HTTP request for every window resize event (window.onresize, or similar). That can be quite a lot in a very short span of time, so Apache has to process every request you made until it finally gets to the last one and shows you the page. Enabling caching for the DB doesn't help that much, not because MySQL isn't being taxed, but because all of the Apache processes are busy processing other requests.

In other words, while Drupal and Apache 2.2 are certainly not a winning combination of high performers, there's an inherent problem with the way WET 3.0 makes an HTTP request on every window resize event until the last one is fired, instead of waiting for the user to stop resizing the page and then firing only one HTTP request. Locally, and perhaps on a static server, this can be acceptable, but it just won't cut it for real-world dynamic scenarios, especially on weaker servers.

Yes, if you don't resize the page on a desktop, nothing bad happens, but this should be simple to fix in JavaScript with the implementation of a timer similar to this one:

(Not a tested solution, just an idea.)

    var resizeTimeout;
    window.onresize = function() {
        clearTimeout(resizeTimeout);
        // handle normal resize
        resizeTimeout = setTimeout(function() {
            alert("test");
        }, 250); // set for 1/4 second.  May need to be adjusted.
    };

Sources:
http://stackoverflow.com/questions/2996431/detect-when-a-window-is-resized-using-javascript
http://stackoverflow.com/questions/6916404/window-onresize-firing-a-function-during-and-when-resizing-is-complete

Otherwise, one solution would be not to load a different stylesheet on mobile and on desktop, but this was discussed earlier as an inadequate solution due to the increased page bulk for mobile connections.

Also note that different browsers have different behaviors. For instance, Firefox fires the window resize event less often than WebKit-based browsers, and thus makes the website appear to load much faster because of the reduced number of HTTP requests.
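
The timer idea above is usually called debouncing. Here is a minimal sketch of a reusable helper, offered under the same caveat: it has not been tested against WET itself. The names `debounce` and `timers` are hypothetical; `timers` is only there so the logic can be exercised outside a browser, and defaults to the global `setTimeout`/`clearTimeout`.

```javascript
// Wrap `fn` so it runs only once, `delayMs` after the last call in a burst.
// `timers` (optional, hypothetical) lets a caller substitute its own
// setTimeout/clearTimeout pair, e.g. for testing.
function debounce(fn, delayMs, timers) {
    var set = (timers && timers.setTimeout) || setTimeout;
    var clear = (timers && timers.clearTimeout) || clearTimeout;
    var pending = null;

    return function () {
        var self = this;
        var args = arguments;
        // A new call cancels the timer started by the previous one.
        if (pending !== null) {
            clear(pending);
        }
        pending = set(function () {
            pending = null;
            fn.apply(self, args);
        }, delayMs);
    };
}

// Browser usage (assumed): one callback per resize burst instead of one per event.
// window.onresize = debounce(function () { /* re-check the view mode here */ }, 250);
```

With this in place, dragging the window edge for two seconds produces a single callback about 250 ms after the mouse stops, rather than one HTTP request per resize event.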

sylus commented 12 years ago

Thanks for the detailed write up @Pacoup. What do you think @pjackson28 about the suggestion for a timer?

Also in case you are interested in High Performance Drupal you might want to take into account Varnish + Memcache which when combined with Drupal truly make it fly ^_^ A great alternative to apache is also nginx but unfortunately I can't see government going behind it. They have approved Varnish + Memcache at StatCan though.

Pacoup commented 12 years ago

Huh, I thought Drupal wasn't compatible with nginx. I would have used this first had I known. As far as I know, you're much better off with nginx and php-fpm than Varnish: http://gwan.ch/benchmark (although you can use Varnish with nginx?). Anyway, I can't really give an opinion on the subject yet; not an expert.

pjackson28 commented 12 years ago

I've instead moved the mobile check to using Resize Events (used elsewhere in WET) so that should resolve the issue. It has a default polling of 500ms.
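
To illustrate why a polled check cuts the request count, here is a minimal sketch. It is not the actual WET or Resize Events code: `MOBILE_MAX_WIDTH`, `viewModeFor` and `createViewModeWatcher` are hypothetical names, and the breakpoint value is made up, since the real one is not given in this thread. The idea is that the width is sampled once per polling tick and a reload happens only when the sampled view mode differs from the previous one.

```javascript
// Hypothetical breakpoint; the real WET value is not stated in this thread.
var MOBILE_MAX_WIDTH = 767;

function viewModeFor(width) {
    return width <= MOBILE_MAX_WIDTH ? "mobile" : "desktop";
}

// Returns a function meant to be called once per polling tick.
// It calls onChange only when the mode differs from the previous tick.
function createViewModeWatcher(initialWidth, onChange) {
    var lastMode = viewModeFor(initialWidth);
    return function check(currentWidth) {
        var mode = viewModeFor(currentWidth);
        if (mode !== lastMode) {
            lastMode = mode;
            onChange(mode);
        }
    };
}

// Browser usage (assumed): sample every 500 ms, reload on a mode change.
if (typeof window !== "undefined") {
    var checkView = createViewModeWatcher(window.innerWidth, function () {
        window.location.reload();
    });
    setInterval(function () { checkView(window.innerWidth); }, 500);
}
```

However many resize events fire between two ticks, at most one reload can be triggered per tick, and none at all if the window stays on the same side of the breakpoint.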

sylus commented 12 years ago

Hey @Pacoup Yup Drupal is fully compatible with nginx. It is actually what the University of Ottawa is running Drupal with; as well as Varnish.

Also I don't think php-fpm and varnish are mutually exclusive.

If you're able to use nginx at your department though, I know the performance gains are pretty high, as the Ottawa University folks are rather pleased with it :)

sylus commented 12 years ago

Thanks @pjackson28 I again really appreciate you taking the time :) Definitely owe u one!

Pacoup commented 12 years ago

@sylus Did you merge in the latest changes? I just updated my wet-boew-drupal build and the page no longer loads the different resources when switching to mobile, while it does so on wet-boew.

Edit: Actually, it doesn't work on mobile either, so I'm assuming the issue was a temporary one not related to this particular update.

sylus commented 12 years ago

The changes are only brought in during the drush make phase, so you will have to re-run the drush make. You shouldn't need to reinstall as it only does an in-place replacement.

Pacoup commented 12 years ago

Well I ran this in the directory I have my Drupal build installed in:

drush make --no-gitinfofile --working-copy https://raw.github.com/wet-boew/wet-boew-drupal/master/build-wetkit.make .

Is that not it?

Pacoup commented 12 years ago

Never mind, my fault again. I had Aggregate JavaScript files checked, and that breaks stuff.

I'm closing this issue as the problem with excessive HTTP requests has indeed been fixed.

Thanks to everyone who worked on this.