inclusive-design / AChecker

Automated interactive Web content accessibility checker.
https://achecker.ca
GNU General Public License v2.0
69 stars 61 forks source link

Insufficient memory while ckecking HTML paste with too much dom inside #42

Open tassoman opened 10 years ago

tassoman commented 10 years ago

Ciao a tutti :) I've found a problem, if you load too much code into HTML textarea field PHP will run out of memory (128M).

I suppose the problem is into simple_html_dom library but I can't debug at which level. Reading library's documentation emerge that could be related to a known problem with a PHP5 memory leak: http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak

You can reproduce the bug acting like this:

  1. Plain install aChecker
  2. Do an URL validation of an huge site, for example www.trenitalia.it
  3. copy HTML source code of validation from browser
  4. paste source code into aChecker HTML textarea
  5. try to validate
( ! ) Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 226101 bytes) in /var/www/include/lib/simple_html_dom.php on line 614
Call Stack
# Time Memory Function Location
1 0.0129 1351312 {main}( ) ../index.php:0
2 0.1782 4723836 AccessibilityValidator->validate( ) ../index.php:258
3 0.1782 4723836 AccessibilityValidator->get_simple_html_dom( ) ../AccessibilityValidator.class.php:80
4 0.1782 4723836 str_get_dom( ) ../AccessibilityValidator.class.php:143
5 0.1782 4725644 simple_html_dom->load( ) ../simple_html_dom.php:43
6 7.6116 29384796 simple_html_dom->parse( ) ../simple_html_dom.php:464
7 7.6116 29384868 simple_html_dom->read_tag( ) ../simple_html_dom.php:550
8 7.6117 29386560 substr ( ) ../simple_html_dom.php:614
cindyli commented 10 years ago

Thanks for the report and research on this issue. This is also a known issue @ http://atutor.ca/atutor/mantis/view.php?id=4792

Your finding with the PHP5 memory leak would be a good start point for a solution. Would you mind to give a try? :)

tassoman commented 10 years ago

I'm working on it, of course. I've tried to call clear() method of simple html dom in AccessibilityValidator.class.php file (at line 143 and 80) but with no improvements.

Basically they ask to call clear() method before instancing the object more than one time but you see It's very difficult for me understand how many times object is instanced.

I've tried to add a debugging var_dump() at final error line number 614 and i got many and many outputs in the page. So I bet it's called each time a check is run.

As first author of code, Could you @cindyli provide more help?

tassoman commented 10 years ago

If you look into achecker/index.php file there's also a memory check done using Utility object that always will pass also if you pass the $_POST['htmlpaste'] variable as argument. But the application fails before, by dom library error called before while instancing AccessibilityValidator object and running validate() method on it.

cindyli commented 10 years ago

Thanks for trying it out, @tassoman. Off the top of my head, one thing can try in AChecker is in this chunk of code: https://github.com/atutor/AChecker/blob/master/include/classes/AccessibilityValidator.class.php#L77-L93, add a clean method as the last step.

However, as you said, the validation might never get to the last step since the validation might have been interrupted by the error thrown out of the simple_html_dom. This makes me think where the memory leak occurs? If it's from the simple html dom library, shall we patch the library rather than AChecker?

tassoman commented 10 years ago

If you var_dump($portion) after row 614, just inside your customization, you could see the $portion growing and being unset at R#617 but any clear() method is run. I've tried to run $this->clear(); in it but it mess all things. Time is up for me today. :)

tassoman commented 10 years ago

Hello @cindyli , after a good night slept well, I've got the solution, I've edited the validate() method to let it return boolean instead of void.

So we can control arbitrarily if a validation is gone well or for good... :grin: In this method we measure the size of $this->validate_content. If it's greater than a custom amount we return a false feedback to the checker/index.php rather than true.

Finally we manage the "exception" triggering a $msg->addError('NO_ENOUGH_MEMORY') method. That's all and works.

This modification involves also URL submission verifications (user could submit arbitrary URI with an unknown amount of size data, also a stream in the worst case), thus for uploaded content.

I've not yet submitted any pull request because of "core" modification is involved. But If you still interested I could do it in the next future.

cindyli commented 10 years ago

Thanks for digging further, @tassoman.

You mentioned "If it's greater than a custom amount...". Can you elaborate what "a custom amount" is? User defined or system calculated? What is it based on to be defined or calculated?

I'm very interested to see your implementation. It would be great if you could submit a pull request when you have a chance. Thanks.

tassoman commented 10 years ago

I was wondering about a constant defined into constants.php file overridable by users editing config.inc.php Actually I'm stick on a 200*1024 size that's is 200KB strlen