squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License

Memory exhausted for large JS files #1273

Closed · grappler closed this issue 7 years ago

grappler commented 7 years ago

These WordPress themes cannot be processed by PHPCS because memory is exhausted. The cause is large JS files, each more than 20,000 lines long. My memory limit was set to 250MB.

I wonder if we could add a warning if a file is longer than x lines.

gsherwood commented 7 years ago

I had a think about this, and I don't think a line limit will work. It's not the number of lines that is the problem; it's the number of tokens. I could put a token limit in, but then PHPCS wouldn't process large files even if there was enough memory for it.

I think the best thing to do is to check memory during tokenizing and stop processing the file if the memory usage is too close to the limit. I'm a bit worried about the performance of all those extra checks, but I'm also worried that nobody would ever turn on an option that I provided. So maybe I'll turn this on by default in 3.x and provide an option to disable it.
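A minimal sketch of that kind of check (illustrative only, not PHPCS's actual implementation; the sampling interval, abort behaviour, and file name are assumptions):

```php
<?php
// Illustrative sketch: stop tokenizing when memory use gets too close
// to PHP's memory_limit. Not PHPCS's real code.

/** Convert an ini value like "250M" into bytes ("-1" means unlimited). */
function memoryLimitBytes(): float
{
    $limit = ini_get('memory_limit');
    if ($limit === '-1') {
        return INF;
    }

    $bytes = (float) $limit;
    switch (strtoupper(substr($limit, -1))) {
        case 'G': $bytes *= 1024;
            // no break: fall through and multiply again
        case 'M': $bytes *= 1024;
            // no break
        case 'K': $bytes *= 1024;
    }

    return $bytes;
}

$limit  = memoryLimitBytes();
$code   = file_get_contents('example.php'); // hypothetical input file
$tokens = token_get_all($code);

foreach ($tokens as $i => $token) {
    // Checking on every token adds overhead (the performance worry
    // mentioned above), so only sample the usage occasionally.
    if (($i % 1000) === 0 && memory_get_usage() >= $limit * 0.95) {
        throw new RuntimeException('File is too large to tokenize within the memory limit');
    }

    // ... normal token processing would happen here ...
}
```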

gsherwood commented 7 years ago

I ended up adding this to the 3.x branch. Given the tokenizer already tries to determine whether a file is minified (and, if so, typically kills tokenizing), I think it makes sense to also make sure it isn't about to hit the memory limit and die on massive files. It only performs this check if the file has more than 1000 tokens.

What it does is try not to use more than 95% of the remaining available memory when tokenizing a file. That leaves a little left over for error messages (hopefully) before the memory used to store the tokens is released. Obviously, the limit becomes more likely to be hit as the script progresses and less memory remains.
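Concretely, the cap arithmetic would look something like this (example values, not PHPCS internals):

```php
<?php
// Illustrative arithmetic only: deriving a "95% of remaining memory"
// cap before a file's tokens are built.
$limit     = 250 * 1024 * 1024;   // e.g. a memory_limit of 250MB, in bytes
$usedNow   = memory_get_usage();  // grows as earlier files are processed
$remaining = $limit - $usedNow;
$cap       = $usedNow + (int) ($remaining * 0.95);
// Tokenizing stops once memory_get_usage() exceeds $cap; the ~5% of
// headroom left over is what's reserved for building error messages.
```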

You can change the percentage using the tokenizer_memory_limit config var. This is easiest with the --runtime-set CLI option: `phpcs --runtime-set tokenizer_memory_limit 90`. You can also turn the feature off entirely: `phpcs --runtime-set tokenizer_memory_limit 0`.

I need to give this a bit of a run on some really big repos before deciding if this really should be on by default, what the default value should be, and if it should be there at all.

gsherwood commented 7 years ago

While this does work for the linked files, the memory limit on big repos is also being hit during the token_get_all() call, which I can't trap or predict.

After sleeping on it, I'm not entirely convinced this feature should exist at all. It would be much better for a project to exclude individual files from being checked (either on the CLI or in a ruleset) so that anyone can clearly see which files PHPCS is not checking.
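As an illustration (the standard name and path pattern here are made up), a ruleset entry makes the exclusion visible to everyone running the standard; the CLI equivalent is the --ignore option:

```xml
<?xml version="1.0"?>
<ruleset name="MyTheme">
    <rule ref="WordPress"/>
    <!-- Hypothetical path: keep the oversized bundled JS out of the scan. -->
    <exclude-pattern>*/assets/js/big-library\.js</exclude-pattern>
</ruleset>
```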

gsherwood commented 7 years ago

I did end up reverting this change.

I'm not going to add a warning when a file appears too big, because that assumes the file cannot be processed when it often can be. I suggest excluding these large files from the coding standards check, or breaking them up into multiple smaller files if you really do want them checked.