zendtech / ZendOptimizerPlus

Other
914 stars 142 forks source link

Glob blacklist entries #105

Closed jamend closed 11 years ago

jamend commented 11 years ago

I see from the documentation and closed issues that it's possible to use wildcards for the blacklist file names, and wildcards in the entries in each file, but the first is using glob() and the second is using some sort of regex matching. Could the entry matching be expanded to use glob()?

My use case is that I have a bunch of sites under /home/, and each site has 2 folders, one for production (www) and one for development (dev), i.e /home/example1.com/www, /home/example1.com/dev/, /home/example2.com/www, /home/example2.com/dev/... I would like to easily blacklist this: /home/*/dev, which does not seem to work with the current wildcard matching. Perhaps there is already a way to do it, but the documentation is a bit sparse.

TerryE commented 11 years ago

If you look at the source, the blacklist function constructs a single regexp of the form

^(entry1|entry2|entry3)

which is used to match against the requested PHP files. So with the current blacklist functionality you can't do /home/_/dev or equivalent. It's a pity that you couldn't have _opcache.blacklist_regexp* as an alternative to opcache.blacklist_filename and allow admins who understand regexps to use the full regexp functionality. This would be an easy patch if the dev team were agreeable.

dstogov commented 11 years ago

I think we must have this feature. I would implement it using some special prefix (e.g. "~" character, like in Apache) for lines in blacklist file that would assume that this line as a regular expression.

Terry, I would appreciate if you could look into this.

Thanks. Dmitry.

On Sat, Jun 22, 2013 at 2:40 PM, Terry Ellison notifications@github.comwrote:

If you look at the source, the blacklist function constructs a single regexp of the form

^(entry1|entry2|entry3)

which is used to match against the requested PHP files. So with the current blacklist functionality you can't do /home/_/dev or equivalent. It's a pity that you couldn't have *opcache.blacklistregexp as an alternative to _opcache.blacklistfilename and allow admins who understand regexps to use the full regexp functionality. This would be an easy patch if the dev team were agreeable.

— Reply to this email directly or view it on GitHubhttps://github.com/zendtech/ZendOptimizerPlus/issues/105#issuecomment-19854691 .

TerryE commented 11 years ago

Dmitry, the current blacklist algo parses the blacklist file, scrubbing the entries into a safe form which can be combined into a single regexp pattern which it applies to requested filenames. Doing a hybrid is difficult because this could easily result in an invalid expression or one which runs like a dog. My suggestion is simply to have an opcache.blacklist_regexp which is applied if set. One option would be to allow this to have an INI_PER_DIR context. So that individual project hierarchies could set their own blacklist via a filter under Apache, etc.

Give me your prefs, and I'll work up a patch.

dstogov commented 11 years ago

Terry, I afraid the implementation you propose may decrease performance. First of all running two regexp (from file and from ini string) must be slower than a single one. At second, with PER_DIR regexp, we would have to compile them at request startup and also care about their recompilation, caching, memory deallocation, etc

On the other hand, I'm agree that construction a big single regexp from small (even proper) regexp, might lead to an invalid result.

May be we could use more simple glob similar syntax (?, , *) in blacklist file, that cold be translated into single regexp without problems:

? -> [^/]

TerryE commented 11 years ago

Dmitry,

Yes PER_DIR regexps might decrease performance, but there are two issues here as I see it:

My instinct would be that the sysadmins would prefer the richer functionalty, if we take a less than a mSec per request if that is the cost. I'll do a Q&D timing of alternatives and post back some indicators.

However, another thought is that this is a functional change to OPcache, and given that -- perhaps we should defer doing this to a 5.6 release as there are a bunch of other functional enhancements that we could consider in this sort of timeframe.

TerryE commented 11 years ago

The blacklist uses the std engine rather than PCRE. I did a Q&D timer to compare a typical compiled regexp vs the simpler format that you'd likely come across in a PER_DIR directive and ran both alternatives against a list of filenames each bracketed by a _rdtsc() clock counter. As usual L1/L2/L3 cache priming actually dominated the timing so its hard to talk absolutes, but on my i3 laptop, the complex / simple regexp compiles took 410K / 160K clocks (say 130 μS / 50 μS) on first exec and 21K / 9.5K clocks (7 μS / 3 μS) on repeat exec with high L1/L2 hit rates. The execution timings were about a quarter of this, again on this ~3:1 ratio.

We can cache the PER_DIR filter in the ZCG so the sort of hit that we're talking about for supporting PER_DIR blacklisting is maybe 200-300 μS per request for the compilation overhead, but a PER_DIR implementation would save maybe 10 μS per INCLUDE_OR_EVAL because any configuration using these would be using simple rather complex global blacklist regexps.

The bottom line is that this isn't going to be a performance killer. There are going to be pluses and minuses and the final impact will be in the noise. So functional considerations should decide.

dstogov commented 11 years ago

Terry,

Personally, I think the implementation is going to be too complex and I would prefer a more simple solution with "*" in blacklist. I see that PER_DIR solution also has advantages. If you are interested in your solution, you may prepare a patch that we may try in few benchmarks.

Thanks. Dmitry.

On Thu, Jun 27, 2013 at 7:50 PM, Terry Ellison notifications@github.comwrote:

The blacklist uses the std engine rather than PCRE. I did a Q&D timer to compare a typical compiled regexp vs the simpler format that you'd likely come across in a PER_DIR directive and ran both alternatives against a list of filenames each bracketed by a _rdtsc() clock counter. As usual L1/L2/L3 cache priming actually dominated the timing so its hard to talk absolutes, but on my i3 laptop, the complex / simple regexp compiles took 410K / 160K clocks (say 130 μS / 50 μS) on first exec and 21K / 9.5K clocks (7 μS / 3 μS) on repeat exec with high L1/L2 hit rates. The execution timings were about a quarter of this, again on this ~3:1 ratio.

We can cache the PER_DIR filter in the ZCG so the sort of hit that we're talking about for supporting PER_DIR blacklisting is maybe 200-300 μS per request for the compilation overhead, but a PER_DIR implementation would save maybe 10 μS per INCLUDE_OR_EVAL because any configuration using these would be using simple rather complex global blacklist regexps.

The bottom line is that this isn't going to be a performance killer. There are going to be pluses and minuses and the final impact will be in the noise. So functional considerations should decide.

— Reply to this email directly or view it on GitHubhttps://github.com/zendtech/ZendOptimizerPlus/issues/105#issuecomment-20131663 .

TerryE commented 11 years ago

OK, I will follow your advice. I agree that allowing an embedded * is simpler in terms of change impact, and this could be a dot release patch that you could review and have confidence in. The PER_DIR option needs proper consideration and maybe needs deferring to a 5.6 release. I'll work up the "embedded *" now, as there's other changes that I think merit considering / introducing and weighed against this one in the 5.6 timeframe. :-)

PS could you edit your "May be we could use more simple glob" comment above to add the extra \ chars before the * chars so I can see what you want.

dstogov commented 11 years ago

Thanks for understanding and sorry for slow response. I would appreciate your work on this patch.
I don't see any problems enabling it in php-5.5.1 (support for * and ** in blacklist file).

TerryE commented 11 years ago

@dstogov, the current README documentation for opcache.blacklist_filename states:

The filename may be a full path or just a file prefix (i.e., /var/www/x blacklists all the files and directories in /var/www that start with x)

Reviewing / walking through the code I noticed that this isn't quite the case as the blacklist function does an expand_filepath() on each entry so a simple x will match $CWD/x.*, etc., where the current working directory is whatever was established during SAPI startup, and this varies from SAPI to SAPI, so the path relative entries will be very confusing. IMO, if we don't want to do a b/c break then we should expand the README to be explicit on how the entries are resolved or better if we allow path relative entries then they should be relative to the blacklist file directory. Thoughts?

dstogov commented 11 years ago

I think it's better to disable relative paths.

On Tue, Jul 2, 2013 at 6:14 PM, Terry Ellison notifications@github.comwrote:

@dstogov https://github.com/dstogov, the current README documentation for opcache.blacklist_filename states:

The filename may be a full path or just a file prefix (i.e., /var/www/xblacklists all the files and directories in /var/www that start with x)

Reviewing / walking through the code I noticed that this isn't quite the case as the blacklist function does an expand_filepath() on each entry so a simple x will match $CWD/x.*, etc., where the current working directory is whatever was established during SAPI startup, and this varies from SAPI to SAPI, so the path relative entries will be very confusing. IMO, if we don't want to do a b/c break then we should expand the README to be explicit on how the entries are resolved or better if we allow path relative entries then they should be relative to the blacklist file directory. Thoughts?

— Reply to this email directly or view it on GitHubhttps://github.com/zendtech/ZendOptimizerPlus/issues/105#issuecomment-20347797 .

TerryE commented 11 years ago

I've coded this up and tested main path. Need to cover error paths and figure out best zend API call to disable relative paths for winXX. Am doing chores for the kids this w/e so will post patch back / do pull req on Monday.

dstogov commented 11 years ago

It would be great! Thanks.

TerryE commented 11 years ago

Dmitry, I've raised a pull request for these changes. I am not used to using Github rather than svn for this mode of collab so am on a learning curve. Sorry if I've got this wrong. Give me advice / feedback if so. A couple of tweaks in this:

dstogov commented 11 years ago

Hi Terry,

I'm going to be on vacation on next two weeks. I'll try to review your patch tomorrow, but I also have a lot of unfinished work :(

In case, I won't be able to review it, I'll ask Xinchen (laruence) to take a look.

Thanks. Dmitry.

On Thu, Jul 11, 2013 at 8:40 PM, Terry Ellison notifications@github.comwrote:

Dmitry, I've raised a pull request for these changes. I am not used to using Github rather than svn for this mode of collab so am on a learning curve. Sorry if I've got this wrong. Give me advice / feedback if so. A couple of tweaks in this:

  • the * char has to map onto [^]* on Win32 systems and not [^/]*
  • the error handling around relative paths in the blacklist was a mess so it was just easier to use the path of the blacklist file as the "relative from" default.

— Reply to this email directly or view it on GitHubhttps://github.com/zendtech/ZendOptimizerPlus/issues/105#issuecomment-20825115 .

TerryE commented 11 years ago

Dmitry, it isn't that important. It can wait. You'll have enough last minute jobs to do. Enjoy your holiday :-)

dstogov commented 11 years ago

Implemented support for ?, * and \ symbols in blacklist entries.