Closed rlerdorf closed 11 years ago
Although if Ilia's pull-request (#42) goes in, then just returning tmpbuf->st_mtime isn't going to work. We'll need to keep the entire stat struct around.
Hi Rasmus,
I'm not sure if this optimization makes sense.
O+ checks modification time of compiled file once per "zend_optimizerplus.revalidate_freq" (2 by default) seconds. So the patch saves one stat() calls per 2 seconds. On the other hand it requires strcmp() for each included PHP script. So 20-50 additional strcmp()s for typical framework based application (Fortunately these calls will be executed once per 2 seconds too).
I tested the patch on Linux and didn't see any visible speed difference. Do you have any benchmark results?
Thanks. Dmitry.
Well, not everyone is going to be able to set their revalidate_freq to a non-zero value. Some deploy mechanisms rely on per-request stats. I suppose we could do it only if revalidate_freq=0 ?
May be it makes sense... I'll run benchmarks with revalidate_freq=0 tomorrow to see the impact of the patch.
The only way it wouldn't be faster in the revalidate_freq=0 case is if a strcmp() is more expensive than a stat() on some system. Of course, if we need the memcpy() there, it would be strcmp+memcpy but that should still be faster than a stat()
You can see the benchmark results with revalidate_freq=0 at
https://docs.google.com/spreadsheet/ccc?key=0AnUKOMH_4lgBdDF5VTB4TndLQ003TmJhdVNQSmh1cEE&usp=sharing
I would say the speed difference is comparable to measurement mistake. Probably, it makes slight speedup on "hello" and "drupal" and slight slowdown on Zend Framework based applications ("fw" and "blog").
The benchmarks were run on a Linux laptop. I would appreciate if someone run some benchmarks on Windows or other OS where stat() is more expensive.
For now, I don't see a big advantage of this patch, but it probably may be implemented in more optimal way, that won't require strcmp() for each included file. In this case it'll make sense.
This might not be a problem on a disk that's plugged physically into the system, but might make a difference on something like Amazon's EBS which are network attached devices. Or as someone on the internals list said a couple of days ago they were running via NFS.
Perhaps we correlate these two things, and perform it only if revalidate_timestamps is zero?
Sent from my tablet
On 5 במרץ 2013, at 09:11, Dmitry Stogov notifications@github.com wrote:
You can see the benchmark results with revalidate_freq=0 at
https://docs.google.com/spreadsheet/ccc?key=0AnUKOMH_4lgBdDF5VTB4TndLQ003TmJhdVNQSmh1cEE&usp=sharing
I would say the speed difference is comparable to measurement mistake. Probably, it makes slight speedup on "hello" and "drupal" and slight slowdown on Zend Framework based applications ("fw" and "blog").
The benchmarks were run on a Linux laptop. I would appreciate if someone run some benchmarks on Windows or other OS where stat() is more expensive.
For now, I don't see a big advantage of this patch, but it probably may be implemented in more optimal way, that won't require strcmp() for each included file. In this case it'll make sense.
— Reply to this email directly or view it on GitHubhttps://github.com/zend-dev/ZendOptimizerPlus/pull/40#issuecomment-14426545 .
Mostly I just don't like seeing:
stat(docroot/index.php)
stat(docroot/index.php)
in my strace output. Yes, in a large app with dozens of includes and thousands of syscalls it won't give you a measurable performance gain, but as your numbers show, it also doesn't slow it down, and there are likely environments where the stat cost is higher and the scripts in question are more minimal where saving that stat call is worthwhile.
But I also don't have really strong feelings on it.
I optimized and committed the path. It actually may work without strcmp(). Also opened_path is not always the same as path_translated (because of resolved symlinks).
I haven't looked on how the tests were performed but, can you make sure that you keep all your cores near 100% busy while you run that test. Running as many scripts as you have cores with a while (true) {}
should do just fine.
of course all the cores were busy.
Merge over an optimization from APC that saves a stat call on the top-level script by calling sapi_get_stat()