zendtech / ZendOptimizerPlus

Other
915 stars 142 forks source link

Save a stat call by calling sapi_get_stat() #40

Closed rlerdorf closed 11 years ago

rlerdorf commented 11 years ago

Merge over an optimization from APC that saves a stat call on the top-level script by calling sapi_get_stat()

rlerdorf commented 11 years ago

Although if Ilia's pull-request (#42) goes in, then just returning tmpbuf->st_mtime isn't going to work. We'll need to keep the entire stat struct around.

dstogov commented 11 years ago

Hi Rasmus,

I'm not sure if this optimization makes sense.

O+ checks modification time of compiled file once per "zend_optimizerplus.revalidate_freq" (2 by default) seconds. So the patch saves one stat() calls per 2 seconds. On the other hand it requires strcmp() for each included PHP script. So 20-50 additional strcmp()s for typical framework based application (Fortunately these calls will be executed once per 2 seconds too).

I tested the patch on Linux and didn't see any visible speed difference. Do you have any benchmark results?

Thanks. Dmitry.

rlerdorf commented 11 years ago

Well, not everyone is going to be able to set their revalidate_freq to a non-zero value. Some deploy mechanisms rely on per-request stats. I suppose we could do it only if revalidate_freq=0 ?

dstogov commented 11 years ago

May be it makes sense... I'll run benchmarks with revalidate_freq=0 tomorrow to see the impact of the patch.

rlerdorf commented 11 years ago

The only way it wouldn't be faster in the revalidate_freq=0 case is if a strcmp() is more expensive than a stat() on some system. Of course, if we need the memcpy() there, it would be strcmp+memcpy but that should still be faster than a stat()

dstogov commented 11 years ago

You can see the benchmark results with revalidate_freq=0 at

https://docs.google.com/spreadsheet/ccc?key=0AnUKOMH_4lgBdDF5VTB4TndLQ003TmJhdVNQSmh1cEE&usp=sharing

I would say the speed difference is comparable to measurement mistake. Probably, it makes slight speedup on "hello" and "drupal" and slight slowdown on Zend Framework based applications ("fw" and "blog").

The benchmarks were run on a Linux laptop. I would appreciate if someone run some benchmarks on Windows or other OS where stat() is more expensive.

For now, I don't see a big advantage of this patch, but it probably may be implemented in more optimal way, that won't require strcmp() for each included file. In this case it'll make sense.

mvrhov commented 11 years ago

This might not be a problem on a disk that's plugged physically into the system, but might make a difference on something like Amazon's EBS which are network attached devices. Or as someone on the internals list said a couple of days ago they were running via NFS.

zendtech commented 11 years ago

Perhaps we correlate these two things, and perform it only if revalidate_timestamps is zero?

Sent from my tablet

On 5 במרץ 2013, at 09:11, Dmitry Stogov notifications@github.com wrote:

You can see the benchmark results with revalidate_freq=0 at

https://docs.google.com/spreadsheet/ccc?key=0AnUKOMH_4lgBdDF5VTB4TndLQ003TmJhdVNQSmh1cEE&usp=sharing

I would say the speed difference is comparable to measurement mistake. Probably, it makes slight speedup on "hello" and "drupal" and slight slowdown on Zend Framework based applications ("fw" and "blog").

The benchmarks were run on a Linux laptop. I would appreciate if someone run some benchmarks on Windows or other OS where stat() is more expensive.

For now, I don't see a big advantage of this patch, but it probably may be implemented in more optimal way, that won't require strcmp() for each included file. In this case it'll make sense.

— Reply to this email directly or view it on GitHubhttps://github.com/zend-dev/ZendOptimizerPlus/pull/40#issuecomment-14426545 .

rlerdorf commented 11 years ago

Mostly I just don't like seeing:

stat(docroot/index.php)
stat(docroot/index.php)

in my strace output. Yes, in a large app with dozens of includes and thousands of syscalls it won't give you a measurable performance gain, but as your numbers show, it also doesn't slow it down, and there are likely environments where the stat cost is higher and the scripts in question are more minimal where saving that stat call is worthwhile.

But I also don't have really strong feelings on it.

dstogov commented 11 years ago

I optimized and committed the path. It actually may work without strcmp(). Also opened_path is not always the same as path_translated (because of resolved symlinks).

mvrhov commented 11 years ago

I haven't looked on how the tests were performed but, can you make sure that you keep all your cores near 100% busy while you run that test. Running as many scripts as you have cores with a while (true) {} should do just fine.

dstogov commented 11 years ago

of course all the cores were busy.

mvrhov commented 11 years ago

The above idea came from how they measured the stat expensiveness for mod_perl