Closed Ocramius closed 7 months ago
PCRE already caches the compiled and optimized regular expressions when they are first met during runtime. While it should be possible to introduce new opcodes to avoid the function call to the pcre_*() functions, this may not yield a relevant performance increase (and might actually cause a general performance penalty due to the increased VM size). But the real problem is that regexes may depend on the locale, which is a runtime concept, so neither OPcache nor the engine optimizer would really be able to optimize them. We had this very problem with constant float values prior to PHP 8.0.0, which were optimized by OPcache without regard for the locale; this issue has been solved by making float to string conversion locale-independent.
What remains is that the PCRE cache is stored in a per-process/-thread global, and that might be moved to SHM; that wouldn't make a difference for the typical case when using FPM.
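A minimal sketch of how the per-process cache could be observed from userland. The helper `time_match()` and the date pattern are hypothetical and only serve as illustration; timings vary between machines and runs, so no particular numbers should be expected.

```php
<?php
// Hypothetical helper: time a single preg_match() call in nanoseconds.
function time_match(string $pattern, string $subject): int
{
    $start = hrtime(true);
    preg_match($pattern, $subject);
    return hrtime(true) - $start;
}

// Illustrative pattern and subject, not taken from the discussion.
$pattern = '/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/';
$subject = '2024-01-31';

// The first call has to compile the pattern.
$first = time_match($pattern, $subject);
// The second call in the same process should find the compiled pattern in the cache.
$second = time_match($pattern, $subject);

printf("first call:  %d ns\nsecond call: %d ns\n", $first, $second);
```

Run under the CLI, every invocation starts with an empty cache, which is roughly the cold-start situation; under FPM the worker process, and therefore its cache, survives across requests.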
But the real problem is that regexes may depend on the locale
:scream:
PCRE cache is stored in a per-process/-thread global, and that might be moved to SHM
If locale changes at runtime, would that require a per-locale cache?
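To make the locale concern concrete, here is a hedged sketch that runs the same non-UTF-8 pattern against the same byte before and after changing `LC_CTYPE`. The locale names are assumptions and may not be installed on a given system, in which case `setlocale()` returns `false` and the second match is skipped. Whether the result actually changes depends on the platform, so no outcome is claimed for the second match.

```php
<?php
// Byte 0xE9 is "é" in ISO-8859-1; the pattern deliberately has no /u modifier.
$byte = "\xE9";
$pattern = '/^\w$/';

// Start from the "C" locale so the first result does not depend on the environment.
setlocale(LC_CTYPE, 'C');
$before = preg_match($pattern, $byte);

// Assumed locale names; setlocale() tries each one and returns false if none is available.
$locale = setlocale(LC_CTYPE, ['fr_FR.ISO-8859-1', 'fr_FR.ISO8859-1', 'fr_FR']);
$after = $locale === false ? null : preg_match($pattern, $byte);

// Restore the "C" locale so later code is unaffected.
setlocale(LC_CTYPE, 'C');

var_dump($before, $locale, $after);
```

If `$before` and `$after` differ, a compiled pattern cannot be shared blindly across locales, which is what makes compile-time optimization of the regex difficult.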
that wouldn't make a difference for the typical case when using FPM.
I'm mostly interested in cold starts (think AWS Lambda, for example), as well as in optimizing a lot ahead of time (tight loops where the regex is perhaps not optimized as much as it could be).
PCRE already caches the compiled and optimized regular expressions when they are first met during runtime.
It should be possible to compile regexes for constant/known strings during the compile/opcaching stage, with no need for a new opcode. But the question is, is that wanted? If regex compilation takes a lot of time, then this would reduce performance in apps that do not need all regexes.
So we should probably keep compiling at runtime, but cache the compilation result globally. Maybe that is done already, I'm not sure.
But the question is, is that wanted?
A few advantages of AOT compiling + caching:
faster cold-starts (binaries, serverless)
So we should probably keep compiling at runtime, but cache the compilation result globally.
Yeah, the returns are very minor, in this case: caching already happens per-thread :D
That's what opcache ini settings are for 😁
As long as the regex cache is stored across requests, I do not think this can help with the performance, especially when not all regexes are used. I therefore propose to close this issue.
Description
While looking at generated opcodes for some PHP source, I noticed that regular expressions are always sent to ext-pcre via INIT_FCALL + SEND_VAL.
I was wondering if, when:
pcre_*()
one)... it could be possible to:
Note: I don't have any idea of how heavy this is, or whether JIT already takes care of this.
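For anyone wanting to reproduce the observation, a small sketch follows. The file name `regex.php`, the function `is_iso_date()` and the pattern are hypothetical, and `preg_match()` stands in for the regex functions in general. With OPcache loaded, the opcodes can be dumped via the `opcache.opt_debug_level` INI setting, as noted in the comment.

```php
<?php
// regex.php (hypothetical file name)
//
// Dump the optimized opcodes, assuming the OPcache extension is loaded:
//   php -d opcache.enable_cli=1 -d opcache.opt_debug_level=0x20000 regex.php

// A constant pattern passed to an internal regex function.
function is_iso_date(string $value): bool
{
    return preg_match('/^\d{4}-\d{2}-\d{2}$/', $value) === 1;
}

var_dump(is_iso_date('2024-01-31')); // prints bool(true)
```

In the dump, the pattern shows up as a literal operand of the send opcode that precedes the call, which is the constant-argument case the description asks about.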