Open archibaldhaddock opened 8 years ago
We can't change the normal APC(u) API, or Zend ... but for PHP 7 there seems to be a way to avoid copying (some) arrays out of SHM ...
I don't know when/if I'll have time to work on it ... if anyone else wants to have a go, feel free, and comment here ...
Some thoughts ...
It will need to have a new API, suggest ?
void apcu_persist(array $thing);
void apcu_fetch_persistent(array $thing);
It will need to make sure that there are no serialized objects in the array also, an exception should be thrown if the array contains data that cannot be persistent ...
The copy-on-write semantic fetch is not an option (like suggested in the initial post)?
For a new API name, I'd rather choose something coherent with the existing functions, like, say: apcu_store_persistent & apcu_fetch_persistent. "Persistent" may or may not be the proper qualifier: the main distinction with the original API is that the stored arrays won't be copied. Cached values are already persistent, aren't they? Maybe "apcu_store_nocopy/apcu_fetch_nocopy"?
As we are talking about new features, I was wondering if some kind of control over the uniqueness of contents could be added (extra parameter, new API?). If I selfishly get back to my own setting: I have, for example, 10 sites served by the same php-fpm. Each one of them uses 10 different huge config arrays, from $array_1 to $array_10. These arrays are the same for every site ($array_x != $array_y but $array_x for site 1 = $array_x for site 2).
This situation is likely to happen on a lot of shared hosting setups.
What if, for the same content, only one instance were maintained in APC shared memory? Of course this could be obtained in PHP code with something like this (please see nxtweb_cache_get_by_content and nxtweb_cache_set_by_content)
<?php
define ("NXTWEB_CACHE_APC", 0);
define ("NXTWEB_CACHE_APCU", 1);
define ("NXTWEB_CACHE_NONE", 2);
define ("NXTWEB_OPTIMISATION", 1);
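// defined() expects the constant's name as a string; passing the bare constant always selected "no cache"
$nxtweb_cache_backend = ( !defined("NXTWEB_OPTIMISATION") ? NXTWEB_CACHE_NONE : ( function_exists("apc_fetch") ? NXTWEB_CACHE_APC : ( function_exists("apcu_fetch") ? NXTWEB_CACHE_APCU : NXTWEB_CACHE_NONE ) ) );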
function nxtweb_cache_set($key, $value)
{
    global $nxtweb_cache_backend;
    switch ($nxtweb_cache_backend) {
        case NXTWEB_CACHE_APC :
            apc_store($key, $value);
            break;
        case NXTWEB_CACHE_APCU :
            apcu_store($key, $value);
            break;
        default :
            return false;
    }
    return true;
}
function nxtweb_cache_set_by_content($key, $value)
{
    //get the md5 of the serialized version of $value
    $key_from_content = md5(json_encode($value));
    $ret = nxtweb_cache_get($key_from_content);
    if (!$ret)
    {
        if ( !nxtweb_cache_set($key_from_content, $value))
            return false;
    }
    if (!nxtweb_cache_set($key, $key_from_content))
        return false;
    return true;
}
function nxtweb_cache_get($key)
{
    global $nxtweb_cache_backend;
    switch ($nxtweb_cache_backend) {
        case NXTWEB_CACHE_APC :
            return apc_fetch($key);
        case NXTWEB_CACHE_APCU :
            return apcu_fetch($key);
        default :
            return false;
    }
}
function nxtweb_cache_get_by_content($key)
{
    $key_from_content = nxtweb_cache_get($key);
    if (!$key_from_content)
        return false;
    return nxtweb_cache_get($key_from_content);
}
?>
But of course, the control of content could be achieved much faster by the C code of the extension itself.
For PHP7 (with opcache enabled) or HHVM, @archibaldhaddock might be interested in the below link to solve their problem, if writes are infrequent, and all of the classes used are infrequent/ have __set_state implemented:
I'm sort of interested in this, since I have a similar use case of fetching a lot of arrays and scalars from a large config. Currently, it's dealt with by splitting the config up by array key.
If this code was for arrays and scalars only, it might be possible with adding IS_TYPE_IMMUTABLE and removing IS_TYPE_REFCOUNTED. I'm not sure if there are any other type flags to set/unset.
On apcu_store_persistent:
There are going to be race conditions when replacing/removing those values. The permanent array would have to be kept around until the last request to use them was finished.
And this would have to implement the equivalent functionality to destroy the arrays and strings, but use the free method for shared memory APC pointers for all of those.
Edit: Shared memory allocation/deallocation routines for persistence, not malloc()
https://wiki.php.net/phpng-int#cell_format_zval
Except for types itself, the engine defines few type flags to uniform handling of different data types with similar behavior.
- IS_TYPE_CONSTANT – the type is a constant (IS_CONSTANT, IS_CONSTANT_AST)
- IS_TYPE_REFCOUNTED – the type is a subject for reference counting (IS_STRING excluding interned strings, IS_ARRAY except for immutable arrays, IS_OBJECT, IS_RESOURCE, IS_REFERENCE). Values for all refcounted types are pointers to corresponding structures having common part (zend_refcounted). It's possible to get this structure using Z_COUNTED() macro or some data from that structure using Z_GC_TYPE(), Z_GC_FLAGS(), Z_GC_INFO() and Z_GC_TYPE_INFO(). It's also possible to access reference counter using Z_REFCOUNT(), Z_SET_REFCOUNT(), Z_ADDREF() and Z_DELREF() macros.
- IS_TYPE_COLLECTABLE – the type may be a root of unreferenced cycle and it's a subject for Garbage Collection (IS_ARRAY, IS_OBJECT).
- IS_TYPE_COPYABLE – the type has to be duplicated using zval_copy_ctor() on assignment or copy on write (IS_STRING excluding interned strings, IS_ARRAY)
- IS_TYPE_IMMUTABLE - the type can't be changed directly, but may be copied on write. Used by immutable arrays to avoid unnecessary array duplication.
Thank you TysonAndre for your comments !
I've tried the "include way of the force" a few months ago because I thought that serializing/unserializing was a big part of the problem. Apcu_fetch proved to be better...
But, on which PHP version ? with opcache enabled ? (if not, what I eventually compared was unserializing and parsing) I can't remember so I'm going to test that again on php7 + opcache. If it's ok, what about generalizing this approach to all type of configuration ? Along with the arrays produced from xml files, I'm thinking about tons of parameters that are stored in a mysql database and read on every request...
The new behaviour would be: 1) execute a MySQL query to test whether the parameters table has changed (update_time from information_schema.tables); 2) if yes, produce the corresponding PHP config file, if no, do nothing; 3) always include the PHP config file. Of course, opcache must be configured so that it can immediately detect changes in the PHP parameters file...
Thanks again !
Also, if running multiple PHP processes(not threads) on the same server, file based opcache might be worth looking into, but not necessary. It's a configure flag for PHP (Not as only cache).
Opcache memory limit may need to be increased.
If the configs are expected to be small or scalars, APC would probably be better.
There are probably going to be a lot of race conditions. The final file name should be based on the db modification time. A temporary file with a globally unique name should be created, on the same disk partition, written to, and rename()d to the final file(file writes aren't atomic). (And the last modification date cached in APC) Even after that, there are still probably race conditions.
The modification time might change between 1 and 2, for example.
I was recommending it for unchanging (versioned?) configs(v1 never changes, v2 never changes, etc.). If the configs change frequently, the implemented opcache solution is probably going to have more bugs.
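A minimal sketch of the "temporary file, then rename()" step described above, assuming versioned configs that are never rewritten once published. The function names, the config_<mtime>.php naming scheme, and the directory layout are hypothetical, and the sketch does not address the remaining race between reading the modification time and generating the file.

```php
<?php
// Hypothetical helper: publishes one immutable config version as a PHP file
// that opcache can then keep in shared memory.
function write_versioned_config(string $dir, int $mtime, array $config): string
{
    // The final name is derived from the DB modification time (the "version").
    $final = $dir . '/config_' . $mtime . '.php';
    if (is_file($final)) {
        return $final; // this version was already published
    }
    // tempnam() creates a uniquely named file in the same directory,
    // so the rename() below stays on one filesystem.
    $tmp = tempnam($dir, 'cfg');
    if ($tmp === false) {
        throw new RuntimeException("cannot create temp file in $dir");
    }
    $code = '<?php return ' . var_export($config, true) . ';' . "\n";
    if (file_put_contents($tmp, $code) === false || !rename($tmp, $final)) {
        @unlink($tmp);
        throw new RuntimeException("cannot publish $final");
    }
    return $final;
}

// Hypothetical loader: with opcache enabled, the included file's array
// is served from opcache's shared memory.
function load_versioned_config(string $file): array
{
    return include $file;
}
```

Readers only ever see a missing file or a complete one, because the content is fully written before the rename. Whether that is sufficient still depends on the caveats raised in this comment.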
Since PHP 7 we have in principle the possibility of directly accessing strings and arrays from SHM without copying them into request memory. The big problem is that in this case we have to ensure that the data remains in SHM until there is nobody left using it.
Opcache solves this problem by never removing anything from the cache until an opcache restart is triggered, in which case it waits until all processes have disconnected from opcache and then kills hanging processes. This is known to be problematic both because it causes an interval of time (while the restart is in progress) where PHP runs without caching, which can cause a large load spike, and furthermore it causes instability in the cases where opcache ends up having to kill processes.
What would be relatively simple to implement is a mechanism that stores something into apcu forever and which can then be used without copying. But I don't know how to make this work with storage that is supposed to be freed again, while also being robust to abnormal conditions (such as crashed worker processes).
@nikic I noticed a recent release changed the serializer to default. I haven't dug into how much the serializer changed, but I'm hoping that maybe this means the fast-access SHM idea you described above became reality. Can you confirm this?
I'm looking forward to adopting this at Wikipedia (via MediaWiki) and reporting on the positive impact it would have on latencies. There are a number of large values we fetch from APCu on every request, and this area took a noticeable perf hit when we switched from HHVM back to Zend PHP. I understand why and we've mostly regained that in other areas, but you can imagine my excitement at the possibility of having both LRU GC and super fast access. ♥️
EDIT: I see the default has since switched back again to php, so I guess we'll hold off for now. But I'd still be interested to know whether or not the default serializer did indeed evolve in the above direction!
@Krinkle Nope, the default serializer still copies values. They're stored in a format that could be directly used if we figured out how to manage invalidation, but currently a copy is still required.
The recommendation for caching large, mostly static data remains opcache.
I mean - you could definitely pretty easily implement usage of opcache for the data you want, but yeah invalidation (eg restart every 24 hrs) is then needed.
That said you could easily add a two-tier caching system, where you implement your own LRU by using /dev/shm/cache/bin/cid.php.
The advantage of that is that you can remove items that have not been accessed and hence they would not be reloaded via include_once into SHM after the 24 hrs reset, but for those that are they are loaded super-fast from shared memory file system into opcache, so your opcache rebuilds really fast.
In fact I would probably just add a counter SHM file next to the cache entry. And only load those via require_once into opcache that are needed.
What would be relatively simple to implement is a mechanism that stores something into apcu forever and which can then be used without copying. But I don't know how to make this work with storage that is supposed to be freed again, while also being robust to abnormal conditions (such as crashed worker processes).
Yeah, if that were to be done in APCu, having a separate memory area with separate memory limits for data with no expiry that couldn't be deleted would seem more appropriate, but would still have the issues you mentioned. Though at that point, it might make sense to instead
What kind of abnormal conditions/crashes have been seen with opcache? Worker processes that fail to exit while holding a lock?
Possible workarounds:
Integrate into opcache directly as opcache_store_immutable_value(string $key, array|string|float|int|bool|null $value): int
and opcache_get_immutable_value(string $key, bool &$success = false): array|string|float|int|bool|null $value
- This seems like it'd have somewhat lower memory usage for arrays without serialized objects because it could reuse interned strings more effectively
Though I don't think a majority of people would have this use case, it'd compete with memory used for compiled files, and the hackish workaround of file_put_contents(fileNameForKey, var_export(..., true)) exists and works with opcache's file cache as well (as mentioned in https://web.archive.org/web/20170608185145/https://blog.graphiq.com/500x-faster-caching-than-redis-memcache-apc-in-php-hhvm-dcd26e8447ad?gi=1c30f8eb7302 )
It will need to have a new API, suggest ?
void apcu_persist(array $thing); void apcu_fetch_persistent(array $thing);
It will need to make sure that there are no serialized objects in the array also, an exception should be thrown if the array contains data that cannot be persistent ...
so objects, PHP references, and resources?
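A userland sketch of the validation discussed here: reject arrays containing objects, resources, or PHP references before treating them as persistable. The function name assert_persistable is hypothetical, and a real implementation would live in the extension's C code. ReflectionReference requires PHP 7.4 or later.

```php
<?php
// Hypothetical check: throws if $data holds anything that could not be
// stored as an immutable, shareable value (object, resource, reference).
function assert_persistable(array $data, string $path = '$value'): void
{
    foreach ($data as $key => $item) {
        $where = $path . '[' . var_export($key, true) . ']';
        // fromArrayElement() returns null unless the element is a PHP reference.
        if (ReflectionReference::fromArrayElement($data, $key) !== null) {
            throw new InvalidArgumentException("$where is a PHP reference");
        }
        if (is_object($item) || is_resource($item)) {
            throw new InvalidArgumentException("$where is a " . gettype($item));
        }
        if (is_array($item)) {
            assert_persistable($item, $where); // recurse into nested arrays
        }
    }
}

// Usage: scalars, null and nested arrays pass silently.
assert_persistable(['db' => ['host' => 'localhost', 'port' => 3306], 'debug' => false]);
```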
https://www.npopov.com/2021/10/13/How-opcache-works.html - your post cleared up what opcache is doing - I'd assume apcu is doing something similar.
On *nix, the module init (MINIT) is called once when the Apache server is started up and chooses a working backend (e.g. the mmap backend allocates MAP_SHARED memory that is shared among all forked processes, plus locking in ext/opcache/zend_shared_alloc.c for access to that memory).
I'm not actively working on alternatives right now, I'd just decided to research how apcu_fetch worked for string data, and whether it'd be constant time or take additional time/memory.
It turns out it'd take additional time/memory even for plain strings (a new copy is allocated for every call to apcu_fetch, which can be partly worked around by adding a pure php wrapper which calls array_key_exists and uses the original value if it exists before calling apcu_fetch)
static zend_string *apc_unpersist_zstr(apc_unpersist_context_t *ctxt, const zend_string *orig_str) {
    zend_string *str = zend_string_init(ZSTR_VAL(orig_str), ZSTR_LEN(orig_str), 0);
    ZSTR_H(str) = ZSTR_H(orig_str);
    apc_unpersist_add_already_copied(ctxt, orig_str, str);
    return str;
}
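The pure-PHP workaround mentioned above (check a per-request array with array_key_exists before calling apcu_fetch) could look roughly like this. The class name RequestLocalCache and the injectable fetcher are hypothetical, added only so the sketch can run without APCu installed. The default fetcher assumes the APCu extension is loaded.

```php
<?php
// Hypothetical per-request memoizer: each key is copied out of shared
// memory at most once per request, later reads reuse the same PHP value.
final class RequestLocalCache
{
    /** @var array<string, mixed> values already fetched in this request */
    private array $values = [];
    /** @var callable */
    private $fetcher;

    public function __construct(?callable $fetcher = null)
    {
        // Default to apcu_fetch(); a custom fetcher must accept (string $key, &$success).
        $this->fetcher = $fetcher ?? static function (string $key, &$success) {
            return apcu_fetch($key, $success);
        };
    }

    public function fetch(string $key, &$success = null)
    {
        // array_key_exists() (not isset) so cached null values also count as hits.
        if (array_key_exists($key, $this->values)) {
            $success = true;
            return $this->values[$key];
        }
        $ok = false;
        $value = ($this->fetcher)($key, $ok);
        $success = (bool) $ok;
        if ($success) {
            $this->values[$key] = $value; // remember hits only
        }
        return $value;
    }
}
```

This only removes repeated copies within one request. The first fetch of each key still pays the full copy cost.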
The performance for unserializing large arrays with many refcounted (string/array/PHP reference) values may be improved somewhat by the optimizations mentioned in https://github.com/krakjoe/apcu/issues/323#issuecomment-1288117298 (especially for avoiding the excessive hash table collisions caused by the way pointers are used as indexes in the hash table) - I haven't benchmarked this.
Also, are the maintainers still interested in PRs for adding support for persistent values? I'm looking into whether that's possible.
Another note: If persistent storage APIs were added to APCu that couldn't be cleared, then I believe values in temporary storage could safely use the strings in shared memory (on OSes where shared memory has the same address for all processes attached to shared storage) if those zend_string pointers (or equivalent strings, similar to the way opcache's interned strings storage currently works) were already in APCu's persistent shared storage for other reasons (e.g. keys/values in configs)
It will need to have a new API, suggest ?
Also, are the maintainers still interested in PRs for adding support for persistent values? I'm looking into whether that's possible.
I'd put together a prototype of this. Unit tests pass, but there's no real-world users and there aren't many tests of php-fpm in apcu/immutable_cache, either. This stores immutable arrays and strings (for arrays containing no objects or references) in shared memory, so retrieval is constant time when the serializer/unserializer can be avoided
https://github.com/TysonAndre/immutable_cache-pecl https://github.com/TysonAndre/immutable_cache-pecl#benchmarks
Miscellaneous thoughts on the API design:
Naming thoughts:
APCu\immutable_cache_add and APCu\immutable_cache_fetch
Other thoughts for longer-term improvements:
Only the data that gets returned through fetch methods as PHP values needs to be immutable. So it might be possible to have APCu\SharedMemoryArray entries which wrap a PHP array, and the entry can have its own read/write lock (and allocate a new region for mutable array data when it grows), and the values from that can be mutable instead of immutable, being copied when individual entries are read.
(That could be used for userland to check for membership in a large array that occasionally changes without unserializing the entire array, e.g. if there is an array with millions of integer keys that infrequently changes)
(only the SharedMemoryArray's entry existing for a key would be immutable, the data it contains would be mutable and independently serialized/unserialized)
Also, are the maintainers still interested in PRs for adding support for persistent values? I'm looking into whether that's possible.
I'd put together a prototype of this. Unit tests pass, but there's no real-world users and there aren't many tests of php-fpm in apcu/immutable_cache, either. This stores immutable arrays and strings (for arrays containing no objects or references) in shared memory, so retrieval is constant time when the serializer/unserializer can be avoided
I'm planning on putting immutable_cache on PECL in 7 days if there are no objections. https://marc.info/?l=pecl-dev&m=166838234522037&w=2 - separately, if any APCu leads want to join as co-leads/maintainers I could add them.
There's multiple leads of APCu, so I'm not sure who to ask. https://pecl.php.net/package/APCu
Bug reporting/investigation would be easier if users could install/uninstall immutable caching separately - Modifying a large existing project with concurrency/serialization/unserialization/locking/allocators can be error prone - if users can reproduce issues they encounter even without immutable_cache/APCu then it'd be easier to debug.
This looks awesome @TysonAndre. In one of my current projects, the use case is 99% immutable_cache and 1% apcu, so I'll be looking at this for sure.
Hello everybody !
I use apcu to store big arrays that cost a lot to produce from xml config files. These arrays are pure configuration data: my code only uses them in a read-only way. I suppose that this is a very common use of the apcu extension.
Moreover, the content of these big arrays is the same for a lot of sites served by the same php-fpm process. This means that where I could have only one instance of each configuration array, I've got hundreds...
My questions are :
1) wouldn't it be great to implement an additional kind of apc_fetch that would not return a copy of the cached values but a reference to them? (Strict read-only use of the returned references would be the coder's responsibility.) Looking into the apcu code, would it be enough to duplicate apcu_fetch to apcu_fetch_read_only and return the zval from apc_cache_find instead of the deep-copied zval?
2) Another way to avoid the current useless copy ( and allocation ) of tons of data would be to return something (whatever the inner zend implementation) that would follow the usual copy-on-write behaviour.
$a and $b being arrays, $a = $b won't deep-copy $b into $a. Why should $a = apcu_fetch($key); ?
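The copy-on-write behaviour referred to here can be observed from userland with memory_get_usage(). This is only an illustrative sketch: the function name cow_memory_deltas is made up, and exact byte counts vary by PHP version and build, so none are shown.

```php
<?php
// Hypothetical helper: measures memory growth for a plain assignment
// versus the first write to the assigned copy.
function cow_memory_deltas(int $n = 100000): array
{
    $b = range(1, $n);                 // large runtime-built array
    $before = memory_get_usage();
    $a = $b;                           // no deep copy: $a and $b share storage
    $afterAssign = memory_get_usage();
    $a[] = 0;                          // first write separates $a from $b
    $afterWrite = memory_get_usage();
    return [
        'assign' => $afterAssign - $before,
        'write'  => $afterWrite - $afterAssign,
        'count'  => count($a),
    ];
}

$d = cow_memory_deltas();
printf("assignment: %d bytes, first write: %d bytes\n", $d['assign'], $d['write']);
```

The assignment should cost next to nothing, while the first write pays for a full duplicate of the array. The question above asks for the value returned by apcu_fetch to behave like the cheap assignment.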
I posted a similar question on the HHVM GitHub: https://github.com/facebook/hhvm/issues/5167 and I have verified (with xhgui) that their answer is correct: HHVM actually implements option 2).
Regards