Volatile state for persistent objects.

igchor commented 5 years ago

One possible approach is to have singleton class for every persistent object type which would hold hashmap of volatile objects (volatile state) for every persistent one. This would require support from pmemobj - registering callback which would be called after redo log creation but before processing it (under lock?). This callback would take pointer to object which is destroyed and allow freeing corresponding volatile object.

igchor commented 5 years ago

Also, to make V more predictable we can call value destructor in V destructor and zero pmvlt (that way, on next get we will reinitialize value)

igchor commented 5 years ago

In current V, there is one more problem: if V is used inside of union with some persistent data, pmemcheck will not report errors (pmem mapping is removed inside pmemobj_volatile)

seghcder commented 4 years ago

I've noticed v<> is still in the experimental namespace. Are there still some fundamental concerns or is the current implementation relatively stable?

igchor commented 4 years ago

The current implementation should work fine for trivial types for which destructors do not have to be called (like int). However, v<> should not be used for managing any resources (for example volatile memory). The reason is that it's hard to make it work with transactions.

We created alternative approach of volatile state for pmem (currently as an internal feature): https://github.com/pmem/libpmemobj-cpp/blob/master/include/libpmemobj%2B%2B/detail/volatile_state.hpp This is basically just a global hash map which you can use to map pmem object (its oid) to some volatile state. It uses transaction callbacks to make sure the destructors are called at right time, so it can be used to manage volatile resources.

What would you like to use v<> for? I'm not sure it will ever come out of experimental but if you believe the volatile_state structure is useful we can think about moving it to a public interface.

seghcder commented 4 years ago

I am using it for shared pointer to an object :-).

In order to work around the polymorphism issue with persistent objects, I'm experimenting with an ECS-style approach as follows:

A Component is a Persistent Object which is a regular struct with member variables but no member functions. One of the members is v<shared_ptr>
This PObject is managed "within" a regular C++ volatile class Object (VObject), derived from a common base, IComponent.
The pool is a pmem::obj::concurrent_hash_map of <long, PObject>.
In the VObject constructor, it looks for the PObject of itself (by entity id) and "attaches" itself to it if it exists. If it doesn't exist, it creates the PObject and puts it into the pmap (@marcinslusarz - the pmap and the PObject may be in different pool files here - ideally this is a single txn).
A Component Handler then adds the shared pointer into the v. Following this any reference to the Component returns the vObject shared pointer from v<>.

I can now create a vector of multiple Component Handlers of different Component types (since they inherit from IComponent) and manage them in a single Component Manager, but with the storage in Persistent Memory. In code, I can now do entity->GetComponent<TimeSeries>()->addTimeSeriesEntry(…) or entity->GetComponent<FieldSet>()->GetField("address").

Also, the implementation of the Component storage can be changed without affecting the rest of the components.

In the VObject, there's no state stored apart from pointers. I started with a separate map structure in memory, however since the VObject is attached the PObject it represents it's easier to keep it all in sync (its already a mini-version of Inception trying to keep track of which layer one is in).

The current approach seems to work but always open to looking at improvements esp if they reduce the overhead in maintaining the above…

@marcinslusarz @pbalcer some more background on my Component approach...

igchor commented 4 years ago

That's sounds pretty interesting. Do you have any code examples?

I cannot think of any better solution which would not require synchronization between volatile and persistent state. What can be problematic in your current approach is managing lifetime of vObjects. When do you free it? Is there some manager resposible for that?

seghcder commented 4 years ago

Happy to go through some examples via screen share sometime if interested :-)

Yes, each component type is managed by a templated Component Handler. The destructor of the handler also handles the destruction of the v objects. At the moment I iterate the pmap of components and reset the v if set. Removal of the pComponent must also be done via the handler.

In backlog is using policies/traits to control if the handler should use a hashmap (for sparse components, ie not linked to many entities) or vector* (for high-use components), or if "preloading" should be done etc. For sparse components, it might be better to maintain something like a tbb::concurrent_vector of which components have been linked, and just iterate that in the destructor.

* Edit: Or the new concurrent skip list...

seghcder commented 4 years ago

FYI - also using ::v to hold volatile runtime context (ctx). Eg pointers to logging, configuration etc. This is partially to avoid singletons / global variables. Means we can do this within the persistent object code:

ctx->log->debug("HelloWorld Persistent Constructor called. App name is {}", ctx->config->app_name);

Would be keen to see this interface go public, however also open to alternatives.

igchor commented 4 years ago

I guess we could expose v as a stable interfece with a requirement that you cannot call get() inside a transaction. Would this restriction be problematic to you?

seghcder commented 4 years ago

The restriction is fine. Anything needed could be collected before the txn.

I'm still also open to alternatives if you think this is a compromise. I do like that you can directly link the vmem to the object. However, we have to manually manage the application shutdown (normal and abnormal if possible) as well as pmem object deletion (in destructor).

The approach you describe in the first comment would be more RAII-compliant / exception-safe I suspect. It would have the hit of doing the hashtable lookup to get to the underlying data. Also, could it support more than one volatile state member variable per pmem object (like ::v can)?.

seghcder commented 2 years ago

Just a bump on this topic... would be good to decide if to move ::v from experimental, and/or move detail/volatile_state up to API level?

I like experiemental::v as it sits within the context of the persistent class, but don’t like that it doesn’t do automatic object lifecycle management / RAII.

Or perhaps there's another smart idea on how to keep both together in the same object but also allow the volatile object to be auto destroyed during the persistent object destruction. A challenge is how to destroy the volatile object during a "regular" shutdown. In effect we have two lifecycles occurring... pmem and vmem..

igchor commented 2 years ago

To be honest, this issue became a low priority. @seghcder Is this very critical for you? If yes, we can think about moving the v<> to the public API and making a release of the library. We won't be able to extend its functionality nor implement anything more intelligent, unfortunately.

seghcder commented 2 years ago

Reasonably - an implementation using this is going live in the next 2 months and I'd like to resolve some of our experimental libraries & API usage. It's not just in libpmemobj++ either - Unity3D had several experimental libs we rely on but thankfully those have also gone GA recently.

Also, to make V more predictable we can call value destructor in V destructor and zero pmvlt (that way, on next get we will reinitialize value)

I assume this didn't get implemented and we need to manage the destruction?

If volatile_state also serves a useful purpose and supports RAII, it might be good to bring that up to the official API level too? One potential issue is I think it doesn't support full parallel use (uses locking?), but that might be a good future use case for a tbb concurrent map? (I now realise C++20 only includes some tbb features it seems).

igchor commented 2 years ago

Yes, it didn't get implemented but I'll take a look at that see if we would be able to do it in the next 2 months. The main problem with the V destructor as I remember was that if an object's destructor is called inside of a transaction and this transaction is aborted we will end up with valid pmem object but invalid dram state (and using txs to free objects is our recommendation). If volatile state can be just recreated on the next get() that's probably OK but we'd need to come up with a set of requirements: I guess the volatile state should be const (to avoid losing any updates).

What do you think about this? Do you see any more problems with such an approach? Is constness guarantee OK for you?

As for volatile_state, I agree, we can expose it as public API: it is pretty well tested already. And yes, it uses locking - to achieve concurrency we would need to add some dependency on a concurrent data structure.

igchor commented 2 years ago

Also, the volatile_state can be used to store multiple volatile states per pmem object: for example you can add some placeholders into the pmem object and then use PMEMoids which would point to those placeholders.

seghcder commented 2 years ago

For ::v its starting to sound "inelegant"? :-)

for example you can add some placeholders into the pmem object

Ok interesting. One could have an empty class named vstateplaceholder so it's clear it's used for vstate, as long as it gets its own PMEMoid for the volatile_state map.

If I understand correctly, the assumption for volatile_state is the type has to be default constructible (here?). You'd need a second step to then init a member variable with some parameters if needed, and something to make sure nothing gets in before then.

Maybe let me take some time to look at volatile_state and see if there's any issues for us, and then we can revisit ::v if its needed.

There's also the option for us to make our own version of volatile_state I think... there's nothing deeper in the library that's directly hooking into volatile_state? We just need to remember to add the destructor boilerplate to our classes.

igchor commented 2 years ago

We do not depend on volatile_state anywhere in the library code anymore (we did when we developed persistent TLS: initially. However, later we get rid of this to achieve better scalability. If you'd like to implement you own volatile_state that's also fine but you should think about what should happen to the vmem objects on pool close. In our implementation we register a handler which will iterate over all items stored in the volatile_state map and destroy them: https://github.com/pmem/libpmemobj-cpp/blob/7e59f08f65d6d67cd893f4c6b7c1141a04f28ac3/include/libpmemobj%2B%2B/detail/volatile_state.hpp#L97-L99

If I understand correctly, the assumption for volatile_state is the type has to be default constructible (here?).

Yes, but this can be changed, similarly as for v: we can add variadic arguments or accept some ctor function.

Maybe let me take some time to look at volatile_state and see if there's any issues for us, and then we can revisit ::v if its needed.

That sounds great! Please let me know if you have any ideas how to improve it.

Also one clarification: right now, the volatile_state takes write lock only when item has to be created (first call to get). Otherwise, it;s using read lock which provides some concurrency.

pmem / libpmemobj-cpp

Volatile state for persistent objects. #363