olson-sean-k / plexus

Polygonal mesh processing.
https://plexus.rs
MIT License
162 stars 14 forks source link

Use transactional storage to enable mutation recovery. #19

Closed olson-sean-k closed 4 years ago

olson-sean-k commented 6 years ago

When the mutation API is used to modify a mesh, commit failures are catastrophic: the entire mesh is lost. Today, the results of these mutations are unwrapped, so not only is the mesh lost, but the thread panics.

The mutation API allows a replacement to be used when modifying through a mutable reference. To make these errors recoverable, a clone of the mesh could be used as a replacement, but this could be prohibitively expensive.

Instead, consider using transactional collections for storage. Such a collection would track modifications, present a view of those modifications, and allow them to either be committed or rolled back. If a topological error is detected by the mutation API, then storage can be rolled back and the original mesh can be recovered without too high a cost. It may be possible to refactor the transactional behavior of mutations and share it with storage.

olson-sean-k commented 6 years ago

I've been working on a crate that should help solve this issue as well as #17. It provides journaled collections, which behave in a similar manner to mutations in Plexus. For example:

let mut hash = HashMap::new();
hash.insert("hello", "world".to_owned());
let mut hash = hash.journal(); // Begin transaction.
hash.insert("foo", "bar".to_owned());
for mut entry in hash.iter_mut() {
    entry.to_mut().1 = "modified".to_owned();
}
let hash = hash.revert(); // End transaction.

Adapters also provide a common iterator type per collection and optionally unify types. This allows for both journaled and unjournaled collections to implement traits. If used for storage in Plexus, then the mutation API can enable journaling and then commit or revert the log depending on whether or not a mesh mutation was successful.

olson-sean-k commented 5 years ago

This would be a significantly different approach from the current mutation API. Like other issues related to storage, this would be greatly improved once GATs land in stable Rust. I will probably wait until that happens before exploring this any further.

olson-sean-k commented 5 years ago

Another thought. This is only useful if:

  1. the public graph APIs allow invalid mutations to occur
  2. the mutation API is exposed to users

I don't think that [1] should be allowed (keeping in mind that these mutations are topological). The real value of this model is if [2] ever happens. As the mutation API is implemented today, users would either have to clone graphs or accept total loss when a mutation fails.

olson-sean-k commented 4 years ago

I'm closing this issue for now. It should be possible to create some kind of transactional storage and use it during mutations, but to accomplish this with the AsStorage trait and network module, a trait object and dynamic dispatch would be required for any and all storage access. For example:

pub trait AsStorage<E>
where
    E: Entity,
{
    fn as_storage(&self) -> &dyn Storage<E>;
}

pub trait Storage<E>: Get<E> + ...
where
    E: Entity,
{}

Moreover, transactional storage could be expensive. Mutable gets and especially mutable iterators are very expensive to journal without drastically changing storage APIs. Rekeying is required to keep a log and apply it to slot maps, because they generate opaque keys (this would also interact poorly with storage APIs).

I think the mutation API should remain internal and all public APIs should disallow any inconsistent topological mutations. Discarding the mesh is fine, as it should always lead to a panic, and such panics indicate internal bugs.

It's also worth noting that geometric errors should not be allowed to occur within the mutation API, as they should be recoverable and not lead to panics or inconsistent topology. Unfortunately, some recent changes work against this goal by pushing geometric computation behind the mutation API. That will need to be fixed (cache types may need to perform geometric computation and store the results).

olson-sean-k commented 4 years ago

I'm reopening this in the hopes it can be solved. I may be rehashing some thoughts that have already been stated, but here we go anyway. :sweat_smile:

I think the mutation API should remain internal...

I've changed my mind about this. The mutations exposed by view types (for example, FaceView::poke_with_offset) and any other such APIs should guarantee consistency. Moreover, these mutations should not require consistency checks except during development and testing. The mutation API can be exposed and its public APIs should always perform consistency checks.

To accomplish this, commits must generally check consistency except for an internal API that does not (depending on configuration, such as #[cfg(test)]). The entity module (previously network module) must also be refactored to allow polymorphic storage (journaled vs. unjournaled). I've sketched out some of this in this gist. This introduces dynamic dispatch, but I think that's a small price to pay for recoverable transactions.

Journaling slot maps presents problems. The ugly solution is to synthesize keys by examining the slot map's capacity and generating KeyData using unsafe code (mem::transmute). I'm not happy with that approach, but it may be necessary along with pinning to a specific version of slotmap, as I doubt the maintainers will accept a PR for exposing internal state for key synthesis. In the extreme, I can also fork.

I'm hoping to play with this on a branch and update this issue accordingly.

olson-sean-k commented 4 years ago

Unfortunately, it looks like this may be best solved with GATs (which also help avoid boxing iterators in the storage APIs). An implementation without GATs will require the 'static lifetime for any and all user data in a graph, because there is no way to parameterize the lifetime of trait objects w.r.t to borrows of self in the AsStorage trait. By default, trait objects are 'static, and this punts on trying to use more granular lifetimes. In a nutshell, it would no longer be possible to put references in a MeshGraph. This gist currently demonstrates a minimal API using GATs.

olson-sean-k commented 4 years ago

The storage branch (7b762c3) is passing CI with an implementation using trait objects and feature gating that uses GATs on nightly toolchains and require's 'static data on non-nightly toolchains. This is missing the journaling mechanism, but has reached the first step for a workable implementation. :tada:

olson-sean-k commented 4 years ago

Here is an example of how gnarly key synthesis is. That code synthesizes slot map keys. Not only does this require unsafe code, but any synthesized keys will be invalid after committing the log (synthesizing the keys that will be used after commit is probably too difficult). Committing the log will need to return a mapping of synthetic keys to keys written to storage.

This is gross, but I'll continue to implement this and see where it goes.

olson-sean-k commented 4 years ago

I've been working on this for the past month or so and have ultimately come to the same conclusion: this probably shouldn't be done. Here's an important question: is a low level public mutation API actually useful? I don't think it is. A rough analogy would be a linked list data structure that exposes a transactional API that lets users manipulate nodes and their links and then perform a checked commit. :confused:

I think the approach of exposing a higher level and infallible API is a good one. A transactional mutation API is still useful internally, even if it destroys a graph when it becomes inconsistent. If a failure occurs in a higher level API, then it's a bug and Plexus should panic by expecting output from the mutation API. That being said, that behavior is good for testing but should probably be bypassed otherwise. If a high level mutation is well tested and consistent, then it should probably not require consistency checks at runtime whenever it's performed. It's also worth considering the builder API, which mirrors the behavior of the mutation API and can throw away graphs that are inconsistent.

Importantly, exposing a "safe" mutation API with transactions and restorative failure modes involves compromises. The most notable compromises can be summarized as code complexity and performance. I'm very close to a complete transactional API at the cost of dynamic dispatch, rekeying, and significantly more complex storage and mutation APIs. There is also some abuse of the slotmap crate, which requires misusing its FFI API and pinning its dependency to a specific version to synthesize keys. :vomiting_face:

This work hasn't been a waste though. Many changes on the storage branch are generally useful and I plan to either backport them or possibly complete and land the changes and then "undo" transactional mutations. For example:

  1. Entity types like Vertex and Face are no longer exposed in the public API and instead more typical get and get_mut functions are used to read and write user data in a graph.
  2. Orphan views no longer expose entities and instead only expose their user data.
  3. The storage APIs do not expose mutable iterators over entities but instead only their user data.
  4. The transaction APIs are much more flexible and expose conditional APIs for bypassing or committing.
  5. A general GraphKey has been introduced.
  6. A build script introduces #[cfg(nightly)] which can be combined with #[cfg(feature = "unstable")] to conditionally enable unstable features like GATs.

And there are more. Removing restorative transactions allows for completely static dispatch, static iterators in traits (when enabling GATs), and much less complexity. Some features are useful but still incur a tradeoff. For example, dynamic dispatch to Storage trait objects may hurt performance (I haven't measured), but allows for tricks like tracking mutated keys and only checking the consistency of changes (rather than the entire graph, which may be huge). That may only benefit tests, however. Tricky!

olson-sean-k commented 4 years ago

The storage branch has been rebased to squash a large number of commits and merged into master. I've begun removing journaling and implementing conditional commits. See c011a8a and 0451869 for a glimpse at what recoverable transactions with journaling could look like. From here on, I plan to abandon that approach as discussed here. Phew!