j-cube / milliways

The storage at the back-end of the Multiverse
Apache License 2.0

Text about Milliways #6

Open pberto opened 8 years ago

pberto commented 8 years ago

So, I need to improve some marketing-technology text on the website that relates to milliways. This is what I have so far. @panta, please review:

Home page text (multi-verse.io):

Multiverse was introduced at SIGGRAPH 2015, and a technical poster was published at SIGGRAPH ASIA 2015. A new technical poster about Milliways, a high-performance tree data structure implemented as a pluggable back-end to libgit2, has been submitted for SIGGRAPH ASIA 2016. The technology is released as open source under the permissive Apache License 2.0.

Technology page (multi-verse.io/tech -- this page will be added and will feature a bit of talk and performance comparison):

The Multiverse back-end relies on Git, a powerful distributed source control system. We inherit all the features introduced by Git, including compact history and branching, natural data de-duplication, cryptographic data integrity, sharing over the internet via the SSH protocol, and collaborative work capabilities. Our scene data representation allows targeted access to individual scene elements, opening the door to multi-threaded I/O as well as easy scene updates. To our knowledge, this is the first time that such a set of features has been available to the production community. Thanks to the well-defined back-end API AbcCoreAbstract, we wrote the AbcCoreGit plug-in, which is 100% API-compatible and adds functionality for history management. We now describe how we use Git to store data and what “data view” model we use to access the 3D scene hierarchy.

Data View Model

To mirror the Alembic scene representation, we use a virtual directory hierarchy on disk. In this representation, geometry and attributes are stored as files at leaf nodes, while hierarchy is expressed as directories. We store geometry in binary for compactness. Attributes are stored as JSON files for ease of access and manipulation. Note that this data view is virtualized: it is not visible to the user unless a “checkout” of the Git repository is performed. Such checkout operations can be performed on individual elements of the scene hierarchy for manual or scripted modifications, a very handy feature in a production environment.
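A minimal sketch of this data view, assuming a scene given as nested Python dicts. The file names `geometry.bin` and `attributes.json`, the `flatten_scene` helper, the node layout and the float32 point encoding are all hypothetical, chosen only for illustration; they are not the layout AbcCoreGit actually writes.

```python
import json
import struct


def flatten_scene(node, path=""):
    """Yield (virtual_path, bytes) pairs for one scene node and its children.

    Hypothetical node layout: {"points": [...], "attrs": {...}, "children": {...}}.
    """
    if "points" in node:
        # Geometry goes to a binary leaf file: little-endian float32 values.
        pts = node["points"]
        yield f"{path}/geometry.bin", struct.pack(f"<{len(pts)}f", *pts)
    if "attrs" in node:
        # Attributes go to a JSON leaf file for easy inspection and editing.
        text = json.dumps(node["attrs"], sort_keys=True)
        yield f"{path}/attributes.json", text.encode("utf-8")
    # Hierarchy is expressed as directories: one path component per child.
    for name, child in sorted(node.get("children", {}).items()):
        yield from flatten_scene(child, f"{path}/{name}")


scene = {
    "children": {
        "robot": {
            "attrs": {"visible": True},
            "children": {
                "arm": {"points": [0.0, 1.0, 2.0], "attrs": {"material": "steel"}},
            },
        }
    }
}

# Map of virtual path -> file content for the whole scene.
files = dict(flatten_scene(scene))
```

Each key of `files` is a path in the virtual hierarchy, so a single element such as `/robot/arm` can be located without touching the rest of the scene.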

Data Structure On Disk

We rely on libgit2 (the portable Git library, which implements the core Git methods) to read and write Git-based scene repositories and to virtualize the data view model described above. The library gives direct access to the repository without going through a “checkout” of the scene description. In other words, we directly write tree, blob and commit objects, which are the fundamental building blocks of a Git repository. Note that Git stores data in a directory structure of its own, but this is not to be confused with our data view model.
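To make the blob and tree building blocks concrete, here is a small sketch that computes the object IDs Git assigns to a blob and to a tree in a SHA-1 repository. It uses only Python's `hashlib`, not libgit2, and the helper names `git_object_id` and `tree_body` are made up for this example. Commit objects are left out for brevity.

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 object ID Git assigns to an object of the given kind."""
    # Git hashes a small header ("<kind> <size>\0") followed by the raw body.
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize tree entries given as (mode, name, hex_object_id) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary object ID.
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return body


# A blob holds file content; a tree references blobs (and other trees) by ID.
blob_id = git_object_id("blob", b"hello world\n")
tree_id = git_object_id("tree", tree_body([("100644", "attributes.json", blob_id)]))
empty_tree_id = git_object_id("tree", tree_body([]))
```

Because every object is addressed by the hash of its content, identical geometry or attribute data stored twice resolves to the same ID, which is where the de-duplication comes from.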

The Database

Thanks to the ability of libgit2 to support pluggable back-ends, we have developed a new high-performance, open-source B+ tree and key-value store C++ library called Milliways -- "the storage at the back-end of the Multiverse". The technology will be presented at SIGGRAPH ASIA 2016 in Macao.
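As a rough illustration of what an ordered key-value object store offers, the sketch below keeps keys sorted so that an abbreviated key can be resolved with a short range scan. It is not Milliways code and not a B+ tree: a sorted Python list with `bisect` stands in for the tree's ordered leaf level, the class name `OrderedObjectStore` is hypothetical, and keys are plain SHA-1 digests of the data rather than real Git object IDs. The method names loosely echo the `read`, `read_prefix`, `write` and `exists` callbacks of a libgit2 object database back-end.

```python
import bisect
import hashlib


class OrderedObjectStore:
    """Ordered key-value store: a sorted-list stand-in for a B+ tree."""

    def __init__(self):
        self._keys = []    # keys kept sorted, like the leaf level of a B+ tree
        self._values = {}  # key -> stored bytes

    def write(self, data):
        """Store data under its SHA-1 hex digest and return that key."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = data
        return key

    def exists(self, key):
        return key in self._values

    def read(self, key):
        return self._values[key]

    def read_prefix(self, prefix):
        """Resolve an abbreviated key with an ordered range scan."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(key)
        if len(matches) != 1:
            raise KeyError(f"{len(matches)} keys match prefix {prefix!r}")
        return self._values[matches[0]]


store = OrderedObjectStore()
key = store.write(b"scene data")
other = store.write(b"more scene data")
```

The ordered layout is the point of the sketch: a hash table answers exact lookups, while sorted keys also support prefix and range queries.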

panta commented 8 years ago

I think there is only a minor adjustment:

about Milliways, a high-performance tree data structure implemented as a pluggable back-end to libgit2

about Milliways, a high-performance on-disk tree-based key-value store, used in Multiverse as a pluggable back-end to libgit2

and a note to ourselves:

Attributes are stored as JSON files for ease of access

It's correct, but I'm evaluating whether, as an optimization when using milliways (and only in that case), we could skip the JSON encoding/decoding and use something faster and binary (JSON is somewhat slow). Then, when checking out as classic git, we could convert to JSON on the fly (and vice versa). I'm still not sure it's worth doing; I need to do some more performance analysis and some more urgent optimizations first.
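A rough sketch of what such an on-the-fly conversion could look like, assuming attributes are simple name-to-float-list mappings. The binary record layout and the function names are hypothetical, invented for this example; nothing here reflects a format Milliways actually uses.

```python
import json
import struct


def attrs_to_binary(attrs):
    """Pack {name: [float, ...]} into a compact binary record (hypothetical format)."""
    out = [struct.pack("<I", len(attrs))]  # number of attributes
    for name, values in sorted(attrs.items()):
        raw = name.encode("utf-8")
        out.append(struct.pack("<HI", len(raw), len(values)))  # name length, value count
        out.append(raw)
        out.append(struct.pack(f"<{len(values)}d", *values))   # float64 values
    return b"".join(out)


def binary_to_attrs(blob):
    """Inverse of attrs_to_binary."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    attrs = {}
    for _ in range(count):
        name_len, n_values = struct.unpack_from("<HI", blob, offset)
        offset += 6
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        attrs[name] = list(struct.unpack_from(f"<{n_values}d", blob, offset))
        offset += 8 * n_values
    return attrs


def binary_to_json(blob):
    """Conversion applied when checking out as classic git."""
    return json.dumps(binary_to_attrs(blob), sort_keys=True)


def json_to_binary(text):
    """Conversion applied when importing from a classic git checkout."""
    return attrs_to_binary(json.loads(text))


attrs = {"radius": [1.5], "color": [0.25, 0.5, 1.0]}
packed = attrs_to_binary(attrs)
```

Whether this is actually faster than JSON in practice is exactly what the performance analysis would need to show.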

pberto commented 8 years ago

Thanks for the edits.

JSON: understood, I will write accordingly. Performance and size are the top priority for Milliways. JSON can always be used by converting from milliways to classic git, so I don't mind, but focus on the most important optimizations first. Thanks.