Design for Cloud Storage Demo App

Cloud storage application:

The main components of the app are:

RPC which provides the API for users
Replicated data store (RS)
Replicated metadata store (RMS)

The RPC block uses the XMLRPC infrastructure that I set up previously. It provides the following API:

Add/remove containers (they are one level file directories)
Add/remove files in a container (no modifying files in the initial implementation)
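A minimal sketch of what this API surface could look like, assuming Python's standard `xmlrpc.server` in place of the project's own XMLRPC infrastructure. The class name `StorageAPI`, the method names, and the in-memory `accounts` dict are all hypothetical and only illustrate the four operations listed above.

```python
from xmlrpc.server import SimpleXMLRPCServer


class StorageAPI:
    """Hypothetical RPC surface: containers are one-level directories."""

    def __init__(self):
        # user -> container name -> set of file names (in-memory stand-in)
        self.accounts = {}

    def add_container(self, user, container):
        containers = self.accounts.setdefault(user, {})
        if container in containers:
            return False
        containers[container] = set()
        return True

    def remove_container(self, user, container):
        removed = self.accounts.get(user, {}).pop(container, None)
        return removed is not None

    def add_file(self, user, container, name):
        files = self.accounts.get(user, {}).get(container)
        if files is None or name in files:
            # Unknown container, or the file exists (no modification allowed).
            return False
        files.add(name)
        return True

    def remove_file(self, user, container, name):
        files = self.accounts.get(user, {}).get(container)
        if files is None or name not in files:
            return False
        files.remove(name)
        return True


def serve(host="localhost", port=8000):
    """Expose the API over XML-RPC (host and port are placeholder values)."""
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_instance(StorageAPI())
    server.serve_forever()
```

Adding a file that already exists is rejected here, which matches the "no modifying files" restriction of the initial implementation.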

The replicated data store is a shard with a number of stores under its control. Each store can itself comprise several nodes within a zone. Replication ensures that a file is spread across several zones; it is up to the administrator to ensure that the zones are physically separate. Individual stores can also be de-duplicating.

Given an input file to save (identified by its URL), the replicated store (RS) first saves it in one of the stores, choosing one that has no pending operations on that file, and returns a handle (an integer) for the file to the client for future operations on it. Once the file is stored, RS obtains its store URL and queues that URL for replication; Redis could be used for this queue. RS periodically dequeues entries and, for every file, ensures that at least 3 stores hold it. This ensures that files are replicated eventually.
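The save-then-replicate flow can be sketched as follows. This is an illustration under stated assumptions, not the actual design: plain dicts stand in for the stores, a `collections.deque` stands in for the Redis queue, the queue holds handles rather than store URLs, and the primary store is chosen trivially instead of checking for pending operations. All names (`ReplicatedStore`, `save`, `replicate_pending`) are hypothetical.

```python
from collections import deque
from itertools import count

MIN_REPLICAS = 3  # "at least 3 stores" from the design


class ReplicatedStore:
    def __init__(self, store_names):
        # store name -> {handle: file URL}; dicts stand in for real stores
        self.stores = {name: {} for name in store_names}
        self.queue = deque()  # handles awaiting replication (Redis stand-in)
        self._handles = count(1)

    def save(self, url):
        """Save to one store right away, queue the rest, return the handle."""
        handle = next(self._handles)
        primary = next(iter(self.stores))  # placeholder store selection
        self.stores[primary][handle] = url
        self.queue.append(handle)
        return handle

    def replicas(self, handle):
        return [name for name, files in self.stores.items() if handle in files]

    def replicate_pending(self):
        """Periodic job: bring every queued file up to MIN_REPLICAS copies."""
        while self.queue:
            handle = self.queue.popleft()
            holders = self.replicas(handle)
            if not holders:
                continue  # file was deleted before replication ran
            url = self.stores[holders[0]][handle]
            for name, files in self.stores.items():
                if len(holders) >= MIN_REPLICAS:
                    break
                if handle not in files:
                    files[handle] = url
                    holders.append(name)
```

For example, with four stores, `save` leaves the file on one store, and a later `replicate_pending()` call copies it to two more.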

If a file (URL) is deleted, RS deletes it from one store immediately and queues the delete action for the other stores on which it is replicated.
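Deletion can be sketched the same way: remove one copy synchronously and defer the rest. The function names and the `(store name, handle)` queue entries below are assumptions made for illustration; the stores are again plain dicts and the queue a `deque` rather than Redis.

```python
from collections import deque


def delete_file(stores, handle, pending_deletes):
    """Delete one replica now and queue deletes for the remaining replicas.

    stores: store name -> {handle: file URL}
    pending_deletes: deque of (store name, handle) pairs
    """
    holders = [name for name, files in stores.items() if handle in files]
    if not holders:
        return False  # nothing to delete
    del stores[holders[0]][handle]
    for name in holders[1:]:
        pending_deletes.append((name, handle))
    return True


def drain_deletes(stores, pending_deletes):
    """Periodic job: apply the queued deletes."""
    while pending_deletes:
        name, handle = pending_deletes.popleft()
        stores[name].pop(handle, None)  # tolerate an already-missing copy
```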

The replicated metadata store (RMS) works similarly, except that it operates on database records rather than files. It is also a shard, made up of metadata stores (ordinary MongoDB instances). Given an input record, it immediately saves it on 3 metadata stores, which should finish quickly if the records are small. Updating all 3 stores at the same time keeps the databases in sync and allows aggregates and other queries to be run interchangeably on any of them. RMS stores user information, in particular the list of containers in the user's account along with their handles (integer identifiers) into the RS.
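A minimal sketch of the synchronous write path, assuming dicts in place of MongoDB collections. The class and method names are hypothetical, and the sketch ignores partial-failure handling, which a real implementation would need.

```python
class ReplicatedMetadataStore:
    REPLICAS = 3  # records are written to 3 metadata stores immediately

    def __init__(self, stores):
        if len(stores) < self.REPLICAS:
            raise ValueError("need at least %d metadata stores" % self.REPLICAS)
        self.stores = stores  # list of dicts standing in for MongoDB instances

    def save(self, key, record):
        """Write the record to all replicas before returning."""
        for store in self.stores[:self.REPLICAS]:
            store[key] = dict(record)  # copy so replicas do not share state

    def find(self, key, replica=0):
        """Any replica can answer, since all are updated together."""
        return self.stores[replica].get(key)
```

Because every write lands on all three replicas before `save` returns, `find` gives the same answer whichever replica is queried.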

So the workflow for getting a file (F) in a container (C) of a user (U) is as follows:

User U sends the tuple (Fetch, C, F) to the RPC block
The RPC block checks with RMS whether U has a container C and, if so, gets its handle
It uses this handle to fetch from RS the list of files in C, along with their handles
It looks up the handle corresponding to F and gets a URL for the file
It returns the URL to the client, who can do an HTTP GET on it to download the file
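The steps above can be sketched as a single lookup function. The three dict arguments are hypothetical stand-ins: `rms` for the metadata store, and `rs_listings` and `rs_urls` for the container listings and file URLs held in RS.

```python
def fetch_url(rms, rs_listings, rs_urls, user, container, filename):
    """Resolve (Fetch, C, F) for user U to a download URL, or None.

    rms:         user -> {container name: container handle}
    rs_listings: container handle -> {file name: file handle}
    rs_urls:     file handle -> URL the client can HTTP GET
    """
    # Check with RMS that U has container C and get its handle.
    container_handle = rms.get(user, {}).get(container)
    if container_handle is None:
        return None
    # Fetch the list of files in C, with their handles, from RS.
    files = rs_listings.get(container_handle, {})
    # Look up the handle for F, then its URL.
    file_handle = files.get(filename)
    if file_handle is None:
        return None
    return rs_urls.get(file_handle)
```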

Similar workflows apply to adding and removing files.

New blocks: All 3 components above are new, but the replicated store can be reused in other examples.
New infrastructure: None - since we already have the XMLRPC setup.
Different configurations: Initially we only need to support adding and removing files. We can also assume a fixed number of users, whom we initialize with empty containers. Support for adding and removing users can come later.
Evaluation: TBD

(If we need to be API compatible with Rackspace then the RPC block will need to be changed accordingly.)
