Open jfischer opened 12 years ago
Cloud storage application:
The main components of the app are:
The RPC block uses the XMLRPC infrastructure that I set up previously. It provides the following API:
The replicated data store is a shard with a number of stores under its control. Each store can itself consist of several nodes within a zone. Replication ensures that a file is spread across several zones; it is up to the administrator to ensure that the zones are physically separate. Individual stores can also be de-duplicating.
Given an input file to save (identified by its URL), the replicated store (RS) first saves it in one of the stores, choosing one that has no pending operations on that file, and returns an integer handle for the file to the client for future operations. Once the file is stored, the RS obtains its store URL and queues that URL for replication; Redis could serve as the queue. The RS periodically dequeues entries from this queue and, for each file, ensures that at least 3 stores hold it. Files are therefore replicated eventually rather than immediately.
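The save-then-replicate flow can be sketched roughly as follows. This is a toy in-memory model, not the actual design: the `ReplicatedStore` class, its method names, and the zone names are hypothetical, a `deque` stands in for the Redis queue, the queue holds handles rather than store URLs, and the first store is picked by "fewest files" rather than by checking pending operations.

```python
from collections import deque
from itertools import count

REPLICATION_FACTOR = 3  # "at least 3 stores" from the description above


class ReplicatedStore:
    """Toy model of the RS; each store is a dict mapping handle -> file URL."""

    def __init__(self, store_names):
        self.stores = {name: {} for name in store_names}
        self.queue = deque()      # assumption: stands in for a Redis list
        self._handles = count(1)  # integer handles handed back to clients

    def save(self, url):
        """Save on one store right away and queue the file for replication."""
        handle = next(self._handles)
        # Assumption: pick the store currently holding the fewest files.
        first = min(self.stores, key=lambda name: len(self.stores[name]))
        self.stores[first][handle] = url
        self.queue.append(handle)
        return handle

    def replicas(self, handle):
        """Names of the stores that currently hold the file."""
        return [name for name, files in self.stores.items() if handle in files]

    def replicate_pending(self):
        """Drain the queue, copying each file until enough stores hold it."""
        while self.queue:
            handle = self.queue.popleft()
            holders = self.replicas(handle)
            if not holders:
                continue  # file was deleted before it could be replicated
            url = self.stores[holders[0]][handle]
            for name in self.stores:
                if len(holders) >= REPLICATION_FACTOR:
                    break
                if name not in holders:
                    self.stores[name][handle] = url
                    holders.append(name)


rs = ReplicatedStore(["zone-a", "zone-b", "zone-c", "zone-d"])
handle = rs.save("http://example.com/report.pdf")
rs.replicate_pending()  # in the real system this would run periodically
```

In this sketch the client gets its handle back as soon as one copy exists; the remaining copies appear only when `replicate_pending` runs, which is the eventual-replication behaviour described above.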
If a URL is deleted, the RS deletes it from one store and queues the delete action for the other stores on which it is replicated.
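Deletion could follow the same pattern: one synchronous delete, with the rest deferred. The sketch below is only illustrative; `delete_file`, `process_deletes`, and the layout of `stores` (store name mapped to a dict of handle to URL) are hypothetical, and a `deque` again stands in for the queue.

```python
from collections import deque


def delete_file(stores, pending_deletes, handle):
    """Delete from one store immediately; queue deletes for the other holders."""
    holders = [name for name, files in stores.items() if handle in files]
    if not holders:
        return False  # nothing to delete
    del stores[holders[0]][handle]
    for name in holders[1:]:
        pending_deletes.append((name, handle))
    return True


def process_deletes(stores, pending_deletes):
    """Apply the queued deletes; would run periodically in the real system."""
    while pending_deletes:
        name, handle = pending_deletes.popleft()
        stores[name].pop(handle, None)  # tolerate a copy that is already gone


stores = {"zone-a": {7: "url"}, "zone-b": {7: "url"}, "zone-c": {7: "url"}}
pending = deque()
delete_file(stores, pending, 7)   # removes one copy, queues two deletes
process_deletes(stores, pending)  # removes the remaining copies
```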
The replicated metadata store (RMS) works similarly, except that it operates on database records rather than files. It is also a shard, made up of metadata stores (ordinary MongoDB instances). Given an input record, it immediately saves the record on 3 metadata stores, which completes quickly when records are small. Updating all 3 stores at the same time keeps the databases in sync and allows aggregates and other queries to be run interchangeably on any of them. The RMS is used to store user information, in particular the list of containers in the user's account along with the handles (integer identifiers) of their files in the RS.
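A minimal sketch of the synchronous metadata write, for contrast with the queued file replication. Plain dicts stand in for the MongoDB instances, and `save_record`, the key format, and the record layout are all assumptions made for illustration.

```python
import copy

METADATA_COPIES = 3  # the RMS writes every record to 3 metadata stores


def save_record(metadata_stores, key, record, copies=METADATA_COPIES):
    """Write the record to `copies` stores before returning to the caller."""
    if len(metadata_stores) < copies:
        raise ValueError("not enough metadata stores for the requested copies")
    targets = metadata_stores[:copies]
    for store in targets:
        store[key] = copy.deepcopy(record)  # each store keeps its own copy
    return len(targets)


# Hypothetical user record: containers mapped to file names and RS handles.
user_record = {"containers": {"photos": {"beach.jpg": 1}}}
metadata_stores = [{}, {}, {}]
written = save_record(metadata_stores, "user:U", user_record)
```

Because every store holds the same record once the call returns, a read or aggregate can be served from any of them.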
So the workflow for getting a file (F) in a container (C) of a user (U) is as follows:
Similar workflows apply to adding and removing files.
(If we need to be API compatible with Rackspace then the RPC block will need to be changed accordingly.)
Jeff to review the Rackspace storage API to determine what we should implement in first pass.
Sai to define block topology that would be required to implement the demo.
Questions for Sai to answer: