linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0

Possibilities for using Ambry as a more "generic" cloud storage #555

Open siepkes opened 7 years ago

siepkes commented 7 years ago

I was looking into the possibility of using Ambry as a generic cloud storage solution, basically an S3 replacement. Being a dev myself, I'm not afraid to get my hands dirty and implement storage backends on top of Ambry's API for various applications like Phabricator, Discourse, XWiki, Open-Xchange, etc.

However, I bumped into the issue that all the applications I looked at assume they can control the naming scheme of the blobs they store. Obviously that is not unfixable; all of these solutions are open source and could be made compatible with storage backends that don't let them choose their own "filenames" (blob names is probably the more accurate term). But that would mean getting API changes into all kinds of upstream projects, which is usually a lot more work than just providing a new storage implementation class (all the projects I encountered already have an abstract concept of storage).

So I was wondering how realistic it would be to implement something in Ambry that would, for example, create a namespace per blob owner in which blob names are unique.

An alternative would be to create a totally separate database or gateway in front of Ambry in which a sort of virtual namespace is created. However, that would create an additional database to back up, maintain, etc. It would be neat if that data could somehow be stored in Ambry itself to ease Ambry's maintenance burden.

pnarayanan commented 7 years ago

All good points, @siepkes. A key-value API for Ambry is certainly useful and this is something that has come up before.

Self-generation of handles keeps things simple internally, particularly around preventing key collisions. That said, although the ids or "handles" are generated at the router today, the datanodes to which they are sent have a key-value API. It should not be too hard to provide an API for letting the client pass in an id/key/filename for the object that is uploaded. The main question to answer is how key collisions should be handled or prevented: what happens if two requests to upload objects with the same key come in close enough in time and go to different nodes? Since only a few datanodes are contacted in the synchronous path, both requests could succeed, and the duplication might only get detected during async replication, by which time it would be too late to "correctly" resolve it. There is also the question of whether the system should continue to be an immutable store - whether the key-value PUT API should support updates of an existing key. Once we go down this path, we have fundamental problems to deal with, such as handling versioning for a key and figuring out which version is the latest - problems that other general-purpose key-value stores have to deal with.

Perhaps the simplest way to achieve this is to continue to be immutable and put the burden of uniqueness on the client itself. You bring up an interesting point about creating a namespace around the service owner itself. If the service owner field is embedded into the id (which can be easily done by the router), then the service owner will only have to ensure uniqueness of keys that it generates.
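A minimal sketch of what such owner-scoped keys could look like, purely illustrative and not part of Ambry's current API (the NamespacedKey class and serviceOwnerId field are hypothetical):

```java
// Hypothetical sketch: scope client-chosen keys by service owner so each
// owner only has to guarantee uniqueness within its own namespace.
public final class NamespacedKey {
  private final String serviceOwnerId; // e.g. the account/service that owns the blob
  private final String clientKey;      // name chosen by the client application

  public NamespacedKey(String serviceOwnerId, String clientKey) {
    this.serviceOwnerId = serviceOwnerId;
    this.clientKey = clientKey;
  }

  // The router could embed this composite into the id it stores and looks up,
  // so two owners using the same clientKey never collide.
  public String toBlobName() {
    return serviceOwnerId + "/" + clientKey;
  }
}
```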

siepkes commented 7 years ago

That's an interesting problem you point out @pnarayanan . I think the burden of uniqueness is, as you proposed, one the client should solve. If I understand it correctly, that would mean the client doesn't receive an error when such a situation occurs? I think that's acceptable, given that the client (application) has the best means to prevent it. Combined with a namespace solution (for example per owner), that would enable the client to prevent such a situation (since an application can only guarantee uniqueness within its own domain). Also, if the client really doesn't want to solve that problem, Ambry could still let the backend choose the name (i.e. use the Ambry ID).

Mutable data is also an interesting problem. Most applications I've seen that offer support for cloud storage like S3 don't modify parts of files; they basically treat everything as immutable and just upload and delete blobs. There will probably be some applications that do need that functionality, but I don't think they are the majority (I must admit I have no numbers to back that claim up). Mutability might still be a desirable feature because people (and some applications) might expect it. However, for adoption as a more generic cloud storage backend, I think the naming problem is a bigger hindrance.

Another thing applications sometimes expect is the ability to read data at an offset. It's also something that's not super hard to work around: download the file to a temporary location and then seek to the offset.

Thanks for your valuable insights!

pnarayanan commented 7 years ago

About the last part - Ambry already has support for fetching a given byte range of a blob. #556 has been opened to add docs about it to the wiki.
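For illustration, a range fetch over the frontend's HTTP API might look roughly like the following, assuming the frontend honors a standard HTTP Range header; the frontend host and blob id here are made up, and the exact syntax is what #556 is meant to document:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeGetExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical frontend URL and blob id; the Range header used here is the
    // standard HTTP one, pending the documentation tracked in #556.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://ambry-frontend.example.com/AmbryBlobId123"))
        .header("Range", "bytes=0-1023") // fetch the first 1 KiB of the blob
        .GET()
        .build();
    HttpResponse<byte[]> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray());
    System.out.println("Status: " + response.statusCode() + ", bytes: " + response.body().length);
  }
}
```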

siepkes commented 7 years ago

@pnarayanan I just noticed #587. Seems like a good and well thought out approach!

siepkes commented 7 years ago

@pnarayanan Regarding uniqueness of names within a container namespace: I stumbled across an AWS FAQ ( https://aws.amazon.com/s3/faqs/ ) where they describe the consistency model for S3 buckets (containers) as:

Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.

The FAQ links to a more elaborate document about S3's consistency model: https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel

I think the most interesting bit of info here is:

Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:

  • A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
  • A process replaces an existing object and immediately attempts to read it. Until the change is fully propagated, Amazon S3 might return the prior data.
  • A process deletes an existing object and immediately attempts to read it. Until the deletion is fully propagated, Amazon S3 might return the deleted data.
  • A process deletes an existing object and immediately lists keys within its bucket. Until the deletion is fully propagated, Amazon S3 might list the deleted object.

I think a lot of people who use S3 don't even realize S3 has these characteristics.

pnarayanan commented 7 years ago

@siepkes Thanks for the links, this is good to know. The consistency model is what I would expect and is exactly what Ambry provides too: read-after-write for PUTs and eventual consistency for DELETEs (deleted data may be returned until the DELETE fully propagates).

siepkes commented 6 years ago

@pnarayanan I had a thought regarding what to do when a user stores multiple blobs with the same user given name (concurrently).

In S3 you can enable keeping track of versions of blobs for a bucket, but I suspect that behind the scenes versioning is always used by S3. If it is disabled, it probably just means older versions of a file get deleted automatically, are not visible to the user, and don't count towards quota.

You can see on the screenshot in the How Do I See the Versions of an S3 Object? page that the S3 version ID is not a number but just a very long ID, just like the Ambry blob ID.

So that means, when implemented the "S3 way", the user-given blob name does not have to be unique. There could be multiple Ambry blobs with that user-given name in Ambry's index; however, if the user wants to retrieve a blob by the user-given name, Ambry would just return the blob with the most recent creation timestamp (the most recent version). This sidesteps the problem of what to do when the same user-provided blob name gets uploaded concurrently to different nodes.
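A rough sketch of that resolution rule, with hypothetical types (nothing here is existing Ambry code): the name index keeps every blob id ever stored under a user-given name, and a read simply picks the entry with the latest creation timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical name index: a user-given name maps to all blob versions ever
// stored under it; reads resolve to the most recently created one.
public class NameIndex {
  public record Version(String ambryBlobId, long creationTimeMs) {}

  private final Map<String, List<Version>> versionsByName = new HashMap<>();

  public void recordUpload(String userGivenName, String ambryBlobId, long creationTimeMs) {
    versionsByName.computeIfAbsent(userGivenName, k -> new ArrayList<>())
        .add(new Version(ambryBlobId, creationTimeMs));
  }

  // Returns the Ambry blob id of the latest version, if the name exists at all.
  public Optional<String> resolveLatest(String userGivenName) {
    return versionsByName.getOrDefault(userGivenName, List.of()).stream()
        .max(Comparator.comparingLong(Version::creationTimeMs))
        .map(Version::ambryBlobId);
  }
}
```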

As an added bonus, Ambry could (just like S3) provide the option of not automatically deleting older blobs, which basically provides versioning for blobs stored in an Ambry container. When versioning is disabled for a container, a job could automatically delete old versions (maybe during compaction?).

pnarayanan commented 6 years ago

@siepkes This should be of interest to you. It is a proposal created a while ago that talks about similar issues and ideas.

siepkes commented 6 years ago

@pnarayanan I read the document, good read! I think what I outlined above fixes 3 things which the document identifies as issues that still need to be solved:

Key conflicts are not an issue when using the versioning strategy outlined above. Every blob keeps its Ambry ID as its primary ID; the user-supplied key is merely an alias for the Ambry ID. If the user specifies the same "unique" key multiple times, the creation time (which could be stored in the K/V store) identifies the most recent version (Ambry ID) of the key. Older versions can still be accessed by the user (if versioning is enabled for the container) or automatically deleted by a scheduled job (if versioning is disabled for the container).

Maintaining immutability is fixed automatically by the versioning strategy outlined above; only new blobs are added. In the K/V store, only the Ambry ID regarded as the most recent version of a key changes.

Maintaining compatibility is also fixed automatically by the versioning strategy outlined above. Every blob still has an Ambry ID to retrieve it by (which would also be used to retrieve older versions of a blob).

Are my ramblings making any sense? :smile:

siepkes commented 6 years ago

@pnarayanan I'm currently taking inventory of what would be required to implement this (I might want to take a shot at it).

From what I understand, you are proposing to use the K/V store in Ambry's data store implementation, and the main problem is naming conflicts because that K/V store is not strongly consistent. Is that a correct understanding of what you are proposing and of the problem?

I proposed something that requires a strongly consistent K/V store, but from what I can gather there is no built-in K/V store in Ambry with those characteristics, I think?

To summarize what I proposed: leave the generation of primary handles (UUIDs) with the Ambry router. On top of that, use a (strongly consistent) K/V store with account ID + container ID + unique file name as the key and the Ambry primary handle (UUID) as the value. This K/V entry would always point to the last created Ambry blob and would be updated as soon as creation (upload) of a blob has completed. This fixes the whole problem of unique names and collisions during upload. During compaction, older "versions" of these blobs would be deleted. Since these semantics are the same as those of S3, they should be reasonable (S3 being the de facto standard of object storage).
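A minimal sketch of that write path, under the stated assumption of a strongly consistent K/V store (class and method names are hypothetical, not Ambry code): the blob is uploaded first, and the name entry is only switched to the new Ambry UUID once the upload has completed, so a name never points at a blob that does not fully exist yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed mapping:
// (accountId, containerId, fileName) -> most recently uploaded Ambry blob id.
public class BlobNameMapping {
  // Stand-in for a strongly consistent K/V store.
  private final Map<String, String> kvStore = new ConcurrentHashMap<>();

  private static String key(String accountId, String containerId, String fileName) {
    return accountId + "/" + containerId + "/" + fileName;
  }

  // Called only after the Ambry upload has fully completed, so readers never
  // see a name that points to a blob that is still being written.
  public void commitUpload(String accountId, String containerId, String fileName,
                           String ambryBlobId) {
    kvStore.put(key(accountId, containerId, fileName), ambryBlobId);
  }

  // Resolves a user-given name to the latest Ambry blob id, or null if unknown.
  public String resolve(String accountId, String containerId, String fileName) {
    return kvStore.get(key(accountId, containerId, fileName));
  }
}
```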

vgkholla commented 6 years ago

Sounds like a reasonable solution to me. As an implementation detail, you can use the IdConverter that's already wired into AmbryBlobStorageService to achieve the mapping between the user-desired name and the Ambry UUID.

I think you might also encounter some other challenges on the way to implementing this - one that I can think of is that you have to delete the blob that was previously mapped to a given "file name", which means that you now have a DELETE in the POST path (and you might have to handle failures, reference losses, etc.).

But overall, looks doable.

siepkes commented 6 years ago

Small update: I did some thinking about using a K/V store as a naming backend (for example LinkedIn's Voldemort, or Kafka Streams backed by RocksDB). However, while blob names could very well be stored in a K/V store, listing all blobs in a container is going to be hard with K/V. Personally, I couldn't come up with a good way to store container listings and maintain changes made to that list in a K/V store.

So using a K/V store might have been a bit too optimistic. For this to work it probably requires at least a document store (aka "NoSQL"), something like Elasticsearch.

Recently Google published an interesting post: How Google Cloud Storage offers strongly consistent object listing thanks to Spanner. Spanner is sometimes referred to as "NewSQL" since it combines the capabilities of NoSQL (horizontally scalable) with those of SQL (consistency, the ability to use queries). There is an open source clone of Spanner called CockroachDB (actually created by ex-Googlers).

one of those that I can think of is that you have to delete blob that was mapped previously to given "file name" which means that you now have a DELETE in the POST path (and you might have to handle failures, reference losses etc?).

Yes, I was also thinking about that issue. It probably requires some sort of write-ahead log to ensure consistency between Ambry and the DB if anything goes wrong.
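A very rough illustration of the kind of write-ahead logging I have in mind, with hypothetical interfaces (none of this is Ambry code): the intent is logged before touching either system, so a crash between the Ambry write and the DB write leaves a record that a recovery job can replay or roll back.

```java
// Hypothetical sketch of a write-ahead log around the two-system write:
// log the intent, upload to Ambry, update the name DB, then mark done.
// A recovery job can scan for entries that never reached DONE and either
// retry the DB update or delete the orphaned Ambry blob.
public class NamedPutCoordinator {
  public enum Stage { INTENT_LOGGED, AMBRY_WRITTEN, DONE }

  public interface WriteAheadLog {
    void append(String name, Stage stage, String ambryBlobId);
  }
  public interface AmbryClient {
    String putBlob(byte[] data); // returns the new Ambry blob id
  }
  public interface NameDb {
    void mapNameToBlobId(String name, String ambryBlobId);
  }

  private final WriteAheadLog wal;
  private final AmbryClient ambry;
  private final NameDb nameDb;

  public NamedPutCoordinator(WriteAheadLog wal, AmbryClient ambry, NameDb nameDb) {
    this.wal = wal;
    this.ambry = ambry;
    this.nameDb = nameDb;
  }

  public String put(String name, byte[] data) {
    wal.append(name, Stage.INTENT_LOGGED, null);
    String blobId = ambry.putBlob(data);          // step 1: make the blob durable
    wal.append(name, Stage.AMBRY_WRITTEN, blobId);
    nameDb.mapNameToBlobId(name, blobId);         // step 2: name -> id mapping
    wal.append(name, Stage.DONE, blobId);
    return blobId;
  }
}
```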

kkimdev commented 6 years ago

The S3 API is getting more and more popular and is being treated as the storage API of choice by cloud providers; for example, the following providers implemented S3 as their only storage product offering:

Also, I believe the popularity of Minio ( https://github.com/minio/minio ) is mainly attributable to it being an open source alternative to S3, as that is its main selling point.

I think providing an S3 API on top of Ambry would greatly help adoption.

yhilem commented 5 years ago

I am looking for any information to get started on implementing the S3 API with Ambry. My first thought is https://github.com/jclouds/jclouds/tree/master/apis/s3/src/main/java/org/jclouds/s3/blobstore but I do not know if it's a good track. Thank you in advance for your feedback.

siepkes commented 5 years ago

@yhilem Getting the Ambry frontend to expose an S3-compatible (HTTP) API is probably the easy part.

The hard(er) part is coming up with a design for how to implement the functionality behind that S3-compatible API which Ambry does not currently provide. For example, Ambry does not support giving a name to a blob and retrieving the blob by that name. Choosing your own blob names also introduces a new problem: how do you guarantee name uniqueness (i.e. prevent collisions) within the bucket? Ambry is loosely consistent, meaning that if you upload something to one node, another node will not immediately see that new blob.

yhilem commented 5 years ago

My idea is not to change Ambry. I think it is simpler to:

[Diagram: s3proxy → jclouds BlobStore → Cassandra → Ambry (s3proxy_jcloudsblobstore_cassandra_ambry, yhm-2019-02-25)]

w39hh commented 5 years ago

Our use case is a tiny microcosm of what yhilem is trying to achieve, but yes, we tied Elasticsearch and Ambry together. We needed to store more metadata about the objects than seemed appropriate in Ambry, and we wanted the metadata to be searchable. So we put a dead-simple REST API in front of both, such that a write stores the object in Ambry and the metadata in Elasticsearch, and a read finds the metadata in Elasticsearch and then retrieves the object from Ambry. Writes go to Ambry first to get the Ambry ID; the Ambry ID then becomes an attribute in the Elasticsearch metadata (it could also be used as the Elasticsearch document id). One could replace Elasticsearch with Cassandra, but we are more familiar with Elasticsearch.
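A compressed sketch of that write/read flow, with hypothetical client wrappers standing in for the actual Ambry and Elasticsearch calls (our real REST API differs in the details):

```java
import java.util.Map;

// Hypothetical facade: a write stores the object in Ambry first, then indexes
// its metadata (including the Ambry id) in Elasticsearch; a read looks up the
// metadata and then fetches the object from Ambry by id.
public class ObjectFacade {
  public interface AmbryClient {
    String put(byte[] object);   // returns the Ambry blob id
    byte[] get(String ambryId);
  }
  public interface MetadataIndex {
    void index(String docId, Map<String, Object> metadata);  // e.g. Elasticsearch
    Map<String, Object> search(String query);                // returns matching metadata
  }

  private final AmbryClient ambry;
  private final MetadataIndex metadataIndex;

  public ObjectFacade(AmbryClient ambry, MetadataIndex metadataIndex) {
    this.ambry = ambry;
    this.metadataIndex = metadataIndex;
  }

  public String write(byte[] object, Map<String, Object> metadata) {
    String ambryId = ambry.put(object);       // Ambry first, to get the id
    metadata.put("ambryId", ambryId);         // id becomes a metadata attribute
    metadataIndex.index(ambryId, metadata);   // id doubles as the document id
    return ambryId;
  }

  public byte[] read(String query) {
    Map<String, Object> metadata = metadataIndex.search(query);
    return ambry.get((String) metadata.get("ambryId"));
  }
}
```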

zzmao commented 5 years ago

@yhilem Looks like you are trying to use Cassandra to solve the name-to-Ambry-ID mapping issue. This is a good solution for people who want to migrate/mirror S3 to Ambry. Is this going to be an open source project?

Some thoughts:

  1. Can we provide the DB query layer as an interface so that people can plug in different database implementations?
  2. Can we isolate the client from jclouds so that people can create different client implementations for S3, Azure and GCP?
  3. To support name-to-ID mapping, an alternative way is to add a 'Cassandra' layer between the frontend and the datanodes. This is an idea inspired by Azure Blob Storage.
  4. The frontend in Ambry is a proxy. If you can obtain the Ambry ID from Cassandra, you can request the data directly from the backend.

BTW, what tool did you use to draw the Ambry diagram above? It looks nice.

yhilem commented 5 years ago

The project, once finished, will be open sourced. For now we have a time constraint: it must be in production this summer.

Some responses:

  1. Yes, with JPA we can support different database implementations: RDBMS & NoSQL.
  2. Client and jclouds are already isolated. I use s3proxy to provide the S3 implementation.
  3. This is a first step; the search functionality indeed remains to be done.
  4. Ambry supports embedding the routing library directly within the client.

The Ambry diagrams above are from https://engineering.linkedin.com/blog/2016/05/introducing-and-open-sourcing-ambry---linkedins-new-distributed-