stephan-hof / pyrocksdb

Python bindings for RocksDB
BSD 3-Clause "New" or "Revised" License

rocksdb and network filesystem #9

Closed dineshbvadhia closed 10 years ago

dineshbvadhia commented 10 years ago

Is this the best place to ask general questions?

I know rocksdb is not a distributed db, but is it designed to work with a network filesystem, i.e. the machine runs rocksdb but the db lives on a filesystem on the network?

stephan-hof commented 10 years ago

Hi,

first of all, I think the official rocksdb github page (https://github.com/facebook/rocksdb/issues) or the facebook group (https://www.facebook.com/groups/rocksdb.dev/) is a far better place to ask this.

Regarding your question: it seems to me that rocksdb was designed especially for SSDs, but I also have good experience on HDDs. As far as I know they don't have special requirements on the filesystem. They try to use fallocate to create the db files, but fall back to posix_fallocate in case the call is not present. So in general I would say "it works".

On performance I can only guess => I would recommend running a benchmark of your own. If you compile rocksdb, there is already a tool available called './db_bench'. With this tool you can run benchmarks on SSD/HDD/network filesystems to see how it behaves. You can have a look at https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks to see how they use the tool. However, don't compare your results with the absolute numbers on that page: they used very high-end hardware (FusionIO devices) for those tests. I reference that page only to show how the tool works.
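As a rough idea, here is a minimal sketch of driving db_bench from Python against different mounts. The paths are placeholders; --benchmarks, --num and --db are standard db_bench flags, but check ./db_bench --help for your build.

```python
# Sketch: run the same db_bench workload on a local SSD mount and on the
# network filesystem, then compare the printed throughput numbers.
import subprocess

MOUNTS = {
    "local-ssd": "/mnt/ssd/db_bench_test",    # placeholder paths
    "network-fs": "/mnt/netfs/db_bench_test",
}

for name, path in MOUNTS.items():
    print("=== %s ===" % name)
    subprocess.run(
        ["./db_bench",
         "--benchmarks=fillrandom,readrandom",
         "--num=1000000",
         "--db=%s" % path],
        check=True,
    )
```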

dineshbvadhia commented 10 years ago

Hi Stephan

Thanks for getting back to me. I asked the same question independently on the Facebook group and got the same answer as yours.

I'm doing tests with large data sets on a cluster, and unfortunately only a (very fast) network-attached filesystem is available. The production system would have directly attached storage.

I use Python, so I will start using pyrocksdb in the next day or so. The initial requirements are very basic: first, write lots of data to populate the db, and after that it is mainly reads.
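For reference, a minimal pyrocksdb sketch of that write-then-read pattern (the db path, key count and key format are made up):

```python
import rocksdb

# open (or create) the database
opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("test.db", opts)

# bulk write: group puts into a WriteBatch so they hit the db in one call
batch = rocksdb.WriteBatch()
for i in range(100000):
    batch.put(("key-%d" % i).encode(), ("value-%d" % i).encode())
db.write(batch)

# afterwards it is mainly reads
print(db.get(b"key-42"))
```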

Best ... Dinesh

stephan-hof commented 10 years ago

Hi,

if you have the workload you describe (a bulk load of lots of data first, then mostly reads), you may want to look at http://pyrocksdb.readthedocs.org/en/v0.2.1/api/database.html#rocksdb.DB.compact_range.

It is just an idea inspired by this paragraph: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks#test-1-bulk-load-of-keys-in-random-order. I have never used it myself on production data, but you may get a speedup on inserts/reads.
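For illustration, a minimal sketch of what that could look like after the bulk load (the db path is a placeholder; calling compact_range() with no arguments should cover the whole key range):

```python
import rocksdb

db = rocksdb.DB("bulk.db", rocksdb.Options(create_if_missing=True))

# ... bulk load everything first, e.g. via WriteBatch as above ...

# then compact once before switching to the read-mostly phase
db.compact_range()
```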

stephan-hof commented 10 years ago

I think the question is answered => closing the ticket.