utsaslab / pebblesdb

The PebblesDB write-optimized key-value store (SOSP 17)
BSD 3-Clause "New" or "Revised" License

Do a trivial rename to pebblesdb. #1

Closed. rescrv closed this issue 6 years ago

rescrv commented 6 years ago

If you're going to build on my work to get an SOSP paper, fucking own that shit. Don't half-ass the job.

I'm still bitter at reviewers for rejecting me over more trivial details in the past, so please care enough to pay attention to the details that made HyperLevelDB "smaller, better documented ..., and easier to understand," and pay it forward for future researchers.

BTW, your tests don't compile.

vijay03 commented 6 years ago

Hi Robert,

That sounds bitter (and a bit rude?). I empathize with the sting of getting rejected and receiving bad reviews, but that doesn't mean you should leave caustic comments on the code of people building on your work.

Thanks for bringing this up! We'll be careful with naming and other small details.

The code requires libsnappy-dev and libtool to compile (we'll document this shortly). I just confirmed the tests compile on a fresh Ubuntu 16.04 VM once the dependencies are installed. The code does not compile on Mac OS X. What is your setup?

rescrv commented 6 years ago

Hi Vijay,

It is bitter and may come across as rude. In my experience in grad school, I had papers I'd invested plenty of time in rejected because people played unethical games. Other papers built on my codebase with minimal changes and presented incomplete versions of complete ideas from the rejected paper. It wasn't just the sting of a standard reject.

I ceased work on HyperLevelDB in 2014 in the middle of my divorce because I didn't have the energy to sustain it. I never had server-grade SSDs to test on, and I spent my own money buying consumer-grade SSDs to test with that I burnt into the ground and rendered unusable. I spent countless hours working on the code that's out there. Inside HyperLevelDB I implicitly over-fragment SSTables in order to reduce overlap. The state in which I left the project was midway through the effort to finish that, and not finishing it has always bugged me.

When I saw that PebblesDB built on HyperLevelDB and was designed to address write amplification, I was excited and wanted to look into the code and read the paper. I never thought of using randomization to partition the SSTables, and it seems so obvious with hindsight (I consider that the hallmark of a great idea). What HyperLevelDB was tending towards was much more complex and error-prone. It is exciting to know someone is building on what you've poured literal months of your life into.

That entire history is gone from PebblesDB, whose history has been rewritten to start on September 7, 2017 and include nothing of the work that came before it, which includes the work that came before me. It's as if none of it ever happened now that there's something new that beats it in benchmarks.

I'm using Linux with g++ 7.1 and whatever stack sits underneath it (7.1 is the only relevant piece, as glibc should be unchanged), and I see this error:

make[1]: 'arena_test' is up to date.
make[1]: 'bloom_test' is up to date.
make[1]: 'c_test' is up to date.
make[1]: 'cache_test' is up to date.
make[1]: 'coding_test' is up to date.
  CXX      db/corruption_test.o
db/corruption_test.cc:31:18: error: cannot declare field ‘leveldb::CorruptionTest::env_’ to be of abstract type ‘leveldb::test::ErrorEnv’
   test::ErrorEnv env_;
                  ^~~~
In file included from db/corruption_test.cc:23:0:
./util/testutil.h:30:7: note:   because the following virtual functions are pure within ‘leveldb::test::ErrorEnv’:
 class ErrorEnv : public EnvWrapper {
       ^~~~~~~~
In file included from db/corruption_test.cc:14:0:
/home/rescrv/tmp/pebblesdb/include/pebblesdb/env.h:168:21: note:    virtual pthread_t leveldb::Env::GetThreadId()
   virtual pthread_t GetThreadId() = 0;
                     ^~~~~~~~~~~
make[1]: *** [Makefile:1138: db/corruption_test.o] Error 1
make: *** [Makefile:1531: check-am] Error 2
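
The error boils down to PebblesDB's Env interface declaring a pure virtual GetThreadId() that test::ErrorEnv (via EnvWrapper) never overrides, which leaves the class abstract. Below is a minimal sketch of the kind of forwarding override that would make the wrapper concrete again, assuming EnvWrapper holds a pointer to a wrapped Env named target_ as in stock LevelDB; the names and layout are assumptions for illustration, not PebblesDB's actual code.

// Sketch only: Env, EnvWrapper, and target_ are assumed from stock LevelDB's
// env.h; the real declarations live in include/pebblesdb/env.h and util/testutil.h.
#include <pthread.h>

class Env {
 public:
  virtual ~Env() {}
  // The pure virtual that makes every non-overriding subclass abstract
  // (compare pebblesdb/env.h:168 in the error output above).
  virtual pthread_t GetThreadId() = 0;
};

class EnvWrapper : public Env {
 public:
  explicit EnvWrapper(Env* t) : target_(t) {}
  // Forwarding the call to the wrapped Env makes EnvWrapper concrete again,
  // so subclasses such as test::ErrorEnv no longer trigger the
  // "cannot declare field ... to be of abstract type" error.
  virtual pthread_t GetThreadId() { return target_->GetThreadId(); }

 private:
  Env* target_;  // the wrapped Env that real calls are delegated to
};

Adding a forwarding method like this to EnvWrapper (or overriding it directly in ErrorEnv) is the usual way to keep test wrappers instantiable when the base interface grows a new pure virtual.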
vijay03 commented 6 years ago

Hi Robert,

I'm sorry to hear about that -- that does seem like more than the standard grad-school misery. For what it's worth, we are very glad you worked on HyperLevelDB and HyperDex! The LSM key-value store space was definitely made better by your contributions.

You don't need to worry that HyperLevelDB is getting overlooked. As you may have noticed in both the paper and this repository, we mention everywhere that we build on HyperLevelDB and link to both the HyperDex paper and the HyperLevelDB repository. One of the happy side-effects of the paper is that people can see how HyperLevelDB itself performs better than RocksDB and LevelDB in many cases :)

I agree that it is a shame our repo is not linked to the HyperLevelDB one. If there is a way to link our GitHub repo to the HyperLevelDB repo, we would be glad to do so. Our route was basically: clone HyperLevelDB to a private repo -> work on it -> clean it up -> post on GitHub. We would like it to reflect that we cloned from HyperLevelDB, but we don't want our messy intermediate updates on GitHub.

@pandian4github, can you check the compilation with g++ 7.1?

webmaven commented 6 years ago

@vijay03, GitHub now allows squashing commits when merging, which would hide the intermediate commits:

https://github.com/blog/2141-squash-your-commits

So something like this would work:

  1. Fork HyperLevelDB to create a pebblesdb repo

  2. Make a branch in the new repo

  3. Clone the branch

  4. Apply the messy changes to the clone

  5. Push the branch, and then

  6. Squash-and-merge the branch to master, and

  7. Optionally delete the branch.

There are other variations possible, such as using your private repo or another new repo instead of a branch as the messy intermediate step, and squash-merging a Pull Request to the forked repo. The key points are to start the pebblesdb repo as a fork, and to end with a squashed merge commit that incorporates your changes while hiding the mess.

It may be possible to accomplish the same thing retroactively with the extant pebblesdb repo instead of making a new fork, but I am not sure how.

agioi commented 4 years ago

(quotes @webmaven's squash-and-merge workflow above in full)

Apparently Vijay doesn't want to do so. It would be easy to do the following:

  1. Fork HyperLevelDB and clone it locally
  2. Copy all the PebblesDB source code into the local clone
  3. Do a gigantic commit and push