Up-to-date benchmarking tests

StarpTech commented 8 years ago

The tests on the GitHub page are outdated. The last benchmark was in 2013. Is anything being done to update the tests? Thanks.

(2012) XGDBench: A Benchmarking Platform for Graph Stores in Exascale Clouds

What is the state of this ticket? https://github.com/orientechnologies/orientdb/issues/3944

I think it is essential to run such tests at regular intervals so that customers (like me) can form their own opinion about the quality and stability of an open-source-driven project. Feature tables or

http://orientdb.com/orientdb-vs-neo4j/ http://orientdb.com/orientdb-vs-mongodb/

tell you nothing about how good a product is.

StarpTech commented 8 years ago

Here is a very good example: https://www.arangodb.com/performance/

luigidellaquila commented 8 years ago

Hi @StarpTech

The problem with vendor benchmarks is that it's extremely easy to design a benchmark that performs extremely well on a particular DB for specific use cases, workloads, APIs and so on.

This is exactly what ArangoDB did with these benchmarks. To give you only two examples:

- They use unrealistic use cases to exploit their sweet spots. See the "neighbors" benchmark: they retrieve only neighbor keys and no other information. This is extremely convenient for them, because they store edge relationships in indexes: the index contains just the reference key, so they don't even have to read the referenced records. On the other hand, any application that does graph analysis also needs additional information, so the use case does not make much sense in the real world. If you slightly change the use case to also retrieve a single property of the neighbor, you will see that the performance of all the competitors (not only OrientDB) is much better than ArangoDB's.
- They choose a test framework and APIs that best fit their platform: they chose Node.js because they are optimized for it. Try to execute the same tests on a Java stack and you will get different results. Even better, try to run the same benchmarks with OrientDB embedded in a Java application (note that ArangoDB is written in C++ and cannot be embedded) and you'll see a difference of 20-50x on many of their tests.
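To illustrate the first point, here is a toy Python sketch of why a keys-only neighbor lookup can be answered from an edge index alone, while fetching even one property forces a record read per neighbor. The `ToyGraphStore` class and its methods are hypothetical and do not reflect the actual storage layout of ArangoDB, OrientDB, or any other database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphStore:
    """Toy model: an edge index (vertex key -> neighbor keys) plus a record store."""
    edge_index: dict = field(default_factory=dict)
    records: dict = field(default_factory=dict)
    record_reads: int = 0  # counts how many full records were loaded

    def add_vertex(self, key, **props):
        self.records[key] = dict(props)
        self.edge_index.setdefault(key, [])

    def add_edge(self, src, dst):
        self.edge_index[src].append(dst)

    def neighbor_keys(self, key):
        # Answered from the index alone: no record is loaded.
        return list(self.edge_index[key])

    def neighbor_property(self, key, prop):
        # Needs one record read per neighbor to fetch the property.
        values = []
        for neighbor in self.edge_index[key]:
            self.record_reads += 1
            values.append(self.records[neighbor][prop])
        return values


store = ToyGraphStore()
for name in ("alice", "bob", "carol"):
    store.add_vertex(name, name=name.title())
store.add_edge("alice", "bob")
store.add_edge("alice", "carol")

keys = store.neighbor_keys("alice")
reads_after_keys = store.record_reads      # still 0: index-only lookup
names = store.neighbor_property("alice", "name")
reads_after_props = store.record_reads     # one read per neighbor
```

In this model the keys-only query never touches `records`, so a benchmark built around it measures only the index path.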

There is one positive point in all this, though: the clarity of their docs. I agree with you that we can invest some time to provide something similar. We started to work on Jepsen tests some time ago. Someone from the community provided a first, very partial implementation, but then gave up for lack of time; now one of our team is working to complete it. The problem there is that Jepsen is written in Clojure and nobody in our team is very proficient in that language. I hope I'll be able to give you some news in the next few weeks.

In conclusion, I suggest you take this kind of benchmark for what it is: an advertisement.

Thanks

Luigi

StarpTech commented 8 years ago

@luigidellaquila I get your point, but the neighbors benchmark is just one of the many benchmark tests they ran. They also benchmark single read, single write, aggregation and memory usage. These are really DB-independent and can be tested without much effort. My problem is that OrientDB lacks information in these simple areas. I like OrientDB and I want to help make it better, but it seems that OrientDB has no marketing division that would require such information. This makes OrientDB look old-school and unprofessional, and this comes from a customer.
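As a rough illustration of how small such database-independent tests can be, here is a minimal Python timing harness. The `documents` dict is a hypothetical stand-in for a real database, and `time_operation`, `single_read`, `single_write` and `aggregation` are made-up names; a real benchmark would call a database driver instead and would also have to measure memory usage separately.

```python
import statistics
import time


def time_operation(operation, repetitions=1000):
    """Return the median wall-clock time in seconds of one call to `operation`."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-in for a database: a plain dict of documents.
documents = {i: {"id": i, "age": i % 90} for i in range(10_000)}


def single_read():
    return documents[4242]


def single_write():
    documents[10_000] = {"id": 10_000, "age": 30}


def aggregation():
    # Count documents per age, similar in spirit to a GROUP BY query.
    counts = {}
    for doc in documents.values():
        counts[doc["age"]] = counts.get(doc["age"], 0) + 1
    return counts


results = {
    "single read": time_operation(single_read),
    "single write": time_operation(single_write),
    "aggregation": time_operation(aggregation, repetitions=20),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds")
```

The median is used rather than the mean so that a few slow outliers (garbage collection, scheduling) do not dominate the reported figure.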

StarpTech commented 8 years ago

@luigidellaquila I found a benchmark from 9 July 2015 ( http://orientdb.com/orientdb-performance-challenge/ ) and a test repository ( https://github.com/orientechnologies/nosql-tests ). Don't you know them? ^^

StarpTech commented 8 years ago

The repo is a fork of https://github.com/weinberger/nosql-tests and the related blog post doesn't include OrientDB as a vendor.

luigidellaquila commented 8 years ago

Hi @StarpTech

Yes, I know them of course, that was our answer to the benchmarks you posted before.

That's further proof that with benchmarks you can demonstrate whatever you want. We do not publish them as first-class public benchmarks simply because we don't think these use cases are very significant (although the results are very good for us).

Don't get me wrong, we take your notes about communication and marketing very seriously. OrientDB was born as a community project, always focused on the technical side, but now the company is starting to invest significant resources in these aspects.

Thanks

Luigi

lvca commented 8 years ago

Please look also at my answer on this: https://groups.google.com/d/msg/orient-database/5vrEB8Ycw5U/zdv1rMKlAQAJ

smolinari commented 8 years ago

Hey Luca. That is all fine and dandy, but let's put Arango DB's marketing efforts aside.

Some people say performance is a feature. I disagree: performance is an important part of the measure of quality, and that certainly is the case with databases. So any database vendor should be benchmarking in general, to see where they stand against other databases. The performance tests also help support the business in the end, both from a marketing perspective (if the results are good) and a support perspective.

I also realize doing benchmarks takes a lot of time and effort. Still, as a minimum, ODB should be benchmarking against itself (newer against older versions), as part of the QA process.

I mentioned that performance testing is good for support. Actually, it is very important from a support perspective, because a baseline (a known performance standard) helps support when customers come to them with, "I upgraded to version X, and now things are much slower". With performance data backing up the fact that it isn't the DB causing the slowdown, support can go after the real issues faster. Support knows the newer version definitely performs better, so there is no questioning the software quality.

A performance standard also helps push the selling of support btw. Imagine this answer from support. "Hey, we know it isn't the software, but please take our offer to optimize your database for you. We have fair support pricing and I am sure you'll be pleased with the results." Imagine trying to sell that support, without the performance standard set. Imagine what could and probably will happen. "Hmm...sorry, we did our best. But, it was actually the database after all. But, since you asked us to work on it, can you still pay for our service, even though what we did was our own fault?" That just isn't going to fly well at all and most likely, Orient Technologies will end up bearing the support costs, instead of earning money. That is the most costly way to improve the software.

In other words, benchmarking/ performance testing is a good investment, if only for QA purposes. :smile:

Scott

lvca commented 8 years ago

Hi Scott, I agree with you on benchmarking OrientDB, especially against past releases. This is the reason why we created https://github.com/orientechnologies/orientdb-benchmarks. It will run in our CI environment at every commit, and once the tests are finished it stores the results in an OrientDB database so we can monitor performance boosts and slow-downs at every commit.
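To make the idea of monitoring boosts and slow-downs per commit concrete, here is a minimal Python sketch. It is not how orientdb-benchmarks is implemented: the `BenchmarkRun` record, the `find_changes` function, the 10% threshold and the commit identifiers are all assumptions chosen for illustration, and an in-memory list stands in for the results database.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    commit: str      # commit identifier (hypothetical values below)
    benchmark: str   # benchmark name
    seconds: float   # measured duration


def find_changes(runs, threshold=0.10):
    """Compare each run with the previous run of the same benchmark.

    Returns (commit, benchmark, relative_change) for every change whose
    magnitude exceeds `threshold`; positive means slower, negative faster.
    `runs` must be in commit order.
    """
    previous = {}
    changes = []
    for run in runs:
        before = previous.get(run.benchmark)
        if before:  # skip the first run and zero-length baselines
            change = (run.seconds - before) / before
            if abs(change) > threshold:
                changes.append((run.commit, run.benchmark, round(change, 2)))
        previous[run.benchmark] = run.seconds
    return changes


history = [
    BenchmarkRun("a1", "insert", 2.00),
    BenchmarkRun("b2", "insert", 2.04),  # +2%: below the threshold
    BenchmarkRun("c3", "insert", 2.55),  # +25%: flagged as a slow-down
    BenchmarkRun("d4", "insert", 2.04),  # -20%: flagged as a boost
]
changes = find_changes(history)
print(changes)
```

Comparing against the immediately preceding commit keeps the check simple, but it can miss slow drifts; comparing against a fixed release baseline is a reasonable alternative.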

StarpTech commented 8 years ago

Hi @lvca, lots of people use the binary driver or the REST interface. Are you planning to cover these?

StarpTech commented 8 years ago

Hi @lvca, the current benchmark suite https://github.com/orientechnologies/orientdb-benchmarks handles a dataset of only 53 KB, which is not a meaningful dataset. The suite just tests basic operations and does not seem to be maintained. There is also no test for a distributed setup. Thanks.

StarpTech commented 7 years ago

Any updates?

orientechnologies / orientdb

Up-to-date benchmarking tests #5483