timescale / tsbs

Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data
MIT License
1.29k stars 300 forks source link

Add a feature ClickHouse database support #26

Closed sunsingerus closed 5 years ago

sunsingerus commented 6 years ago

ClickHouse can be tested along with all other DBses. ClickHouse implementation is based on TimescaleDB.

codecov-io commented 6 years ago

Codecov Report

Merging #26 into master will decrease coverage by 8.24%. The diff coverage is 15.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #26      +/-   ##
==========================================
- Coverage   61.91%   53.67%   -8.25%     
==========================================
  Files          60       66       +6     
  Lines        2610     3173     +563     
==========================================
+ Hits         1616     1703      +87     
- Misses        984     1458     +474     
- Partials       10       12       +2
Impacted Files Coverage Δ
cmd/tsbs_generate_data/main.go 73.61% <0%> (-0.92%) :arrow_down:
query/clickhouse.go 0% <0%> (ø)
cmd/tsbs_load_clickhouse/creator.go 18.8% <18.8%> (ø)
cmd/tsbs_load_clickhouse/process.go 2.4% <2.4%> (ø)
cmd/tsbs_load_clickhouse/main.go 34.48% <34.48%> (ø)
cmd/tsbs_load_clickhouse/scan.go 78.57% <78.57%> (ø)
...bs_generate_queries/databases/clickhouse/devops.go 9.89% <9.89%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update cc569e5...87b78a7. Read the comment docs.

sunsingerus commented 6 years ago

Implementation of this issue

RobAtticus commented 6 years ago

Thanks for this! May be a few days before we get to this, but nice to see!

One note, for the commit msgs what I meant was "Add [a feature]" where [a feature] is replaced with what you are implementing. In this case "Add support for Clickhouse benchmarking"

sunsingerus commented 6 years ago

... a few days before we get to this...

Yes, sure, no problem at all

... "Add [a feature]" where [a feature] is replaced with what you are implementing. In this case "Add support for Clickhouse benchmarking"

Understood, sorry, missed this idea. Hope to be more precise next time

sunsingerus commented 6 years ago

Fixed commit message, looks better now

sunsingerus commented 5 years ago

There are several refactorings that are not related to the Clickhouse code ... some of them seem to change Golang-style short names (1-2 letters even) for longer, descriptive names. ... to revert some of the more subjective changes...

Thanks for the review, and yes, I agree - there are some refactorings not directly related to the ClickHouse functionality. These refactorings originate from the code reversing process - when I've started with implementation, I had very vague understanding of the project, therefore each tool has to be reversed and understood. That's where small code-related refactorings are very handy. As Martin Fowler said: "Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent." So through those small refactorings comes code understanding. As to extracting those small refactoring into separated PRs - yes, of course this can be done, but those refactorings are not the goal by themselves and as Martin says: "You refactor because you want to do something else, and refactoring helps you do that other thing."

As a next step, I'll review all the changes, especially subjective ones and revert those that does not makes code better or are not liked by formatter. I'll notify as soon as update would be ready.

RobAtticus commented 5 years ago

Great, thank you! With regards to refactorings, some of them are definitely improvements -- spelling fixes, adding code comments where they are sparse before, and some of the name changes (e.g., scaleVar -> scale). However so that the review process does not get too unruly or drawn out, I'd like to focus the PR on Clickhouse stuff.

In general I agree with the gist of what you're saying and definitely want to make the code approachable for others to contribute so we can definitely look at those changes in isolation and see which ones make sense. Just want to keep the PRs more narrowly focused.

sunsingerus commented 5 years ago

PR updated, some files are completely excluded, some other are partially reverted back.

sunsingerus commented 5 years ago

Looking into code modifications and it seems to me, that there are some more smaller PRs, that can be extracted from this mega-PR (as we did with random functions). This may be the way to go.

RobAtticus commented 5 years ago

@sunsingerus You might be right. I think reviewing some things piece meal will go more smoothly.

sunsingerus commented 5 years ago

PR updated, one more smaller-PR extracted as a separate PR

sunsingerus commented 5 years ago

Continue to extract separated items into smaller PRs. One more with code cleaning just added

sunsingerus commented 5 years ago

One more smaller PR extract from this mega-PR. Code refactoring and naming clarification PR

sunsingerus commented 5 years ago

Continue to extract small PRs from this mega-PR. shell script PR code refactoring

sunsingerus commented 5 years ago

Can you elaborate on whether that is the case, or if you were mostly trying to re-use TimescaleDB scaffolding here? ... what the benefits or drawbacks of using those are? ... ... Since it seems to support a variety of connection methods (HTTP, JDBC, etc), did you see if there were performance differences?

ClickHouse exposes two API endpoints:

  1. native binary API endpoint
  2. HTTP endpoint

Native binary API is less (should actually be read as poorly) documented and thus very few driver implementations are based on it. Go adapter used is one of those native-API implementations. Native protocol is considered to be more effective and faster than HTTP. HTTP - on the other hand - is simpler, well documented and generally more welcomed by application developers. JDBC driver implementation, for example, is based over HTTP protocol under the hood. So, general idea is that native protocol is expected to be faster, however, there are no any numbers available (at my knowledge). This may be one of the possible future test-cases - compare speed of native vs HTTP protocols on the same tasks.

So my thoughts were like:

  1. Seems to be faster
  2. Close to existing code - easier to adopt.

HTTP protocol can be touched sometime later

sunsingerus commented 5 years ago

Thanks for the explanations of why you chose this.

My pleasure!

I see this code was used in a blog post (I believe, unless some one else did it as well?),

CTO of Altinity Alexander Zaytsev wrote blogpost with detailed comparison of databases available in the test

so presumably the performance was satisfactory.

Each of databases has an area where it shines! Timescale and ClickHouse are in general faster than Influx, however Influx simply surprised with disk usage - it seems not to need any disk space at all!

sunsingerus commented 5 years ago

Thank you for your efforts - really appreciate and enjoying the process! Two small PRs more are ready - Influx shell script and docs update. And that would be all for now, have no more ready updates at the moment

sunsingerus commented 5 years ago

As discussed here doc modifications are merged into this PR. PR updated.

RobAtticus commented 5 years ago

Thanks again, merged!