housepower / ClickHouse-Native-JDBC

ClickHouse Native Protocol JDBC implementation
https://housepower.github.io/ClickHouse-Native-JDBC/
Apache License 2.0
527 stars 145 forks source link

Implement basic logic of re-using buffers #382

Closed akamensky closed 2 years ago

akamensky commented 2 years ago

Closes #353

codecov[bot] commented 2 years ago

Codecov Report

Merging #382 (6fe1401) into master (2b8d60f) will increase coverage by 0.52%. The diff coverage is 92.59%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #382      +/-   ##
============================================
+ Coverage     61.84%   62.37%   +0.52%     
- Complexity     1191     1248      +57     
============================================
  Files           131      135       +4     
  Lines          6401     6631     +230     
  Branches        475      516      +41     
============================================
+ Hits           3959     4136     +177     
- Misses         2175     2219      +44     
- Partials        267      276       +9     
Impacted Files Coverage Δ
...c/statement/ClickHousePreparedInsertStatement.java 52.90% <ø> (+3.55%) :arrow_up:
.../github/housepower/settings/ClickHouseDefines.java 0.00% <ø> (-75.00%) :arrow_down:
...hub/housepower/data/ColumnWriterBufferFactory.java 70.58% <70.58%> (ø)
...rc/main/java/com/github/housepower/data/Block.java 91.17% <91.66%> (-0.06%) :arrow_down:
.../com/github/housepower/buffer/ByteArrayWriter.java 100.00% <100.00%> (+29.62%) :arrow_up:
...com/github/housepower/data/ColumnWriterBuffer.java 100.00% <100.00%> (ø)
.../com/github/housepower/serde/BinarySerializer.java 94.73% <100.00%> (+0.53%) :arrow_up:
...m/github/housepower/settings/ClickHouseConfig.java 77.12% <0.00%> (-0.19%) :arrow_down:
...om/github/housepower/data/type/DataTypeDate32.java 45.83% <0.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2b8d60f...6fe1401. Read the comment docs.

AndreyNudko commented 2 years ago

First of all, thanks a lot for addressing this over-allocation problem. I was contemplating a simpler change, but this one is a lot better! I have couple of questions if that's ok:

  1. Is it necessary for SOCKET_SEND_BUFFER_BYTES, SOCKET_RECV_BUFFER_BYTES, and COLUMN_BUFFER_BYTES to be final? We're currently setting lower values to minimize allocation I appreciate this PR addresses the problem in a better way, but they seem like useful tunables (unless ClickHouseDefines is considered internal API)

  2. If there was a spike in writes and ByteArrayWriter allocated extra buffers - they will remain in freeList forever and won't be collected, right? It may be desirable in case of long-term change in traffic pattern, or wasting memory in case of a short burst Not saying this is a problem (almost certainly not for our use-cases), just trying to understand implications

  3. ColumnWriterBuffer::writeTo Is there any reason to use temporary buffer instead of doing something like serializer.writeBytes(buffer.array(), buffer.arrayOffset(), buffer.remaining()); I understand this is coupled with ByteArrayWriter implementation allocating heap buffers - but both classes already seem to have quite close relationship; and the clue is in the name

akamensky commented 2 years ago

@AndreyNudko Please see comment in linked ticket. I am not the right person to ask those questions and not sure the right person will have more time to work on this to address those. I am but a messenger.

AndreyNudko commented 2 years ago

@akamensky Don't get me wrong - those questions are more for maintainers to consider, I'm certainly not asking for more from you guys. And I really appreciate the effort you made.

sundy-li commented 2 years ago

@akamensky Thanks for the pr, I'll take time to review this.

@AndreyNudko

Is it necessary for SOCKET_SEND_BUFFER_BYTES, SOCKET_RECV_BUFFER_BYTES, and COLUMN_BUFFER_BYTES to be final?

I do think letting it be the way before is better, users can modify set these to other values.

If there was a spike in writes and ByteArrayWriter allocated extra buffers - they will remain in freeList forever and won't be collected, right?

Yes, that's the side effect, manually re-using the buffer or let them be controlled by JVM. But I think it may be worthy to do if we can set fixed blocksize (batchsize) for each insert( better let the value lesser than 1048576).

clickhouse-go seems to have a buffer block in each connection, see: https://github.com/ClickHouse/clickhouse-go/blob/master/clickhouse.go#L46-L61

akamensky commented 2 years ago

@sundy-li as I mentioned, please check the comment in linked ticket. I did not write those changes, it comes from a colleague who is a Java expert. They do not have more time to invest into this changeset. They also mentioned that the implementation is very crude, so you may be right on those comments you left. I am not the right person to follow up on those comments, but feel free to modify these changes as you see appropriate.

Meantime we are using those changes internally and it reduced CPU load dramatically.