nikepan / clickhouse-bulk

Collects many small inserts to ClickHouse and sends them as big inserts
Apache License 2.0

Bulk inserting is not working #15

Open th0th opened 4 years ago

th0th commented 4 years ago

Hi, first of all, thank you for making clickhouse-bulk 💐

I am running it with this config:

{
  "listen": ":8125",
  "flush_count": 10000,
  "flush_interval": 3000,
  "debug": true,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 300,
    "servers": [
      "http://127.0.0.1:8123"
    ]
  }
}

Shouldn't this config collect the incoming requests and insert them in bulk every 3 seconds? I am watching the logs and seeing every insert I send via HTTP (I am doing the inserts with Python's requests) being processed immediately as I send it. What am I doing wrong?
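(For reference, a minimal client against this config would look roughly like the sketch below; the URL and table name are placeholders, and the query has to fit clickhouse-bulk's expectations discussed later in this thread.)

import requests

# Send the insert to clickhouse-bulk's listen address (":8125" in the
# config above), not directly to ClickHouse; clickhouse-bulk buffers it
# and forwards to the servers listed under "clickhouse".
requests.post(
    url='http://127.0.0.1:8125',
    params={
        'query': "INSERT INTO my_table (a, b) VALUES (1, 'x')",
    },
)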

nikepan commented 4 years ago

That's interesting. I will write a test for it. Thanks.

th0th commented 4 years ago

Is there anything I can provide to help you reproduce it? I tried both the Docker image and the binary on GNU/Linux (Alpine). I also tried configuring the Docker image via environment variables. The result didn't change.

nikepan commented 4 years ago

Did you send INSERT or SELECT queries? If INSERT, to the same table or different ones?

th0th commented 4 years ago

Hey @nikepan, my code is like this:

import requests

# Hypothetical placeholder values so the snippet runs standalone:
field1 = 'value1'
field2 = 'value2'

query = """
    INSERT INTO table
    (field1, field 2)
    VALUES
    ('{field1}', '{field2}');
    """.format(
        field1=field1,
        field2=field2,
    )

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': query,
    },
)

I have many workers on different servers/Docker containers doing the same thing; my expectation was that clickhouse-bulk would collect multiple incoming inserts in memory and then insert them into the table in bulk.

nikepan commented 4 years ago

Yes, I caught the bug. I don't know how it works. I will publish a new version soon, with everything working fine and automatic resending of dumps. Thank you!

th0th commented 4 years ago

Thanks, looking forward to the new release! Do you have a date in mind?

nikepan commented 4 years ago

It's because you send a multi-line query. For now clickhouse-bulk supports only single-line queries. I will try to fix this in the next release.

th0th commented 4 years ago

It's because you send a multi-line query. For now clickhouse-bulk supports only single-line queries. I will try to fix this in the next release.

Oh, interesting. Would it work if I just replace the \n characters with spaces, then?

nikepan commented 4 years ago

Yes, I checked it:

query = "INSERT INTO table (field1, field 2) VALUES ('{field1}', '{field2}');".format(
    field1='f1',
    field2='f2',
)

⇨ http server started on [::]:8124
2019/10/15 21:52:40 send 3 rows to http://127.0.0.1:8123 of query=INSERT+INTO+table+%28field1%2C+field+2%29+VALUES
2019/10/15 21:52:42 Send ERROR 502: No working clickhouse server

th0th commented 4 years ago

Awesome! I can go with that workaround until you release the new version then, thanks a lot @nikepan!
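(A minimal sketch of that workaround: collapse the multi-line query to a single line before sending. The URL is a placeholder; the table and field names are from the snippet above.)

import requests

query = """
    INSERT INTO table
    (field1, field 2)
    VALUES
    ('v1', 'v2')
    """

# Collapse newlines and repeated whitespace into single spaces so
# clickhouse-bulk receives a single-line query.
single_line_query = " ".join(query.split())

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': single_line_query,
    },
)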

th0th commented 4 years ago

I think I hit another possible bug. I converted my multi-line inserts to single-line ones.

And even though I see this log:

2019/10/15 21:22:08 send 11 rows to http://clickhouse:8123 of query=...

I still see only a single row being inserted into the actual table.

nikepan commented 4 years ago

I need an example of your request. Rows are counted by the \n character.
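(A rough illustration, not clickhouse-bulk's actual code: if identical single-line inserts are merged and the merged VALUES section carries one newline-separated tuple per row, counting \n characters yields the row count reported in the "send N rows" log line.)

# Hypothetical merge of three single-line inserts sharing one prefix.
queries = [
    "INSERT INTO data (a, b) VALUES (1, 'x')",
    "INSERT INTO data (a, b) VALUES (2, 'y')",
    "INSERT INTO data (a, b) VALUES (3, 'z')",
]
prefix = "INSERT INTO data (a, b) VALUES"
values = [q[len(prefix):].strip() for q in queries]
# One newline per appended value tuple, so "\n" count equals row count.
bulk = prefix + "\n" + "\n".join(values)
print(bulk.count("\n"))  # 3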

th0th commented 4 years ago

Here is the code:

import requests

query = "INSERT INTO data (id, date_time, item_id, is_successful, detail, time1, time2, time3, time4, time5, time6, time7) VALUES ('bf58a4af-0e56-4e07-91d9-bbf70917e0da', '2019-10-15 21:36:43', 10, 1, 'OK', 8, 49, 99, 99, 222, 0, 377);"

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': query,
    },
)

th0th commented 4 years ago

A testable example:

import requests

query = "INSERT INTO data (id, date_time, item_id, is_successful, detail, time1, time2, time3, time4, time5, time6, time7) VALUES ('bf58a4af-0e56-4e07-91d9-bbf70917e0da', '2019-10-15 21:36:43', 10, 1, 'OK', 8, 49, 99, 99, 222, 0, 377);"

for i in range(10):
  requests.post(
      url='http://clickhouse-bulk-url',
      params={
          'query': query,
      },
  )

Here is the log:

2019/10/15 21:45:28 send 10 rows to http://clickhouse:8123 of query=...

And I see 1 row inserted into the database.

nikepan commented 4 years ago

You need to remove the ";" at the end of the line. It's a bug too :) Thanks for the report.

nikepan commented 4 years ago

For speed, clickhouse-bulk parses strings simply and does not support many variations :)

th0th commented 4 years ago

No problem :) I will try without the semicolon and check again. Thanks.
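(A hedged helper that applies both constraints from this thread, single-line queries only and no trailing semicolon; the function name is made up for illustration.)

def to_bulk_friendly(query: str) -> str:
    # Collapse newlines and repeated whitespace into single spaces.
    single_line = " ".join(query.split())
    # Drop a trailing ";", which breaks clickhouse-bulk's row merging.
    return single_line.rstrip(";").rstrip()

query = to_bulk_friendly("""
    INSERT INTO data (id, detail)
    VALUES ('bf58a4af', 'OK');
""")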

a1tus commented 4 years ago

@nikepan, hello. It seems like we've faced this issue too. Can you clarify whether it's fixed in v1.2.*? It was released after your comments here, but there's no info about it in the changelog.

nikepan commented 4 years ago

Can you clarify whether it's fixed in v1.2.*? It was released after your comments here, but there's no info about it in the changelog.

No, it is not fixed. I found some troubles with these changes. I will try to fix it later, but not now.

a1tus commented 4 years ago

Ok, thanks, at least now we know the status exactly :)

rsurgiewicz commented 4 years ago

Hello! Are those issues fixed in the latest Docker image? I'm trying to use the tool, but despite

CLICKHOUSE_FLUSH_INTERVAL=10000

I'm getting "send 1 row ..." 99% of the time, and from time to time "send 2 row ...". My inserts come from an external system and use JSONEachRow into 3 tables. Is this case supposed to work, or shall I run a separate instance of clickhouse-bulk per table?

Thanks!

rsurgiewicz commented 4 years ago

Ok, nailed it! There was an extra space before the FORMAT keyword in my requests. After removing it, batch inserts work!
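(For anyone hitting the same thing: a hedged sketch of a JSONEachRow insert with normalized whitespace, exactly one space before FORMAT. The URL, table, and fields are placeholders, and sending the rows in the request body follows the ClickHouse HTTP style.)

import requests

# Exactly one space before FORMAT; an extra space kept clickhouse-bulk
# from recognizing the insert prefix and batching the requests.
query = "INSERT INTO events (id, name) FORMAT JSONEachRow"

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': query,
    },
    data='{"id": 1, "name": "test"}',
)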