alexandrujieanu closed this issue 1 year ago.
Are you using any proxies / load balancers in front of ClickHouse?
Also, please show the output of:

```sql
SELECT * FROM system.replicas FORMAT Vertical;
SHOW TABLES;
```
Also, it looks like you have `replicated: false`, and `cluster: uptrace1` is missing completely. Is that a mistake?
Uptrace connects to ClickHouse via the Kubernetes service `addr: clickhouse-headless.clickhouse.svc:9000`. A request from `pod/uptrace-0` could go to any of `clickhouse-shard0-{0,1,2}`, but at the ClickHouse level the data should be replicated, so I don't see a problem here. Do you?
I am only able to start Uptrace with `replicated: false` and without `cluster:`. Are they mandatory for such a setup?
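For context, these two settings live in Uptrace's `uptrace.yml`, alongside the connection settings. A minimal sketch of the relevant section — the exact key names and nesting vary between Uptrace releases, so treat this as an assumption and check it against the `uptrace.yml` shipped with your version:

```yaml
# Sketch only; verify key names against your Uptrace release.
ch:
  addr: clickhouse-headless.clickhouse.svc:9000
  user: default
  password: ''
  database: uptrace

ch_schema:
  # Use Replicated* table engines instead of plain MergeTree.
  replicated: true
  # Create tables ON CLUSTER with this name; it must match an entry
  # in ClickHouse's system.clusters (here: "default").
  cluster: default
```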
```text
SELECT *
FROM system.clusters
LIMIT 3
FORMAT Vertical

Query id: 5e099aa3-2730-408b-9137-511d98ab503f

Row 1:
──────
cluster:                 default
shard_num:               1
shard_weight:            1
replica_num:             1
host_name:               clickhouse-shard0-0.clickhouse-headless.clickhouse.svc.cluster.local
host_address:            172.23.4.71
port:                    9000
is_local:                1
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0

Row 2:
──────
cluster:                 default
shard_num:               1
shard_weight:            1
replica_num:             2
host_name:               clickhouse-shard0-1.clickhouse-headless.clickhouse.svc.cluster.local
host_address:            172.23.0.167
port:                    9000
is_local:                0
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0

Row 3:
──────
cluster:                 default
shard_num:               1
shard_weight:            1
replica_num:             3
host_name:               clickhouse-shard0-2.clickhouse-headless.clickhouse.svc.cluster.local
host_address:            172.23.2.133
port:                    9000
is_local:                0
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0

3 rows in set. Elapsed: 0.002 sec.

SELECT *
FROM system.replicas
FORMAT Vertical

Query id: 8f3d17a6-a7f1-46cf-9d83-158f15b8207c

Ok.

0 rows in set. Elapsed: 0.004 sec.

SHOW TABLES

Query id: 9083038f-c60d-4523-b2d3-f9bcea6e3fc2

Ok.

0 rows in set. Elapsed: 0.002 sec.
```
> but at the ClickHouse level the data should be replicated, so I don't see a problem here. Do you?
It will be replicated if you use `replicated: true`. Then you will see the replicated tables in the `system.replicas` view.
> I am only able to start Uptrace with `replicated: false` and without `cluster:`. Are they mandatory for such a setup?
Yes.
Okay, then I'm focusing on getting Uptrace working with those two flags.
I believe you can close this issue or #12.
Hello,

`replicated: true` and `cluster: default` seem to have fixed the issue, combined with creating the ClickHouse database before Uptrace starts. I am using the Bitnami ClickHouse chart, which doesn't accept `CLICKHOUSE_DB`, so I added an initialization script instead.
I have a few remarks I want to share:
```text
error tracing/span_processor.go:231 ch.Insert failed {"error": "DB::Exception: Table uptrace.spans_index_buffer_dist doesn't exist", "table": "spans_index"}
```
After a lot of trial and error I ended up with:

```sql
GRANT ALL PRIVILEGES ON ${uptrace_database}.* TO '${uptrace_username}';
GRANT CLUSTER, REMOTE, SOURCES ON *.* TO '${uptrace_username}';
```
Thanks for the notes! I definitely will spend some time reflecting on them and making changes.
For some reason Uptrace itself creates the database when these two settings are not used, but doesn't when they are.
Will fix.
> then the pod crashes
Do you remember why? Is it the Uptrace pod or the ClickHouse pod?
It would be nice to have them tested and documented.
:+1:
When I run helm, the pods come up, but as the migrations fail, Uptrace exits with an error, the container crashes, and Kubernetes restarts the pods. The pods come back up, but this time the migrations are not run anymore; Uptrace runs, but with errors like "Insert failed".
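One common way to avoid this first-boot crash loop is to gate the Uptrace pod on ClickHouse being reachable, e.g. with an init container. A sketch for the Uptrace pod spec — the image and the service name are assumptions (the service name is taken from this thread):

```yaml
# Hypothetical initContainer: block Uptrace startup until the
# ClickHouse service accepts TCP connections on port 9000.
initContainers:
  - name: wait-for-clickhouse
    image: busybox:1.36
    command:
      - sh
      - -c
      - until nc -z clickhouse-headless.clickhouse.svc 9000; do echo waiting; sleep 2; done
```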
@vmihailenco I suspect Uptrace is handling the replication of its database within ClickHouse. Is this correct?
Context:
I have created a dedicated database and database user, and the `default` ClickHouse admin user is complaining that it can't access the tables created by the `uptrace` user. I can see this in the logs:
```text
2023.05.19 07:14:18.572003 [ 507 ] {} uptrace.measure_minutes_buffer_dist.DirectoryMonitor.default: Code: 516. DB::Exception: Received from clickhouse-shard0-1.clickhouse-headless.clickhouse.svc.cluster.local:9000. DB::Exception: default: Authentication failed: password is incorrect, or there is no user with such name.
```
I have read that distributed tables are replicated by the `default` ClickHouse admin user, and in my case it didn't have access to do so. I have granted the access, and now I get the impression that both the `default` and `uptrace` users are trying to do the replication:
```text
2023.05.19 09:36:07.274130 [ 354 ] {} uptrace.spans_index (1511c250-3707-45b8-8bd2-38e437f731ca) (Replicated OutputStream): Block with ID 20230519_3745753001925908684_4954532985886563788 already exists on other replicas as part 20230519_185_185_0; will write it locally with that name.
2023.05.19 09:36:07.274408 [ 354 ] {} uptrace.spans_index (1511c250-3707-45b8-8bd2-38e437f731ca) (Replicated OutputStream): Part 20230519_185_185_0 is duplicate and it is already written by concurrent request or fetched; ignoring it.
2023.05.19 09:36:14.668583 [ 354 ] {} uptrace.spans_data (bebddfa6-e6d8-4dd3-bb36-70a3fc1a9a9d) (Replicated OutputStream): Block with ID 20230519_6567306983901795181_13630949410134203585 already exists locally as part 20230519_184_184_0; ignoring it.
2023.05.19 09:36:14.799234 [ 350 ] {} uptrace.spans_index (1511c250-3707-45b8-8bd2-38e437f731ca) (Replicated OutputStream): Block with ID 20230519_2645648279967518409_18311755029048189739 already exists on other replicas as part 20230519_186_186_0; will write it locally with that name.
2023.05.19 09:36:14.807786 [ 357 ] {} uptrace.measure_minutes (aa12e815-849d-40bb-b7b1-6fb25f754657) (Replicated OutputStream): Block with ID 20230519_2487656886605356058_6987131945011303779 already exists on other replicas as part 20230519_951_951_0; will write it locally with that name.
2023.05.19 09:36:14.849173 [ 552 ] {} uptrace.spans_index (1511c250-3707-45b8-8bd2-38e437f731ca): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 235. DB::Exception: Part 20230519_186_186_0 (state Active) already exists. (DUPLICATE_DATA_PART), Stack trace (when copying this message, always include the lines below)
```
In this case, I am considering whether I should let Uptrace use the `default` user (which initially I didn't want to do), or what other options I have.
Thanks.
No ideas here. So far I've only used ClickHouse users to limit query complexity, not to restrict access...
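For what it's worth, the credentials ClickHouse uses when a Distributed table forwards inserts to other nodes are the ones configured per replica in `remote_servers`; when they are absent it falls back to the `default` user. Pointing them at the dedicated user is one way to keep `default` out of the picture. A sketch of a server config override — the host names come from this thread, the user and password are placeholders:

```xml
<!-- Fragment of a ClickHouse server config override (e.g. conf.d/cluster.xml). -->
<!-- Each replica entry may carry the user/password used for distributed        -->
<!-- sends; without them ClickHouse authenticates as the default user.          -->
<clickhouse>
  <remote_servers>
    <default>
      <shard>
        <internal_replication>true</internal_replication>
        <replica>
          <host>clickhouse-shard0-0.clickhouse-headless.clickhouse.svc.cluster.local</host>
          <port>9000</port>
          <user>uptrace</user>
          <password>CHANGE_ME</password>
        </replica>
        <!-- repeat for clickhouse-shard0-1 and clickhouse-shard0-2 -->
      </shard>
    </default>
  </remote_servers>
</clickhouse>
```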
FYI, ClickHouse developers say it's client misbehaviour:
Error Code 235 DB::Exception DUPLICATE_DATA_PART
I am not sure. I haven't seen this while using the `default` user.
I am not sure either. There are no indications that it was the go-clickhouse client, so I am inclined to agree with the "There is nothing to fix" sentiment :)
@alexandrujieanu, I'm having the same problem as you. I talked a little more about the problem on Telegram (https://t.me/uptrace/1419). Did you get a solution?
I've pushed v1.5.5 that presumably fixes this.
Hello,
Do you have any idea why this is happening?
Screencast from 2023-05-10 14-04-58.webm
I'm accessing the service and landing on different pods.
Uptrace `values.yaml`:

ClickHouse:
Thank you.