neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

replication with clickhouse as a subscriber causes a crash of the compute node #8349

Closed: save-buffer closed this issue 2 months ago

save-buffer commented 2 months ago

Steps to reproduce

Follow this instruction, but do not create a role or a database in Postgres and do not modify pg_hba.conf etc.; use the Neon default user and the default database instead. The crash happens after the database connection is created in ClickHouse.

Expected result

No crash

Actual result

Crash

Environment

Staging (ep-holy-fog-w2aoqekg)

Logs, links

Discussion: https://neondb.slack.com/archives/C0756AKBMU0/p1720032475503459

a-masterov commented 2 months ago

To reproduce, follow this instruction, but do not create a role or a database in Postgres and do not modify pg_hba.conf etc.; use the Neon default user and the default database instead. The crash happens after the database connection is created in ClickHouse.

knizhnik commented 2 months ago

Were you able to get a core dump?

a-masterov commented 2 months ago

> Were you able to get a core dump?

Unfortunately, https://github.com/neondatabase/neon/pull/8272 is not yet merged; it still needs approval from the code owners.

a-masterov commented 2 months ago

The backtrace is attached to the Slack discussion.

kelvich commented 2 months ago

#1  0x00007f5505fbeecf in NeonWALReadRemote (state=0x5652093cd200, buf=0x5652093cf200 "s", buf@entry=0x5652093cd200 "\023\321\002", startptr=140737453845920, startptr@entry=16777216, count=140737453845916, count@entry=8192, tli=tli@entry=1) at neon_walreader.c:373

So this is in NeonWALReadRemote, at https://github.com/neondatabase/neon/blob/main/pgxn/neon/neon_walreader.c#L373
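
As a reading aid for the frame above, here is a minimal Python sketch that decodes its numeric arguments. It is not part of the fix and rests on assumptions: that `startptr` is a WAL position printable in PostgreSQL's usual `high/low` hex LSN notation, that the WAL block size is the default 8192 bytes, and that values just below 2^47 are stack addresses on Linux x86-64. The helper names `format_lsn` and `looks_like_stack_address` are hypothetical.

```python
# Values copied verbatim from frame #1 of the backtrace.
STARTPTR_ENTRY = 16777216
COUNT_ENTRY = 8192
STARTPTR_NOW = 140737453845920
COUNT_NOW = 140737453845916

# Assumption: default PostgreSQL WAL block size (XLOG_BLCKSZ).
WAL_BLOCK_SIZE = 8192


def format_lsn(value: int) -> str:
    """Render a 64-bit WAL position as 'high/low' in uppercase hex."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def looks_like_stack_address(value: int) -> bool:
    """Heuristic (assumption): value lies near the top of the 47-bit
    user address space, where Linux x86-64 usually places the stack."""
    return 0x7FF000000000 <= value < 0x800000000000


if __name__ == "__main__":
    print("startptr@entry as LSN:", format_lsn(STARTPTR_ENTRY))
    print("count@entry in WAL blocks:", COUNT_ENTRY / WAL_BLOCK_SIZE)
    for name, value in (("startptr", STARTPTR_NOW), ("count", COUNT_NOW)):
        print(name, hex(value), "stack-like:", looks_like_stack_address(value))
```

Under these assumptions, the `@entry` values describe a read of one WAL block starting at LSN 0/1000000, while the current `startptr` and `count` values look like addresses rather than a position and a length. That may simply reflect how the debugger reports optimized-out arguments, so it should not be read as a diagnosis.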

save-buffer commented 2 months ago

With https://github.com/neondatabase/neon/pull/8360 merged, we should try again and confirm that it no longer crashes with ClickHouse.

a-masterov commented 2 months ago

I confirm that starting a logical replication using ClickHouse doesn't cause crashes anymore.