noctarius / timescaledb-event-streamer

timescaledb-event-streamer is a command line program to create a stream of CDC (Change Data Capture) TimescaleDB Hypertable events from a PostgreSQL installation running the TimescaleDB extension.
Apache License 2.0
39 stars 3 forks source link

Fix: Exclude dropped chunks from the list of all chunks in the catalog #207

Open vidosits opened 6 months ago

vidosits commented 6 months ago

Fixes #206

Unit tests have to be written, but internal/systemcatalog/systemcatalog.go is missing its tests altogether and I'm kinda lost where to start so as to not to just test basic stuff, but @noctarius the gentleman he is has offered to add unit tests

vidosits commented 6 months ago

hey, @noctarius could you check if this is the kind of test you were thinking of?

noctarius commented 6 months ago

Looks like a real good initial test. Have you made sure the test fails without your fix? I'm not sure that this case fixes your specific issue. Happy to test it though :-)

noctarius commented 6 months ago

It looks like it's working without the additional check. I wonder when dropped is set to true, over the chunk being dropped immediately 🤔

Let me investigate the Timescale sourcecode. What version do you use? Maybe it's a recent change?!

vidosits commented 6 months ago

you're right of course, this test is a bait in it's current form, it passes without the committed "fix" either way. I think the reason is a combination of the following:

The test itself does not touch this code path. So even if the test failed then the fix wouldn't fix it. The reason is because we're not collecting chunks using func (sc *systemCatalog) GetAllChunks(), but rather with an SQL query on the timescale internal schema.

During this test we're not actually testing the internals, we're just basically testing if we successfully dropped the chunks. It's a bad test.

The whole reason I needed this fix is because of the following scenario:

Also the test is wrong because of the above: when I'm dropping chunks with SELECT drop_chunks() in the test they all get dropped and I think there's no corresponding row left in the mock db indicating that the chunk was there but with dropped = true column set

To summarize:

vidosits commented 6 months ago

It looks like it's working without the additional check. I wonder when dropped is set to true, over the chunk being dropped immediately 🤔

Let me investigate the Timescale sourcecode. What version do you use? Maybe it's a recent change?!

We're using timescale/timescaledb-ha:pg15-ts2.10 image with the v0.33.1 timescaledb-single helm chart from charts.timescale.com

noctarius commented 6 months ago

Great! I won't have a chance to look at this again until the end of the weekend since we're on the road, but thanks for your effort and help :-)

vidosits commented 6 months ago

No problem, thanks for taking the time and looking at my silly problem

noctarius commented 6 months ago

Not silly at all. It's a valid issue and should be fixed. I guess, I'll probably fall back, forcing the dropped field to true 😅

It may be possible, that it is set first, in it's own transaction, and then the actual drop happens, to prevent the data being "available" for large chunks when actually dropping may take a second or two.