kaaveland commented 6 months ago

The primary issues with this module right now are:

- We're "always just adding things" to the TxLockTracer struct.
- Most of the logic for diffing the database lives in TxLockTracer::trace_sql_statement.
- There's a lot of repetitive code there.
- Because of the size of the struct and the content it produces, writing tests is becoming annoying.

The nice thing is:

- It achieves the "stupid" part of KISS.
- As far as I can tell, it has been stupid enough that adding new things to it has been simple and largely error-free.

It may be that just extracting methods from the lock tracer is a good place to start, and then we can look for a common abstraction later.

kaaveland commented 6 months ago

After #77 it's glaringly obvious that we should find a way to keep the collected information between transactions/scripts instead of throwing it all away. Each trace takes progressively longer as the database schema grows, since we rebuild all of the information every time. So that's another requirement.

kaaveland commented 6 months ago

Some thinking about what we want to do, for any T in Relation, Constraint, Column, Lock:

There exists some initial state in the database, and it is a set of T. The database has an ID for each T: oid for Relation, conid for Constraint, (oid, attid) for Column, and (oid, mode) for Lock.
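As a minimal sketch, "a set of T, each with a database ID" could be modelled in Rust as a trait with an associated key type. Everything here is hypothetical: the `Tracked` trait, the struct fields, and the `by_id` helper are illustrative assumptions, not eugene's actual types. Only the choice of keys follows the IDs listed above.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical trait: every kind of T the tracer follows has a stable catalog ID.
pub trait Tracked: Clone + PartialEq {
    type Id: Eq + Hash + Clone;
    fn id(&self) -> Self::Id;
}

/// Hypothetical relation row, keyed by its oid.
#[derive(Clone, Debug, PartialEq)]
pub struct Relation {
    pub oid: u32,
    pub name: String,
    pub kind: char,
}

impl Tracked for Relation {
    type Id = u32;
    fn id(&self) -> u32 {
        self.oid
    }
}

/// Hypothetical constraint row, keyed by its conid.
#[derive(Clone, Debug, PartialEq)]
pub struct Constraint {
    pub conid: u32,
    pub table_oid: u32,
    pub name: String,
    pub definition: String,
}

impl Tracked for Constraint {
    type Id = u32;
    fn id(&self) -> u32 {
        self.conid
    }
}

/// Hypothetical column row, keyed by (table oid, attid).
#[derive(Clone, Debug, PartialEq)]
pub struct Column {
    pub table_oid: u32,
    pub attid: i16,
    pub name: String,
    pub type_name: String,
    pub nullable: bool,
}

impl Tracked for Column {
    type Id = (u32, i16);
    fn id(&self) -> (u32, i16) {
        (self.table_oid, self.attid)
    }
}

/// Hypothetical lock, keyed by (relation oid, lock mode).
#[derive(Clone, Debug, PartialEq)]
pub struct Lock {
    pub oid: u32,
    pub mode: String,
}

impl Tracked for Lock {
    type Id = (u32, String);
    fn id(&self) -> (u32, String) {
        (self.oid, self.mode.clone())
    }
}

/// Index rows fetched from the catalog by their IDs (hypothetical helper).
pub fn by_id<T: Tracked>(rows: Vec<T>) -> HashMap<T::Id, T> {
    rows.into_iter().map(|t| (t.id(), t)).collect()
}
```

Keying each kind by its catalog ID rather than by name means a later diff can match rows across snapshots even when something is renamed: a renamed table keeps its oid, so it shows up as a modified relation rather than a removal plus an addition.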

For each statement we trace, we want to find any new T. We want to find any T that has changed; these are Modified<T> { old: T, new: T }. We want to find any T that used to be there but isn't anymore. Then we want to update our view of the set of T. For each transaction we start, we want to get an initial sequence of T. Checks would then ask questions like: Has any T changed? Is there any T such that $question? Was the T visible to other transactions at $time?
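The diff-then-update step described above could look roughly like this in Rust. This is a sketch under assumptions: `Delta`, `diff` and `Snapshot` are hypothetical names, and it assumes each kind of T has already been fetched into a map keyed by its catalog ID. `Modified<T>` mirrors the shape proposed in the comment.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// A T that changed between two snapshots, as proposed in the comment.
#[derive(Debug, Clone, PartialEq)]
pub struct Modified<T> {
    pub old: T,
    pub new: T,
}

/// Hypothetical summary of what happened to one kind of T during one statement.
#[derive(Debug, Clone, PartialEq)]
pub struct Delta<T> {
    pub added: Vec<T>,
    pub modified: Vec<Modified<T>>,
    pub removed: Vec<T>,
}

/// Compare the previously known set of T with a freshly fetched one.
/// Both maps are keyed by the catalog ID of T.
pub fn diff<K, T>(before: &HashMap<K, T>, after: &HashMap<K, T>) -> Delta<T>
where
    K: Eq + Hash,
    T: Clone + PartialEq,
{
    let mut delta = Delta { added: Vec::new(), modified: Vec::new(), removed: Vec::new() };
    for (key, new) in after {
        match before.get(key) {
            // Not seen before: the statement created it.
            None => delta.added.push(new.clone()),
            // Same ID, different contents: the statement altered it.
            Some(old) if old != new => delta.modified.push(Modified { old: old.clone(), new: new.clone() }),
            Some(_) => {}
        }
    }
    for (key, old) in before {
        // Known before but missing now: the statement dropped it.
        if !after.contains_key(key) {
            delta.removed.push(old.clone());
        }
    }
    delta
}

/// Hypothetical tracer state for one kind of T that outlives a single statement.
pub struct Snapshot<K, T> {
    known: HashMap<K, T>,
}

impl<K: Eq + Hash, T: Clone + PartialEq> Snapshot<K, T> {
    pub fn new(initial: HashMap<K, T>) -> Self {
        Snapshot { known: initial }
    }

    /// Diff a fresh fetch against the current view, then adopt it as the new view.
    pub fn observe(&mut self, fresh: HashMap<K, T>) -> Delta<T> {
        let delta = diff(&self.known, &fresh);
        self.known = fresh;
        delta
    }
}
```

Because `Snapshot::observe` keeps the latest fetch as its view, the same value could seed the next transaction instead of being rebuilt from scratch. Comparing whole rows with `PartialEq` is the simplest choice; a real check might compare only selected fields.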

This feels much better than what we have now, and it also makes it obvious how to carry the trace state over from one transaction to the next.

To do the diff, we need to retrieve some set of T from the database. I think we'll need to break this down into several queries to avoid sending "all of the state" over the wire. Or maybe back the set of T with a temporary table; I think that's probably the fastest way? After all, to discover altered columns, you either have to send all the columns over the wire or maintain the previous state in the database server. 🤔

kaaveland commented 6 months ago

Did some profiling: the tracer spends almost all of its time waiting for postgres to return values, and postgres is running at 100% of one CPU core. So it looks like it might be smart to offload more of the work onto eugene, if we can.

kaaveland commented 6 months ago

#83 shows that we need a way to check, in a rule, whether one or more columns are covered by a total (non-partial) index, which suggests that we need to build a complete model of the database structure/schema in memory. 🤔
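A rule like the one described above could run against an in-memory model of the schema. The sketch below is hypothetical: the `Index` struct and `covered_by_total_index` are illustrative names, and it assumes "covered" means that some non-partial index on the table has exactly the given columns, in any order, as its leading key columns. A different rule might need a stricter or looser definition.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory model of an index, as a rule might see it.
#[derive(Debug, Clone)]
pub struct Index {
    pub table: String,
    pub key_columns: Vec<String>,
    /// `Some(..)` for a partial index (`CREATE INDEX ... WHERE ...`), `None` for a total one.
    pub predicate: Option<String>,
}

/// Assumed meaning of "covered": some total index on `table` has exactly
/// `columns`, in any order, as its leading key columns.
pub fn covered_by_total_index(indexes: &[Index], table: &str, columns: &[&str]) -> bool {
    let wanted: HashSet<&str> = columns.iter().copied().collect();
    indexes.iter().any(|ix| {
        ix.table == table
            && ix.predicate.is_none() // partial indexes do not count
            && ix.key_columns.len() >= wanted.len()
            && ix.key_columns[..wanted.len()]
                .iter()
                .map(String::as_str)
                .collect::<HashSet<&str>>()
                == wanted
    })
}
```

Answering this from memory only works if the model holds every index with its key columns and predicate, which is why this kind of rule pushes toward a complete schema model rather than per-statement queries.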

kaaveland / eugene

Find and implement a good abstraction for the tracer #76