Open ivan4th opened 5 months ago
Regarding Avoiding code duplication in the tests:
I don't see any reason to have a generic test for two different packages/implementations. Either we should make the package itself generic and have a single test that covers the general case (and is not generic) or we have 2 packages with two separate test implementations.
I don't see an issue in having duplicated code in tests - quite the contrary. I think we should avoid DRY-ing up tests, because each test should stand for itself as much as possible (besides a general setup / teardown approach). Making tests dry quickly makes them hard to change & debug.
duplicated
InMemory
/Open
/Schema
functions in thesql/{statesql,localsql}
don't make sense either, but that's not really the case, as these are supposed to be allowed to drift apart, wrt e.g. the set of migrations and connection pool size, whereas the schema tests verify the fundamental schema properties which should always be the same for bothsql/statesql
andsql/localsql
.
This makes no sense to me - the implementation is the same but is allowed to differ in the future, but although tests are also the same they must not diverge?
Regarding Allowing "some" schema drift:
Some databases may contain artifacts of additional tooling used, such as _litestream* table from Litestream SQLite replication.
I fear this being used by members of the community to bypass the exact thing we are trying to avoid: schema drifts. If for some reason (manually editing the DB, using unreleased builds, etc.) the schema drifts users will "fix" this by just using this configuration to ignore the DBs that differ.
And if the default action upon a schema drift is to only log a warning - which in my opinion it should not be, since a drift can can lead to data corruption - then there is no reason to allow users to get rid of that warning by ignoring tables that differ from the expected schema.
Regarding generating a new schema:
The current suggested workflow is to execute a test that will fail if the schema was updated and produce a file that then has to be renamed and committed. This is a bit clunky to me. We do have make generate
which internally calls go generate
. A //go:generate sh -c "<command that builds schema form migration files and puts it where it belongs>"
would better fit the existing workflow on how to deal with generated files.
It would still serve as a check against unintended schema changes since our CI checks if make generate
modifies files or not.
This makes no sense to me - the implementation is the same but is allowed to differ in the future, but although tests are also the same they must not diverge?
Thing is these are not just tests, but rather essentially the schema tool that helps maintaining schema.sql
in sync with migrations. That must happen in the exactly same way for both localsql
and statesql
.
I'd prefer the separate schema tool route to duplication of that code, so indeed I should probably update the PR to use go:generate
approach.
Re failing by default upon schema drift: I'd welcome @pigmej's opinion on that. As I described in the issue, the scenario I'm afraid of is some stray table in quicksync db that breaks every Smapp installation. With schema drift producing just a warning, it's still useful for diagnostics given that we usually request the application log from the users.
Re not allowing schema drift exceptions: there are a lot of advanced options in go-spacemesh that one can utilize as a footgun ;) This particular option I'd rather keep as it can be useful under various circumstances. In different ops related scenarios the need for such exceptions may arise, and another option (just ignoring the warning b/c some kind of schema drift is expected) may lead to missing unexpected schema drifts. We may add a warning to the docs that mention that option, stating explicitly that one needs to know what (s)he's doing when changing its value.
if the litestream is the issue, then we can get rid of it. We can disable it, and / or clear the schema.
@pigmej litestream is easy to ignore. The problem is a bit more general: is it ok for go-spacemesh
to stop immediately when schema drift is detected, unless this behavior is explicitly disabled in the config, or are our chances of messing this up a bit too high?
Hard to say, if it can be disabled in the config then imo its ok to crash
I would also say return an error when opening a DB and the schema doesn't match (that error should gracefully shut down the node before it even fully started) and allow to disable that behavior in which case it logs with a warning that the schema is different and how it is different, no need to add another config to disable this warning in some cases where a drift is expected / desired in my opinion.
Ok, you convinced me :) So I'll do the following changes to the current implementation:
schema.sql
files, the schema generation from migrations will be done via go:generate
and checked by CI along with other generated filesOne thing that was not fully decided above is where to put the developer docs for schema handling, which I'd prefer to have somewhere (README.md
seems not to be very appropriate place for that). Perhaps should indeed add HACKING.md
for that purpose?
One thing that was not fully decided above is where to put the developer docs for schema handling, which I'd prefer to have somewhere (
README.md
seems not to be very appropriate place for that). Perhaps should indeed addHACKING.md
for that purpose?
Normally I would expect something like this in CONTRIBUTING.md
maybe under Code Guidelines or a similar section. That file however is a bit outdated 🤔
Description
We need a better database schema handling mechanism as there are a few problems with the current one.
No place to look for the schema in the source
Right now, even if the database is a new one, the schema is pieced together by running a number of migrations from
sql/migrations/{local,state}
. The resulting full schema can be observed by runningsqlite3 ... .schema
command on a database, but that's not very convenient, and schema evolution across different go-spacemesh versions can't be easily seen by inspecting history of a schema file.Schema drift
If the database schema is changed in an unexpected way for whatever reason, this may go undetected and cause some hard to debug issues. Suppose a user has run a go-spacemesh version built from
develop
that added a migration, and then switched to a later stable version which also has this migration, but a newer version of it that adds an index.PRAGMA user_version
tells go-spacemesh the database schema is up-to-date, yet actually the index is missing. This leads to a substantial performance degradation, but given that we don't know that the user has used an unreleased version, we might spent a lot of time trying to understand why this particular go-sm instance is THAT slow. Granted, we strongly recommend against using unreleased versions on normal nodes, yet the user might have done that by mistake and/or forgotten about it.Package inconsistency
sql
package used for the state database andsql/localsql
package used for the local database. The database type forlocalsql.Database
builds onsql.Database
while database type for the state database usessql.Database
directly.Inconvenient coded migrations
When a migration has to be implemented as Go code, it cannot be placed in
sql
orsql/localsql
package, and thus cannot be automatically used by{sql,localsql}.{Open,InMemory}
methods which open a database; this is because coded migration usesql
package for accessing the database along with some packages that might importsql
, andOpen
/InMemory
functions using them would cause a cyclic dependency. The places in the code where the databases are opened, such asnode.go
, tests and the merge tool, have to import the migrations and add them via options to theOpen
/InMemory
calls. It would be much better if the code that uses the database could avoid concerns about internal DB schema workings such as particular migrations.Suggested implementation and its problems
The suggested implementation of new database schema handling is here: #6003 Additional description link (README): https://github.com/spacemeshos/go-spacemesh/blob/aada7c61b5e61505404bbd4cfdd7b4deeb7ca45a/README.md#handling-database-schema-changes
The crux of the new approach is this (but still please read the above descriptions before proceeding to the following sections):
sql/statesql
and the local database handled bysql/localsql
package. These files are NOT supposed to be ever edited by handsql/{statesql,localsql}/schema/schema.sql
file and is not supposed to be edited directlyPRAGMA user_version
is old, the required migrations are run.updated
suffix (sql/{statesql,localsql}/schema/schema.sql.updated
) which can be used to update the schema file with a singlemv
commandThere are numerous possible problems with the PR #6003 though which are discussed below. I try to provide rationales for the existing decisions and also describe what's still possibly missing.
Golden test approach
One possible approach to maintaining the
schema.sql
files would be a dedicated tool that could be run as a part of CI pipeline, along with the linter, checking source formatting and generated Go files. That sounds cleaner, but on the other hand, that would require building an extra tool that would have to be invoked separately when dealing with code changes that involve database migrations. In my opinion, that would add some hassle to the development workflow.Because of that, I decided to cut some corners and have a test that verifies the consistency of the schema of each database. Moreover, in case of any differences between the schema that is created by running all the migrations and the schema stored in the
schema.sql
file, the test producesschema.sql.updated
which can be used to "accept" the changes easily (replaceschema.sql
with it). This is essentially the Golden test approach. The difference is that theschema.sql
file is used not just for testing, but also for initialization of fresh databases.I'm not 100% sure the golden test approach involving a Go test is the best. Maybe we should just have the schema tool and
make check-db-schema
andmake update-db-schema
commands that run the tool under the hood, and have the CI domake check-db-schema
.Avoiding code duplication in the tests
Each of the
sql/statesql
andsql/localsql
packages need to have a pair of tests (statesql, localsql) that verify the same thing, but with different schemas:schema.sql
; the same holds for the case when migrations are forced to run on the fresh database instead of usingschema.sql
script.The corresponding tests in
sql/localsql
andsql/statesql
are merely stubs that call the actual test code insql/test
package. This arguably hinders readability and in other cases, it would probably make sense to inline the tests insql/localsql
andsql/statesql
and remove thesql/test
package. But in this case, the tests double as a tool for schema consistency verification due to aforementioned "corner cutting" with schema tests serving as the "schema tool".So: not just tests, but the schema tool. It would be really undesirable for the schema tool implementation to drift apart in
sql/statesql
andsql/localsql
packages. Arguably, it could be said that then mostly duplicatedInMemory
/Open
/Schema
functions in thesql/{statesql,localsql}
don't make sense either, but that's not really the case, as these are supposed to be allowed to drift apart, wrt e.g. the set of migrations and connection pool size, whereas the schema tests verify the fundamental schema properties which should always be the same for bothsql/statesql
andsql/localsql
. Shouldsql/{statesql,localsql}
need tests that verify things that are different between the databases, these can be added to the packages along in addition to the stubs that usesql/test
.Circular dependencies in coded migrations not fully fixed
One of the issues with existing database schema handling mechanism is that the coded migrations placed in
sql
andsql/localsql
packages may cause circular dependencies. With the new mechanism, the situation is somewhat improved, but it is still problematic. All the code that used*sql.Database
package previously is switched to using*statesql.Database
, and the local database code still uses*localsql.Database
. If a migration imports some package that happens to importsql/{statesql,localsql}
, we've still got a circular dependency.One way around it is making
sql.Database
an interface (instead of astruct
) which includessql.Executor
but also provides other methods:Tx
,WithTx
,WithTxImmediate
,QueryCount
,QueryCache
andClose
. Using this interface in all the non-test files instead of*statesql.Database
and*localsql.Database
will resolve the circular dependency as the coded migrations insql/{statesql,localsql}
will be able to import the packages that only usesql
package. The tests in the packages that usesql.Database
interface may still importsql/{statesql,localsql}
packages to use theirInMemory
/Open
/Schema
functions as this does not constitute a circular dependency (importing these packages ignores tests in them).The problem with such an improvement is that it loses the type-based distinction between
*statesql.Database
and*localsql.Database
. This can be mitigated by havingsql.StateDatabase
andsql.LocalDatabase
interfaces that differ in some dummy method, e.g.IsStateDB() bool
vsIsLocalDB() bool
. This sounds somewhat hacky and I'm not sure it's the way to go; on the other hand, the errors caused by mistakenly using*localsql.Database
instead of*statesql.Database
and vice versa are usually very easily caught by tests, so the value of these additional interfaces can be disputed.Allowing "some" schema drift
Some databases may contain artifacts of additional tooling used, such as
_litestream*
table from Litestream SQLite replication. Having a way to avoid it is rather useful for infra maintenance purposes, thus I think havingmain.db-schema-ignore-rx
setting for ignoring certain tables and indices when checking for schema drift does make sense. I think a single regexp is enough as it can include multiple name prefixes with|
.Failing upon schema drift by default
An open question is whether
go-spacemesh
should fail if schema drift is detected, if this behavior needs to be configurable, and if it should fail by default if it is so.What I'm afraid of is that if the schema drift happens due to some table added by accident to a quicksync db or some other unanticipated effect like this, we can get a lot of complaints about Smapp failures because of schema drift, with reasons that may be very confusing to the end-user. On the other hand, if we just log a warning upon schema drift (like it's done currently in #6003) we can still see schema drift warning easily when we look at
go-spacemesh
logs.Programmers' docs for database schema handling
README.md already contains some information on the devenv, such as using different
make
commands, running the tests and setting up the build environment. I've added a section on handling DB schema changes to README.md, but it seems to be somewhat out of place. CONTRIBUTING.md seems to be also not very well suited for such detailed descriptions.Should we have a separate
HACKING.md
or something like that?Printing schema diffs
It is argued that in the schema consistency test, we can rely on plain
require.Equal
for printing schema diff instead of usinggo-cmp
-based schema differ that is used to generate schema drift warnings. While such approach would probably simplify the test a bit, it would also produce the fullschema.sql
script (original and updated) printed as a big unreadable wall of escape text (\n
instead of newlines) before the actual diff. In my opinion, that would reduce the usability of the schema consistency test by cluttering the terminal with too much unusable output and thus the trade-off is not worth it.