neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

test or feat: timeline hash via fullbackup #7715

Open koivunej opened 4 months ago

koivunej commented 4 months ago

At least in tests, it would be nice to have a hash of the timeline state at a given LSN.

This is computed in test_ancestor_detach_branched_from, and it should be used in test_import.py as well. Path to a stable, hashable full backup:

More context: https://github.com/neondatabase/neon/pull/7706#discussion_r1597174420

Alternatively, the fullbackup and tar_cmp code should be refactored and reused in test_import.py, which currently only compares tar sizes, probably because a portable fullbackup comparison seemed too time-consuming to implement at the time.
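As a rough sketch of what a content-based comparison could look like in place of comparing tar sizes, the following hashes a tar archive by sorted member name, member type and file content, ignoring member order and metadata such as mtime, uid/gid and mode. `hash_tar_stable` is a hypothetical helper, not an existing function in the neon test suite, and it assumes the full backup is a tar archive whose file contents are themselves deterministic.

```python
import hashlib
import io
import tarfile


def hash_tar_stable(tar_bytes: bytes) -> str:
    """Hash a tar archive by member name, type and file content.

    Hypothetical helper: member order and metadata such as mtime, uid/gid
    and mode are deliberately ignored.
    """
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        # Sort by name so the order in which the archive was written is irrelevant.
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            digest.update(member.name.encode())
            digest.update(b"\0")
            digest.update(member.type)  # e.g. regular file vs directory
            if member.isfile():
                fileobj = tar.extractfile(member)
                data = fileobj.read() if fileobj is not None else b""
                # Length-prefix the content so boundaries between members are unambiguous.
                digest.update(len(data).to_bytes(8, "little"))
                digest.update(data)
    return digest.hexdigest()
```

A test would then compare the hashes of two full backups taken at the same LSN instead of their sizes. This only normalizes the tar container: any file whose contents are not deterministic would still need to be made stable first.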

koivunej commented 4 months ago

Sorted visitation order for the listings that are currently HashSet-based, such as DbDir.

One could argue that it is not easy to HashDoS us via relation numbers, so these could all be FxHasher-based collections. I am unsure whether that would give us determinism in all possible cases.
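The sorted-visitation idea can be illustrated with a small Python sketch: hashing the entries of a listing in sorted order makes the result depend only on which entries exist, not on how the containing collection happens to iterate. Even with a fixed, unseeded hasher, a hash table's iteration order can still depend on its capacity and insertion history, which is one reason sorting is the more robust option. The key shape `(spcnode, dbnode, relnode)` and the name `hash_listing` are assumptions for illustration only.

```python
import hashlib
import struct
from typing import Iterable, Tuple

# Hypothetical key shape: (spcnode, dbnode, relnode), each a 32-bit Oid.
RelKey = Tuple[int, int, int]


def hash_listing(entries: Iterable[RelKey]) -> str:
    """Hash a directory-like listing independently of iteration order."""
    digest = hashlib.sha256()
    # Visiting entries in sorted order makes the result depend only on the
    # set of entries, not on how the containing collection iterates.
    for spcnode, dbnode, relnode in sorted(entries):
        # Fixed-width little-endian encoding keeps entry boundaries unambiguous.
        digest.update(struct.pack("<III", spcnode, dbnode, relnode))
    return digest.hexdigest()
```

In the Rust code this could mean collecting a HashSet's entries into a Vec and sorting it before visiting, or switching to an ordered collection such as BTreeSet; which is preferable depends on how hot those code paths are.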

jcsp commented 4 months ago

Would also like to use this for things like test_sharding_split_compaction, where we should check that absolutely everything is still readable after we drop/rewrite layers.

koivunej commented 4 months ago

For sharding, I think the Pythonic approach will work better: you can merge the lists of hashes (which will still be unique), which I assume is what one needs to do with sharding.
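A minimal sketch of merging per-shard hash lists, assuming each shard reports `(key, digest)` pairs only for the keys it owns: the lists are merged, duplicates are rejected (each key should be unique across shards), and the merged entries are hashed in sorted key order into one value. A check like the one wanted for test_sharding_split_compaction could compare this value before and after layers are dropped or rewritten. `merge_shard_hashes` and the key/digest format are hypothetical, not part of the neon test suite.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical entry: (key, hex digest of that key's content).
KeyHash = Tuple[str, str]


def merge_shard_hashes(per_shard: List[List[KeyHash]]) -> str:
    """Merge per-shard (key, digest) lists into a single order-independent hash."""
    merged: Dict[str, str] = {}
    for shard_hashes in per_shard:
        for key, digest in shard_hashes:
            # Each key should be owned by exactly one shard; a duplicate indicates a bug.
            if key in merged:
                raise ValueError(f"key {key!r} reported by more than one shard")
            merged[key] = digest
    combined = hashlib.sha256()
    # Sorting by key makes the result independent of shard count and reporting order.
    for key in sorted(merged):
        combined.update(key.encode())
        combined.update(b"\0")
        combined.update(merged[key].encode())
        combined.update(b"\n")
    return combined.hexdigest()
```

Because the combined hash depends only on the merged set of keys and digests, the same function can hash an unsharded timeline (one list) and a sharded one (several lists), and the two results can be compared directly.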