Rework Catalog and TableStatisticCollection

The problem

- Any write to the catalog requires maintaining both a read version and a write version of the whole catalog, which unnecessarily doubles the memory overhead.
- Checkpointing the catalog file triggers a rewrite of the whole file, which is also unnecessary in almost all cases.
- The two-version design also exists in TablesStatistics, which duplicates essentially the same logic without sharing the same architecture.
- The current catalog lacks built-in dependency management. RelGroup is also modelled as a Table, which is not the correct level of abstraction, since it should be the parent of a set of rel Tables. The same applies to RDF graphs.

Solution

In memory data structures

Add the abstraction of MetaEntry. An entry can be one of the following types:
- NODE/REL TABLE SCHEMA
- TABLE/SCALAR/AGGREGATION FUNCTION
- TABLE GROUP (i.e. REL GROUP)
- RDF Graph
- TABLE STATS

Each entry should maintain its own write version (which can be extended to a version chain if multi-version support is added later).

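class MetaEntry {
    oid_t oid;                                         // unique id of this entry
    MetaType tableType;                                // kind of entry (one of the types listed above)
    std::string name;
    std::vector<std::unique_ptr<MetaEntry>> children;  // owned child entries, e.g. the rel tables of a table group
    std::vector<MetaEntry*> dependencies;              // non-owning pointers to other entries
    std::string comment;
    bool isDeleted;
    std::unique_ptr<MetaEntry> writeVersion;           // pending write version, if any
};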
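
A minimal sketch of how a per-entry write version could behave, assuming a single writer. The type `VersionedEntry` and the methods `getOrCreateWriteVersion`, `commit` and `rollback` are hypothetical names chosen for illustration and are not part of the proposal. The point it shows is that a write clones only the entry being modified, rather than the whole catalog.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified entry: only the fields needed to show versioning.
struct VersionedEntry {
    uint64_t oid = 0;
    std::string name;
    std::string comment;
    bool isDeleted = false;
    std::unique_ptr<VersionedEntry> writeVersion; // null when nothing is pending

    // Copy-on-write: the first write clones this entry only.
    VersionedEntry& getOrCreateWriteVersion() {
        if (!writeVersion) {
            writeVersion = std::make_unique<VersionedEntry>();
            writeVersion->oid = oid;
            writeVersion->name = name;
            writeVersion->comment = comment;
            writeVersion->isDeleted = isDeleted;
        }
        return *writeVersion;
    }

    // Install the pending version as the committed state.
    void commit() {
        if (!writeVersion) return;
        std::unique_ptr<VersionedEntry> pending = std::move(writeVersion);
        name = std::move(pending->name);
        comment = std::move(pending->comment);
        isDeleted = pending->isDeleted;
    }

    // Discard the pending version; the committed state is untouched.
    void rollback() { writeVersion.reset(); }
};
```

Readers keep using the committed fields while a write version is pending, so an uncommitted rename or drop is not visible to them until `commit` runs.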

Dependencies are explicitly stored as a vector of MetaEntry pointers.
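
One way explicit dependency pointers could be used is to refuse dropping an entry that other live entries still depend on. The sketch below assumes that `dependencies` holds the entries a given entry depends on (for example, a rel table depending on its node tables). That direction, the type `DepEntry` and the helper `canDrop` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified entry: only the fields needed for the check.
struct DepEntry {
    std::string name;
    bool isDeleted = false;
    std::vector<DepEntry*> dependencies; // entries this one depends on (non-owning)
};

// Returns true if no live entry in `catalog` lists `target` as a dependency.
inline bool canDrop(const DepEntry& target, const std::vector<DepEntry*>& catalog) {
    return std::none_of(catalog.begin(), catalog.end(), [&](const DepEntry* e) {
        if (e->isDeleted || e == &target) {
            return false; // deleted entries and the target itself do not block the drop
        }
        return std::find(e->dependencies.begin(), e->dependencies.end(), &target) !=
               e->dependencies.end();
    });
}
```

This linear scan is only meant to show the check. A real catalog might instead keep reverse pointers so the lookup does not have to visit every entry.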

On disk storage

Add the abstraction of MetaWriter and MetaReader. Internally, they make use of Serializer and Deserializer to write and read meta entries. Each entry starts at an offset in the file, and these offsets are maintained inside a DiskArray, DiskArray<PageCursor> metaDA.
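
The offset-per-entry layout can be sketched as follows. This is an in-memory illustration under stated assumptions, not the proposed implementation: a byte vector stands in for the file, a plain vector of byte offsets stands in for DiskArray<PageCursor> metaDA, and the `Record` and `MetaFile` types and the `append` and `read` methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical flat record; the real MetaEntry has more fields.
struct Record {
    uint64_t oid;
    std::string name;
};

// Stand-in for the catalog file plus the array of entry offsets (metaDA).
struct MetaFile {
    std::vector<uint8_t> bytes;
    std::vector<uint64_t> offsets; // offsets[i] = start of the i-th entry
};

class MetaWriter {
public:
    explicit MetaWriter(MetaFile& f) : file(f) {}
    // Appends one entry and records where it starts; earlier entries are untouched.
    void append(const Record& r) {
        file.offsets.push_back(file.bytes.size());
        writeRaw(&r.oid, sizeof(r.oid));
        uint64_t len = r.name.size();
        writeRaw(&len, sizeof(len));
        writeRaw(r.name.data(), len);
    }
private:
    void writeRaw(const void* src, std::size_t n) {
        const uint8_t* p = static_cast<const uint8_t*>(src);
        file.bytes.insert(file.bytes.end(), p, p + n);
    }
    MetaFile& file;
};

class MetaReader {
public:
    explicit MetaReader(const MetaFile& f) : file(f) {}
    // Reads the idx-th entry by jumping straight to its recorded offset.
    Record read(std::size_t idx) const {
        std::size_t pos = file.offsets.at(idx);
        Record r;
        readRaw(&r.oid, pos, sizeof(r.oid));
        uint64_t len = 0;
        readRaw(&len, pos, sizeof(len));
        r.name.assign(reinterpret_cast<const char*>(file.bytes.data() + pos), len);
        return r;
    }
private:
    void readRaw(void* dst, std::size_t& pos, std::size_t n) const {
        std::memcpy(dst, file.bytes.data() + pos, n);
        pos += n;
    }
    const MetaFile& file;
};
```

Because every entry is located through its own offset, a single entry can be read without scanning the entries before it.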

kuzudb / kuzu

Rework Catalog and TableStatisticCollection #2495
