Implement fault-tolerant loading and restarting

Goal

One feature that has been removed as of the 1.0.0 version is the ability to stop and/or continue loading the dataset from a particular point. This is particularly useful when loading large datasets that run for many hours/days.

This feature is quite tricky to implement, and it will require new features on TypeDB.

Design

The thread reading from the CSV should apply a deterministic counter/ID to each batch that it reads. This should be supplied to the consumer threads which are going to write each row batch in a transaction.

At the start of a transaction, we need to write a globally unique Transaction ID (not available at this time) and the batch ID to a durable log - essentially a write ahead log. We then proceed to write and commit the data to TypeDB. If the commit succeeds, the row batch ID is added to a durable set of committed batches and removed from the WAL. If it fails, we remove it from the WAL only. Otherwise, if the program crashes or the state is uncertain, on restarting the data load the server has to be queried to check if the transaction ID was successfully committed or not (this API is not yet available).

There are bits we can optimise such as compacting the set committed batch IDs, etc.

typedb-osi / typedb-loader

Implement fault-tolerant loading and restarting #35

Goal

Design