syndicate-storage / syndicate

Internet-scale software-defined storage system
Apache License 2.0

UG should log writes to stable storage so it can resume on failure #77

Closed jcnelson closed 10 years ago

jcnelson commented 10 years ago

When a UG writes data, it should first log which blocks the write will generate and which blocks it will garbage-collect, and only then carry out those operations. Then, if the UG stops and restarts, it should examine this log and roll back any partially completed writes.
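
A minimal sketch of this kind of intent log, in Python for brevity. Everything here is an assumption made for illustration and not Syndicate's actual code: the `WriteLog` class, the JSON-lines record format, the `begin`/`commit` record names, and the `delete_block` callback are all hypothetical. It also assumes that old blocks are only garbage-collected after every new block has been written, so rolling back only requires deleting the new blocks.

```python
import json
import os


class WriteLog:
    """Append-only intent log on local stable storage (hypothetical format)."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        # Force each record to disk before the caller touches any blocks.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def begin(self, write_id, new_blocks, gc_blocks):
        """Record the blocks a write will generate and garbage-collect."""
        self._append({"op": "begin", "id": write_id,
                      "new": list(new_blocks), "gc": list(gc_blocks)})

    def commit(self, write_id):
        """Record that a write finished (or was fully rolled back)."""
        self._append({"op": "commit", "id": write_id})

    def incomplete(self):
        """Return the 'begin' records that have no matching 'commit'."""
        pending = {}
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    rec = json.loads(line)
                except ValueError:
                    break  # torn final record from a crash mid-append
                if rec["op"] == "begin":
                    pending[rec["id"]] = rec
                else:
                    pending.pop(rec["id"], None)
        return list(pending.values())


def recover(log, delete_block):
    """On restart, roll back every write that never committed."""
    rolled_back = []
    for rec in log.incomplete():
        for block in rec["new"]:
            delete_block(block)  # hypothetical callback that removes one block
        log.commit(rec["id"])    # mark the rollback itself as done
        rolled_back.append(rec["id"])
    return rolled_back
```

On startup the UG would call `recover(log, delete_block)` before accepting new writes. Deleting a block has to be idempotent in this sketch, since a crash during recovery causes the same rollback to be replayed.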

jcnelson commented 10 years ago

This will likely require cooperation from the MS and the creation of an "fsck" tool. If the UG dies permanently, partially-written data will need to be garbage-collected. We'll need to have the UG keep the MS posted on how much garbage collection it has done, so a subsequent "fsck" can find which manifest(s) correspond to incomplete writes.

jcnelson commented 10 years ago

The strategy would be as follows:

This would be done asynchronously: the UG would iteratively garbage-collect all older versions of the objects for which it is the coordinator. This is effectively an on-line, automatic implementation of the "fsck" mentioned above, and it requires that the UG not rely on any local state.

The driver itself should have a say in whether or not a particular version gets garbage-collected--depending on the application, the UG may want to preserve older versions of the file, and let the driver roll them back. Don't worry about this for now, but keep this in mind when implementing the garbage collection.
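
A rough sketch of one garbage-collection pass with a driver hook, in Python for brevity. All names are hypothetical: `collect_old_versions`, the `(version, timestamp)` tuples standing in for what the MS reports about an object's manifests, the `delete_version` callback, and the optional `driver_keep` predicate are assumptions for illustration, not Syndicate's API.

```python
def collect_old_versions(manifests, delete_version, driver_keep=None):
    """Garbage-collect every version of one object except the newest.

    manifests      -- list of (version, timestamp) pairs, assumed to be
                      fetched from the MS on each pass (no local state)
    delete_version -- hypothetical callback that deletes one old version
    driver_keep    -- optional predicate; if it returns True for a
                      version, that version is preserved
    Returns the list of versions that were deleted, oldest first.
    """
    if not manifests:
        return []

    # Oldest first; the last entry is the current version and is kept.
    ordered = sorted(manifests, key=lambda m: m[1])

    collected = []
    for version, timestamp in ordered[:-1]:
        if driver_keep is not None and driver_keep(version, timestamp):
            continue  # the driver asked to preserve this version
        delete_version(version)
        collected.append(version)
    return collected
```

Because the list of versions is re-read on every pass, the function can be re-run after a crash without any log of its own, provided `delete_version` tolerates a version that is already gone.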

jcnelson commented 10 years ago

MS tracks manifest timestamps, as of 4a3440b4a8c105b6e16413efa58a2874b8ae8908. The UG will present RG-generated deletion receipts for each manifest, providing an extra degree of authentication for the UG to the MS.

jcnelson commented 10 years ago

We'll also have to do this for each block.

jcnelson commented 10 years ago

So... the problem deletion receipts are meant to solve is a rogue UG credibly claiming to have garbage-collected data when it hasn't. If RGs confirm to the MS that they have carried out the required deletion operations, the RG operator (who pays for storage) won't be cheated by a UG.

This is really only a problem when the UG isn't paying for storage. If this isn't the case--if the UG operator is paying for storage--then (s)he has an incentive to garbage-collect his/her old data already, since the only person who will be cheated by rogue behavior is him/herself.

In OpenCloud, the UG operator is not paying for storage--we are. However, we can also track the UG operators' behaviors, since we're the administrators. We know which slices are running which UGs, and who owns the slices. So, if someone wants to go rogue, we can kick them out, bill them, and erase their data ourselves (since we'll know which files are theirs).

This suggests that deletion receipts are unnecessary, at least for now. We can get away with having the UG simply delete log records from the MS.

jcnelson commented 10 years ago

Fixed in ce0b805b8d8ac95b96e3e3af9140762b158952df