Currently, copying a node entails copying the S3 file and then copying the Mongo node document. This deep copy is required because, if two nodes pointed to the same S3 file and one of them were deleted, the delete operation would remove the S3 file and leave the remaining node with a dangling reference.
To allow for shallow copies, which should take milliseconds as opposed to potentially hours:
On a delete request (see the sketch following these steps):
- Place a delete event for the S3 file in a new Mongo collection, uniquely indexed by the S3 ID, with a timestamp.
- Any new delete event for that file should update the timestamp atomically, e.g. with an upsert (MongoDB's upsert option on update; there is no `$upsert` operator).
- Delete the node, but not the S3 file.
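A minimal sketch of the delete path, assuming pymongo, a `delete_events` collection keyed by the S3 ID, and a `nodes` collection whose documents carry an `s3_id` field (all names here are hypothetical, not an existing API):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

db = MongoClient().db  # hypothetical database name


def delete_node(node_id, s3_id):
    # Record (or refresh) the delete event for the S3 file. The upsert is
    # atomic, so concurrent deletes of nodes sharing the file just bump the
    # timestamp on the single event keyed by the S3 ID (_id here).
    db.delete_events.update_one(
        {"_id": s3_id},
        {"$set": {"time": datetime.now(timezone.utc)}},
        upsert=True,
    )
    # Delete only the node document; the S3 file is left for the reaper.
    db.nodes.delete_one({"_id": node_id})
```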
Have a reaper thread that checks for delete events older than some reasonable interval, chosen so that any copy requests in flight when the event was written have completed (see the sketch following these steps):
- For each such event, check whether any nodes still hold pointers to the S3 file. This check needs an index on the nodes' file pointer field.
- After the check, ensure the event timestamp hasn't changed. If it has, abort. This prevents the race where a node is deleted, with a copy in progress, after the check starts; in that case the file could be deleted even though the copy will produce a new pointer to it.
- If pointers to the file exist, delete the delete event with a single atomic operation conditional on the timestamp being unchanged; otherwise do nothing.
- If no pointers exist, delete the S3 file and then delete the delete event.
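A sketch of one reaper pass under the same assumed schema, with boto3 standing in for the S3 client; `GRACE` and `BUCKET` are assumptions, not values from this design:

```python
from datetime import datetime, timedelta, timezone

import boto3
from pymongo import MongoClient

db = MongoClient().db       # hypothetical database name
s3 = boto3.client("s3")
BUCKET = "node-data"        # hypothetical bucket
GRACE = timedelta(hours=1)  # assumed copy-completion window


def reap_once():
    cutoff = datetime.now(timezone.utc) - GRACE
    for event in db.delete_events.find({"time": {"$lt": cutoff}}):
        s3_id, ts = event["_id"], event["time"]
        # Requires an index on nodes.s3_id to be cheap.
        has_pointer = db.nodes.find_one({"s3_id": s3_id}) is not None
        # Re-read the event; if the timestamp moved, a fresh delete happened
        # mid-check and a copy may still be in flight, so abort.
        current = db.delete_events.find_one({"_id": s3_id})
        if current is None or current["time"] != ts:
            continue
        if has_pointer:
            # Live references remain; drop the event, but only if the
            # timestamp is still unchanged (one atomic conditional delete).
            db.delete_events.delete_one({"_id": s3_id, "time": ts})
        else:
            # No references: remove the file, then the event.
            s3.delete_object(Bucket=BUCKET, Key=s3_id)
            db.delete_events.delete_one({"_id": s3_id, "time": ts})
```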
On a node copy, create a new node pointing to the same S3 file rather than copying the file.
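The copy path then reduces to a single document insert, again with the hypothetical names used above:

```python
from pymongo import MongoClient

db = MongoClient().db  # hypothetical database name, as above


def copy_node(node_id):
    # Shallow copy: duplicate the node document and keep the same s3_id
    # pointer rather than copying the S3 object itself.
    src = db.nodes.find_one({"_id": node_id})
    copy = {k: v for k, v in src.items() if k != "_id"}  # fresh _id on insert
    return db.nodes.insert_one(copy).inserted_id
```

The reaper's grace period is what makes this safe: a copy that reads the source node before a delete request lands will finish inserting its new pointer well within the window, so the pointer check sees it before the file can be removed.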