Avoiding problems with the S3 sync

Context: We use the @monolambda/s3 JS library for most of the S3 operations.

Download data function

The initial download-data function, which we run when deploying a new server, failed frequently. I replaced it with the aws s3 sync command, provided by the Python awscli package. We can invoke it with yarn get_from_s3. This change is in https://github.com/hotosm/OpenMapKitServer/pull/73.
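As a rough sketch of what such a download step amounts to, the snippet below builds and runs an aws s3 sync invocation from Python. The bucket name, prefix and local directory are placeholders, and the helper names are hypothetical; the actual script behind yarn get_from_s3 may be wired differently.

```python
import subprocess


def build_download_command(bucket, prefix="", local_dir="data"):
    """Return the argv for `aws s3 sync s3://<bucket>/<prefix> <local_dir>`."""
    # Strip the trailing slash so an empty prefix yields "s3://<bucket>".
    source = f"s3://{bucket}/{prefix}".rstrip("/")
    return ["aws", "s3", "sync", source, local_dir]


def download_data(bucket, prefix="", local_dir="data"):
    """Run the sync; requires the awscli `aws` executable on PATH."""
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed download is not silently ignored.
    subprocess.run(build_download_command(bucket, prefix, local_dir), check=True)
```

Calling download_data("example-bucket") would copy the bucket contents into the local data dir, assuming credentials are already configured for awscli.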

Sync operations

We were syncing the entire data dir to S3 after every operation that modifies it: for example, when a new form is uploaded, when a submission is received, or when a form is archived, restored or deleted.

We had some problems with that:

When a server deploy is being updated and the data download has not finished, a form submission made during that window caused the data not yet present on the server to be marked as deleted in the S3 bucket.
When more than one submission was made at the same time, the sync triggered by one submission could mark the other submission as deleted.
Sometimes there is a delay between the sync operation running on the server and the file being updated on S3. This caused some files to have multiple versions on S3 with no difference between the versions.
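The first failure mode can be illustrated with a small simulation. It assumes the full sync mirrors the local data dir and removes remote keys that are absent locally, which matches the behaviour described above; the function name and file paths are made up for the example, and a real sync would also compare file contents rather than only key names.

```python
def plan_mirror_sync(local_keys, remote_keys):
    """Plan a mirror-style sync by comparing key names only."""
    local, remote = set(local_keys), set(remote_keys)
    return {
        "upload": sorted(local - remote),   # present locally, missing remotely
        "delete": sorted(remote - local),   # present remotely, missing locally
    }


# State of the bucket before the deploy.
remote = [
    "forms/a.xml",
    "submissions/a/1/data.json",
    "submissions/a/2/data.json",
]

# Server mid-deploy: only the forms have been downloaded so far,
# and a new submission (id 3) has just arrived.
local = ["forms/a.xml", "submissions/a/3/data.json"]

plan = plan_mirror_sync(local, remote)
print(plan["upload"])
print(plan["delete"])
```

The plan uploads the new submission but also schedules submissions 1 and 2 for deletion, simply because they had not been downloaded yet.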

I fixed those issues in https://github.com/hotosm/OpenMapKitServer/pull/73 by syncing only the directory where a change was made. In the case of a new submission, it syncs only the data/submissions/<form_name>/<submission_id> directory. In the case of a new form, it syncs only the data/forms dir. When a form is archived or restored, we sync data/forms and data/archive/forms. Furthermore, we no longer move the submissions dir to the archive dir when a form is archived.
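The mapping from operation to synced directories can be summarised in a few lines. This is only a sketch of the rule described above: the operation names and the function are hypothetical, and form deletion is left out because the directories synced in that case are not spelled out here.

```python
def sync_targets(operation, form_name=None, submission_id=None):
    """Return the directories to sync for a given operation."""
    if operation == "new_submission":
        # Only the directory of the submission that was just received.
        return [f"data/submissions/{form_name}/{submission_id}"]
    if operation == "new_form":
        return ["data/forms"]
    if operation in ("archive_form", "restore_form"):
        # Forms move between these two dirs; submissions stay where they are.
        return ["data/forms", "data/archive/forms"]
    raise ValueError(f"no sync rule defined for operation: {operation}")
```

Because two concurrent submissions now sync disjoint directories, neither sync can see the other's files as missing.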

A possible failure point is that if the sync of a new submission fails, it is not retried. I'll try to add a callback function that verifies the sync succeeded before returning the response to the submission request.
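One possible shape for that check, sketched here in Python rather than in the server's own code, is a wrapper that runs the sync, verifies it, and retries a limited number of times. The retry itself is an assumption beyond the verification callback mentioned above, and sync_fn and verify_fn are hypothetical callables supplied by the caller.

```python
import time


def sync_with_retry(sync_fn, verify_fn, attempts=3, delay_seconds=0.0):
    """Run sync_fn until verify_fn reports success or attempts run out.

    Returns True if a sync was verified, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            sync_fn()
            if verify_fn():
                return True
        except Exception:
            # Any failure in the sync or the check counts as a failed attempt;
            # a real implementation would log the error here.
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

The submission handler would return its response only after this function returns, and could report an error to the client when the result is False.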

Make undelete easier and faster

We are using versioning on S3, so we didn't lose data, but undeleting is a very manual and time-consuming process: I need to go into each directory in the AWS Console and delete the delete-marker versions of the files. Having a periodic backup of the S3 data would help us restore deleted files faster.
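The manual step could in principle be scripted. The sketch below assumes a boto3 S3 client is passed in (for example boto3.client("s3")) and uses its list_object_versions paginator and delete_object call; the function name, bucket and prefix are placeholders, and it has not been run against the project's bucket.

```python
def remove_delete_markers(s3_client, bucket, prefix=""):
    """Undelete objects under prefix by removing their latest delete markers.

    Returns the list of keys whose delete marker was removed.
    """
    restored = []
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for marker in page.get("DeleteMarkers", []):
            # Only a delete marker that is the latest version hides the object.
            if marker["IsLatest"]:
                s3_client.delete_object(
                    Bucket=bucket,
                    Key=marker["Key"],
                    VersionId=marker["VersionId"],
                )
                restored.append(marker["Key"])
    return restored
```

Deleting a delete marker by its version ID makes the previous version of the object current again, which is the same effect as removing the marker by hand in the console.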
