tunapanda / provision

System for provisioning a new virtual machine with Tunapanda Edubuntu

Ar/cron datasync #42

Closed Ar2000jp closed 9 years ago

Ar2000jp commented 9 years ago

Implementation for issue #35 using rsync and cron.

Ar2000jp commented 9 years ago

I just remembered that there's a cron.d folder. I totally forgot about that. I'll modify the cron_datasync role to use that soon. Sorry for the inconvenience.

usernamenumber commented 9 years ago

I haven't tried running this yet, but I have reviewed the code and this looks like very nice work! There are a couple of changes I think you will need, though:

  1. Since we have no guarantee that the server will have a consistent connection to the Internet, syncscript.sh should check for a connection, and exit silently if it doesn't find one. You can check for a connection by running scripts/has_internet, for example.
  2. I realize that the datasync_rsync__data_dir value in defaults/main.yml is just an example, but wouldn't this cause the cron job to try to rsync from a non-existent machine every time it runs (unless the default is overridden)? If so, it might be better to have these examples as comments rather than actual vars.
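The first point could be handled with a guard along these lines. This is only a sketch: the `HAS_INTERNET` variable and the `online` and `sync_if_online` helpers are hypothetical names, and it assumes that scripts/has_internet exits with a non-zero status when there is no connection.

```shell
#!/bin/sh
# Hypothetical guard for syncscript.sh: skip the sync quietly when offline.
# HAS_INTERNET is an assumed variable; it defaults to the checker script
# mentioned above and is assumed to exit non-zero when offline.
HAS_INTERNET="${HAS_INTERNET:-scripts/has_internet}"

# Return 0 when the checker reports a connection, non-zero otherwise.
# All output is discarded so that cron has nothing to mail.
online() {
    "$HAS_INTERNET" >/dev/null 2>&1
}

# Run the given command only when online; otherwise succeed silently.
sync_if_online() {
    online || return 0
    "$@"
}
```

With this in place, syncscript.sh would pass its existing rsync invocation to `sync_if_online` instead of running it directly.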

Are you working on setting up an rsync server in Jordan to host the wikipedia, kalite, etc. content? If not, let me know and I'll put that on my TODO list.

Thanks for taking the initiative to write this!

Ar2000jp commented 9 years ago

Good point about the default variables. I just wanted to give an example, since it's a bit complicated. I'll fix it right away.

Rsync can handle connection problems fine by itself. I tested it with the rsync server being down, and with the connection being down. As for the chmod and chown statements, I think they should be executed every time, since rsync might have been cut off halfway, and of course we don't want the wrong permissions for our data dirs.

I forgot to mention this in the original comment: I tuned rsync's parameters to make the sync as atomic as possible. I also used the rsync protocol to lower the overhead and to avoid the complexity (and fragility) of distributing a host key and a read-only private key. Finally, I used chown and chmod instead of rsync's own arguments because those can be tricky to handle, and they're only available in more recent versions. E.g. they're not available on my test machine, which runs Ubuntu 12.04.
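For readers following along, the approach described here could look roughly like the sketch below. It is not the code from this pull request: the `RSYNC` variable, the `sync_and_fix_perms` helper, the `-a --delay-updates` flags, and the mode string are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: RSYNC, the rsync flags, and the mode string are assumptions,
# not the values used in the datasync role.
RSYNC="${RSYNC:-rsync}"

# Usage: sync_and_fix_perms SOURCE DEST OWNER
sync_and_fix_perms() {
    src=$1
    dest=$2
    owner=$3

    # --delay-updates stages changed files and renames them into place at
    # the end of the transfer, narrowing the window for a half-updated tree.
    "$RSYNC" -a --delay-updates "$src" "$dest"
    status=$?

    # Always reset ownership and permissions, even after a failed or
    # interrupted transfer, so the data dir never keeps the wrong modes.
    chown -R "$owner" "$dest"
    chmod -R u=rwX,go=rX "$dest"

    # Report rsync's own exit status to the caller.
    return "$status"
}
```

A call might look like `sync_and_fix_perms rsync://mirror.example.org/module/ /srv/data www-data`, where the host, module, path, and owner are placeholders. The `rsync://` source uses the rsync daemon protocol, so no SSH keys are involved.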

usernamenumber commented 9 years ago

Ok, that all sounds good. What about setting up the rsync server? Does QRF have one we can use? If so, get me access to it and I can start putting the wikipedia, kalite, etc. content there, unless you've already got that data.

Also, I suggest adding allow_duplicates: yes to datasync_rsync/meta/main.yml. This way, when other roles need to install an rsync script, they can call datasync_rsync as a dependency (without this setting, only the first call will actually do anything).

Ar2000jp commented 9 years ago

Although my testing shows that it's working without allow_duplicates, I went ahead and added it anyway. Maybe allow_duplicates is for when the input parameters are the same?

About the server, I'll talk to Ali about it.