tunapanda / provision

System for provisioning a new virtual machine with Tunapanda Edubuntu

Figure out a way to pre-populate site data #35

Closed: usernamenumber closed 9 years ago

usernamenumber commented 9 years ago

Wikipedia for Schools, KA-Lite, and other resources require large amounts of content that our Ansible roles do not provide (downloading it during provisioning would simply take too long). We need a way to actually get this content onto the box.

Options include (in order of preference):

  1. Ansible roles for services that require extra data provide scripts that sync the data in the background. These scripts go in a shared directory where a cron job runs them regularly when network access is detected.
  2. Provide content archives for each service so that content can be manually downloaded and deployed using the same steps for each.
  3. Content for each service can be manually installed using whatever mechanism the upstream uses (e.g. WfS has a torrent; KA-Lite has downloads via its web interface or btsync, though the ARM version of btsync doesn't seem to work on the cubie).
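
To make option 1 a bit more concrete, here is a minimal sketch of the runner that cron could call. Everything named here is an assumption for illustration: the shared directory path, the script names, and the connectivity check (a TCP connection to a public DNS server) are hypothetical, not something our roles provide today.

```python
import os
import socket
import subprocess
from pathlib import Path


def network_available(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default host is an assumption; any reliably reachable server would do.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_sync_scripts(directory):
    """Return the executable regular files in directory, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and os.access(p, os.X_OK)
    )


def run_sync_scripts(directory, is_online=network_available, run=subprocess.run):
    """Run each sync script once if the network is up.

    Returns a dict mapping script name to exit code, or an empty dict
    when the network check fails.
    """
    if not is_online():
        return {}
    results = {}
    for script in find_sync_scripts(directory):
        completed = run([str(script)], check=False)
        results[script.name] = completed.returncode
    return results
```

Each role would drop its own script into the shared directory (say, a hypothetical `/etc/datasync.d/`), and a cron entry would call `run_sync_scripts("/etc/datasync.d")` at whatever interval we settle on. A failing script does not stop the others from running.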

Ar2000jp commented 9 years ago

IMO, using btsync, git, or full tar.gz files is a bad idea. Btsync probably uses a lot of connections, which, in my experience, can make 3G hang. Git's diff download is good, but interrupted repo syncs don't resume, and it doesn't do diff copy for binary files AFAIK. I suggest using an rsync server. This way we get diff copy, resumable downloads, and a sync operation that is friendly to weak or limited connections.
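
A rough sketch of what the client side could look like, assuming a Python wrapper around the `rsync` CLI. The server URL and destination path in the usage note are placeholders, and the retry counts and delays are guesses rather than tested values. The flags themselves (`--archive`, `--compress`, `--partial`, `--timeout`, `--bwlimit`) are standard rsync options.

```python
import subprocess
import time


def build_rsync_command(source, dest, timeout_s=60, bwlimit_kbps=None):
    """Build an rsync command line suited to a slow or flaky link."""
    cmd = [
        "rsync",
        "--archive",               # recurse and preserve permissions, times, symlinks
        "--compress",              # compress data in transit
        "--partial",               # keep partial files so a rerun can resume them
        f"--timeout={timeout_s}",  # give up on a stalled connection
    ]
    if bwlimit_kbps is not None:
        # Optional bandwidth cap, roughly in kilobytes per second.
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [source, dest]
    return cmd


def sync_with_retries(cmd, attempts=5, delay_s=30,
                      run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return True on success."""
    for attempt in range(1, attempts + 1):
        if run(cmd, check=False).returncode == 0:
            return True
        if attempt < attempts:
            # Wait a little longer after each failure before retrying.
            sleep(delay_s * attempt)
    return False
```

For example, `sync_with_retries(build_rsync_command("rsync://content.example.org/kalite/", "/srv/kalite/", bwlimit_kbps=200))` would keep retrying over a dropped 3G link, and each rerun only transfers what is still missing or changed.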

usernamenumber commented 9 years ago

I had been meaning to suggest something just like that, actually. We'd need a place to host it, with an account set up for rsync-only access (or at least no shell access), identified with an SSH key we could load on each system. I could set this up on a VPS here in the US, but something more local would probably be better. Can qrf provide? Or maybe AWS would be good for this?
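
One way the rsync-only account could be restricted, sketched as a small helper that formats an `authorized_keys` entry. This assumes OpenSSH on the server and the `rrsync` helper script that ships with rsync; the `rrsync` install path, the content directory, and the key shown in the usage note are all hypothetical and vary by setup.

```python
def restricted_key_line(public_key, content_root, rrsync_path="/usr/bin/rrsync"):
    """Return an authorized_keys entry that forces read-only rsync.

    Any login with this key runs rrsync confined to content_root instead
    of a shell. The rrsync path is an assumption and differs by distro.
    """
    options = [
        f'command="{rrsync_path} -ro {content_root}"',  # forced command, read-only
        "no-pty",                # no interactive terminal
        "no-port-forwarding",    # no tunnels through the account
        "no-agent-forwarding",
        "no-X11-forwarding",
    ]
    return ",".join(options) + " " + public_key.strip()
```

Calling `restricted_key_line("ssh-ed25519 AAAA... sync@box", "/srv/content")` gives a single line to append to the sync account's `~/.ssh/authorized_keys`. The key used for the initial upload would need a separate entry without `-ro`.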

OK, gotta run or I'm going to miss my boat! :)

devalih commented 9 years ago

We can easily dedicate an EC2 instance. Please let me know what the requirements are and I can set it up in no time.

Ar2000jp commented 9 years ago

I'm working on roles/scripts for this issue. Hopefully, I'll have them ready today.

usernamenumber commented 9 years ago

@devalih I think storage space is the main issue for us. Wikipedia for Schools and KA-Lite are about 6 gigs each. Since we will also need space for edX courses, I'd say we need something with at least 20 gigs of storage, though a bit more would be better, just in case. Would that be ok cost-wise? If so, set it up and get me login credentials (I can send you an SSH key when you are ready for it), and I'll sync all the data up to it, since I already have it here.

Please let me know when it's ready, or if you have any questions or problems.

usernamenumber commented 9 years ago

This is addressed by Ahmad's datasync_rsync role, which is used in the RACHEL integrations branch, which is only waiting on #48 for merge.

usernamenumber commented 9 years ago

The role was merged even though we're still waiting on automating rsphider, since having the content is still worthwhile. Closing this issue.