sfu-natlang / lensingwikipedia

Lensing Wikipedia is an interface to visually browse through human history as represented in Wikipedia. This is the source code that runs the website:
http://lensingwikipedia.cs.sfu.ca

Docker stuff #204

Open theq629 opened 8 years ago

theq629 commented 8 years ago

These are just some concerns about the new Docker setup that I had after using it for the first time. I haven't been following that work very closely, so it's possible these are not really issues (but maybe they should be documented at some point).

avacariu commented 8 years ago
  1. I've got a fix for this once web/user-accounts gets merged. It relies on docker-compose's ability to extend a common yml file with your own, which you then run with docker-compose -f myfile.yml up. I've got separate production, staging, and development yml files, and any user can more or less extend those. There will be some annoyances with the volumes_from directive, though: it can't be extended, so you can't extend an entire service definition that contains volumes_from, even if you override it.
  2. You can just use the run.py script instead of Docker for local development for now. It's possible to mount the code within the Docker container, and that's going to be easier once I'm done with the stuff I mentioned above. I'll make the development.yml mount the code, and {production,staging}.yml include the code in the image.
  3. The index data used by the query backend is made available through the data-only container which just mounts it from the host's filesystem, and it remains available from the host side. The containers can be restarted / rebuilt independently.

Honestly, there are a TON of things missing from Docker Compose*, and the next release (1.5.0) is going to be in October, so I'm just trying to find the simplest way to manage all this using the available features, without adding extra tools/scripts. I've already added a configure.sh script, but I'd really prefer to avoid it as much as possible, since it's just one more thing to maintain and one more step to remember to run.

*According to the documentation: "Docker Compose is still in its infancy and under active development."

theq629 commented 8 years ago

Ok, sounds good. I mostly just wanted to get my notes down somewhere before I forgot. For (2) and (3) let's just document likely use cases like that at whatever point those parts are stable enough to document.

avacariu commented 8 years ago

Yeah, I'll definitely make sure to have everything documented.

I'm looking at how to simplify the whole setup even further. It'd be awesome if website.md would just say

  1. Edit this config file
  2. Run these 2 commands
  3. Open your browser and go to URL

That one config file could define the parameters given to both the web and query backend (through environment variables), as well as details about the Docker images/containers themselves (for when they're being built/created/run). The advantage of sticking to environment variables would be that you wouldn't need to build separate images for wikipedia/avherald just because each domain needs different config files inside it. We can include all the files for both domains, and get the main code to choose which files it uses at runtime.

The web code is basically there (with some changes I made for point no. 1). The query backend might be there; I'd have to look at it more. It should be possible to pull the clustering and TSNE code out of the image entirely and put it into data-preparation/, since it's not used at runtime AFAIK. Then we could include both the wikipedia and avherald config files, make both domains' indexes available, and have the backend choose which one it uses based on some environment variable.
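
To make the environment-variable idea concrete, here's a rough sketch of how the backend could pick a domain at runtime. It's only an illustration: the variable name LENSING_DOMAIN, the function name, and the file and directory names are all made up for this example and don't correspond to anything in the repository.

```python
import os

# Hypothetical per-domain settings. The file and directory names are
# placeholders for illustration, not the project's real paths.
DOMAIN_SETTINGS = {
    "wikipedia": {"config_file": "wikipedia.conf", "index_dir": "indexes/wikipedia"},
    "avherald": {"config_file": "avherald.conf", "index_dir": "indexes/avherald"},
}


def select_domain_settings(environ=None):
    """Return the settings for the domain named by LENSING_DOMAIN.

    LENSING_DOMAIN is an assumed variable name. If it is unset, the
    sketch falls back to "wikipedia".
    """
    if environ is None:
        environ = os.environ
    domain = environ.get("LENSING_DOMAIN", "wikipedia")
    if domain not in DOMAIN_SETTINGS:
        # Fail early so a typo in the config is noticed at startup.
        raise ValueError(
            "unknown domain %r; expected one of %s"
            % (domain, sorted(DOMAIN_SETTINGS))
        )
    return DOMAIN_SETTINGS[domain]
```

With something like this, a single image could serve either domain, and the variable could be set per container, for example through the environment key of a service in the compose file.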

anoopsarkar commented 8 years ago

I agree that it might be a good idea to put some of the whoosh index creation code into data-preparation. There is a dependency with backend but as long as that is documented somewhere, it should be fine.