Docker stuff - Githubissues

theq629 commented 8 years ago

This is just some concerns about the new docker setup that I had after using it for the first time. I haven't been following the work on that very closely, so it's possible these are not really issues (but maybe should be documented at some point).

1. Right now it's necessary to make changes (for ports, etc.) to docker-compose.yml, which makes it easy to accidentally check in personal settings and also makes git rebase harder. It would be nice if we could have a local config (like local_config.py), or provide a default file which the user has to copy to the correct filename.
2. It seems to be necessary to redo the entire build and up steps to make changes live, which makes frontend development slow, since there is a long delay for the build and for the backend to start up. I'm not sure this is really a problem, since we can presumably keep running the frontend manually without docker for basic development.
3. Because of the same build and startup delays, it seems like updating a running site would take the site down for a while. Do we have any way around this? The backend supports changing data while running and without interrupting serving requests, so requiring it to totally restart for frontend changes would add downtime.
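
For the first point, here is a minimal sketch of the local-override idea, assuming a git-ignored local_config.py that sits next to checked-in defaults. The setting names (WEB_PORT, BACKEND_PORT), their values, and the load_settings helper are hypothetical and only illustrate the pattern; they are not the project's actual settings.

```python
import importlib.util
from pathlib import Path

# Hypothetical checked-in defaults; the real setting names are not given here.
DEFAULTS = {"WEB_PORT": 8080, "BACKEND_PORT": 1500}


def load_settings(local_path="local_config.py", defaults=DEFAULTS):
    """Return the defaults, overridden by any matching names in local_path."""
    settings = dict(defaults)
    path = Path(local_path)
    if not path.is_file():
        # No local file: everyone gets the checked-in defaults.
        return settings

    # Load the local file as a throwaway module without touching sys.path.
    spec = importlib.util.spec_from_file_location("local_config", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Only names that already exist in the defaults can be overridden.
    for name in defaults:
        if hasattr(module, name):
            settings[name] = getattr(module, name)
    return settings
```

With this pattern the checked-in file never changes for personal settings, so there is nothing to accidentally commit and nothing to conflict during a rebase.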

avacariu commented 8 years ago

1. I've got a fix for this once web/user-accounts gets merged. It relies on docker-compose's ability to extend a common yml file with your own, which you then run with docker-compose -f myfile.yml up. I've got separate production, staging, and development yml files, and those can more or less be extended by any user. There are going to be some annoyances with the volumes_from directive, though: it can't be extended, so you can't extend any service definition that contains volumes_from, even if you override it.
2. You can just use the run.py script instead of Docker for local development for now. It's possible to mount the code within the Docker container, and that's going to be easier once I'm done with the stuff I mentioned above. I'll make the development.yml mount the code, and {production,staging}.yml include the code in the image.
3. The index data used by the query backend is made available through the data-only container, which just mounts it from the host's filesystem, and it remains available from the host side. The containers can be restarted / rebuilt independently.

Honestly, there are a TON of things missing from Docker Compose*, and the next release (1.5.0) is going to be in October, so I'm just trying to find the simplest way to manage all this using the available features and without adding extra tools/scripts. I've already added a configure.sh script, but I'd really prefer to avoid it as much as possible, since that's just an extra thing to maintain and an extra step to remember to run.

*According to the documentation: "Docker Compose is still in its infancy and under active development."

theq629 commented 8 years ago

Ok, sounds good. I mostly just wanted to get my notes down somewhere before I forgot. For (2) and (3) let's just document likely use cases like that at whatever point those parts are stable enough to document.

avacariu commented 8 years ago

Yeah, I'll definitely make sure to have everything documented.

I'm looking at how to simplify the whole setup even further. It'd be awesome if website.md would just say

1. Edit this config file
2. Run these 2 commands
3. Open your browser and go to URL

That one config file could define the parameters given to both the web and query backend (through environment variables), as well as details about the Docker images/containers themselves (for when they're being built/created/run). The advantage of sticking to environment variables would be that you wouldn't need to build separate images for wikipedia/avherald just because each domain needs to have different config files inside it. We can include all the files for both domains, and get the main code to choose which files it uses at runtime.

The web code is basically there (with some changes I made for point no. 1). The query backend might be there; I'd have to look more at it. It should be possible to pull the clustering and TSNE code out of the image entirely and put it into data-preparation/, since it's not used at runtime AFAIK. And then we could include both the wikipedia and avherald config files, make both domains' indexes available, and have the backend choose which one it uses based on some environment variable.
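
Roughly what I mean by choosing at runtime, as a sketch only: the variable name LENSING_DOMAIN, the paths, and the file names below are all made up for illustration, and the real backend config may look nothing like this.

```python
import os

# Hypothetical per-domain settings shipped together in one image.
DOMAIN_CONFIGS = {
    "wikipedia": {
        "index_dir": "/data/wikipedia/index",
        "settings_file": "wikipedia.conf",
    },
    "avherald": {
        "index_dir": "/data/avherald/index",
        "settings_file": "avherald.conf",
    },
}


def select_domain_config(environ=None, default="wikipedia"):
    """Pick the domain named by LENSING_DOMAIN (a made-up variable name)."""
    environ = os.environ if environ is None else environ
    domain = environ.get("LENSING_DOMAIN", default)
    if domain not in DOMAIN_CONFIGS:
        known = ", ".join(sorted(DOMAIN_CONFIGS))
        raise ValueError(
            "unknown domain %r; expected one of: %s" % (domain, known)
        )
    return domain, DOMAIN_CONFIGS[domain]
```

The same image could then be started once per domain with a different value of that variable, instead of building one image per domain.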

anoopsarkar commented 8 years ago

I agree that it might be a good idea to put some of the whoosh index creation code into data-preparation. There is a dependency with backend but as long as that is documented somewhere, it should be fine.

sfu-natlang / lensingwikipedia

Docker stuff #204