mediacloud / backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
http://www.mediacloud.org
GNU Affero General Public License v3.0

Implement installation and deployment tasks in Ansible #179

Closed pypt closed 6 years ago

pypt commented 7 years ago

@rahulbot & Cindy need a testing server, so we'll have to maintain two servers from now on.

This is a good chance to rewrite our installation and deployment procedures to Ansible.

pypt commented 7 years ago

Rewrote Media Cloud installation scripts to Ansible in ansible_provisioning branch. The remaining task is to implement automatic deployments using Ansible.

Implemented Ansible roles that:

As per the book, all the roles are idempotent, meaning that Ansible tries to get the host into a specific "state" rather than blindly running a bunch of commands.
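To illustrate what "idempotent" means here, below is a minimal shell sketch, not code from the ansible_provisioning branch. The helper names `ensure_line` and `append_line` are hypothetical. The first describes a desired state (the line is present), similar in spirit to Ansible's lineinfile module; the second blindly runs a command.

```shell
#!/bin/sh
# Hypothetical helper: make sure a file contains a given line.
# Running it any number of times leaves exactly one copy of the line.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Non-idempotent counterpart for contrast: every run adds another copy.
append_line() {
    printf '%s\n' "$2" >> "$1"
}
```

Calling `ensure_line` twice with the same arguments changes the file only on the first call, which is the property that makes it safe to re-run provisioning against a host that is already set up.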

Everything runs and gets tested on Travis. Test builds are now a little slower because more happens during the build. It might be possible to speed them up by running the builds in Docker on Travis with an Ubuntu image that has the dependencies pre-installed, but I haven't looked into it yet.

I took the opportunity to simplify the Media Cloud setup a little too:

hroberts commented 7 years ago

This is terrific. Very exciting for us to move forward with a more sane deployment system. The better environment system looks great as well.

How do we track CPAN dependencies now?

As I think you sense, I'm nervous about losing the version locking. The current system did not lock perfectly, but it confined version shifts to the relatively rare times when we added a new module to cpanfile. Without that protection, we will likely be getting frequent (daily?) version changes coming down the stream from CPAN. The carton stuff was not added just for theoretical safety -- it was added because we were having frequent problems with module upgrades breaking things in production and with new installs not working.

As you point out, we now have the protection of the daily Vagrant test to help us find dependency problems as they arise, so hopefully that will be enough to keep things sane. We also have the protection of CPAN being much less active these days, so we'll just get fewer upgrades coming through the pipe.

There's also just an inherent danger in the developers running different versions of modules on their systems than what is running in production (and in various production systems running different versions of modules). We could mitigate that problem by adding a module update process to the deployment that ensures that we have the latest version of all modules every time we deploy (and also run this before we test every time). That way we're at least consistent across all of our dev, testing, and production systems.

-hal

On Thu, Aug 17, 2017 at 7:16 PM, Linas Valiukas notifications@github.com wrote:

Rewrote Media Cloud installation scripts to Ansible in ansible_provisioning https://github.com/berkmancenter/mediacloud/tree/ansible_provisioning branch. The remaining task is to implement automatic deployments using Ansible.

Implemented Ansible roles that:

As per the book, all the roles are idempotent, meaning that Ansible tries to get the host into a specific "state" rather than blindly running a bunch of commands.

Everything runs and gets tested on Travis; test builds are now a little bit slower (because more stuff happens on the test build now), but maybe there's a way to run those builds using Docker on Travis and so speed up the test runs by using an Ubuntu image with pre-installed dependencies, don't know, haven't looked into it yet.

I took the opportunity to simplify the Media Cloud setup a little too:

  • Python's Virtualenv now gets stored in ~/.virtualenvs/mediacloud/ instead of mc-venv/ so that it can 1) be used with virtualenvwrapper https://virtualenvwrapper.readthedocs.io/en/latest/ and 2) be cached on Travis.

  • After some consideration, I've removed Carton and made Ansible install all the Perl dependencies using cpanm into the perlbrew-system@mediacloud library under ~/.perlbrew/ because:

    • Carton is buggy. I've spent endless hours writing exceptions over exceptions https://github.com/berkmancenter/mediacloud/blob/master/install/install_modules_with_carton.sh to make it work. Also, quite often one needs to run carton install three times for all the dependencies to get installed.

    • Carton is slow. To make a Perl script use dependencies installed with Carton, we have to use the run_with_carton.sh wrapper, which runs a Perl script that runs another Perl script. Why it takes 500 ms to set PERL5LIB is beyond me.
    • Carton is hard to maintain. We've both spent yet more hours committing cpanfile.snapshot, which always gets overwritten with some new stuff.
    • Carton is confusing. Currently we install some Perl dependencies using APT using one version of Perl, some other dependencies "outside of Carton" https://github.com/berkmancenter/mediacloud/blob/master/install/install_modules_outside_of_carton.sh using another (Perlbrew) version of Perl, and then the majority of dependencies "with Carton" https://github.com/berkmancenter/mediacloud/blob/master/install/install_modules_with_carton.sh into yet another Perl library.
    • One can't use an IDE (e.g. Komodo https://www.activestate.com/komodo-ide) with Carton.
    • We don't really use Carton's main feature which is dependency version locking. I don't know about you, but whenever I add / remove a Perl dependency, I just commit cpanfile.snapshot and pray for it to work (which it usually doesn't) without giving much thought about which other dependencies got updated. I understand that Carton was introduced to have Perl dependency version locking, but I think that it has since become an outdated nuisance rather than a helpful feature.

    If we absolutely positively want to have this dependency version locking (which, I think, we don't really use that much and which is likely obsolete now that we run our unit tests quite often on Travis), I'd much rather have a DarkPAN mirror on S3 http://blogs.perl.org/users/steven_haryanto/2014/01/installing-modules-from-cpan-and-your-own-darkpan.html and install hardcoded versions of everything from there rather than get back to Carton.

  • I've merged multiple "initialize some environment variables to use correct Perl / Python and then run something" wrapper scripts into a single run_in_env.sh. The script initializes Perlbrew environment, Virtualenv environment, and then runs whatever command was passed as its argument. This will allow us to have a single wrapper for running both Perl and Python scripts.

    For example, with run_with_carton.sh one has to run a single unit test as follows:

        # Call carton exec, which will run prove, which
        # requires an additional Perl include path
        ./script/run_carton.sh exec prove -Ilib/ unit_test.t

    With the run_in_env.sh script, one can do:

        # Initialize Perlbrew, add "lib/" to Perl
        # include path, initialize Virtualenv,
        # run the argument command
        ./script/run_in_env.sh prove unit_test.t

    ...or you can simply start a shell and run perl / python normally:

        $ ./script/run_in_env.sh bash
        mediacloud$ prove unit_test.t

    I hope this will save some development time for us.
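For readers who want a feel for how such a single wrapper can work, here is a rough sketch. It is not the actual run_in_env.sh. The variable names `MC_ROOT`, `MC_PERLBREW_RC` and `MC_VENV` are hypothetical, the Perlbrew rc path is assumed to be Perlbrew's default location, and selecting a specific Perl or library with Perlbrew is left out.

```shell
#!/bin/bash
# Sketch of an environment wrapper; NOT the real run_in_env.sh.
# Assumed defaults (hypothetical names), overridable by the caller:
MC_ROOT="${MC_ROOT:-$(pwd)}"
MC_PERLBREW_RC="${MC_PERLBREW_RC:-$HOME/perl5/perlbrew/etc/bashrc}"
MC_VENV="${MC_VENV:-$HOME/.virtualenvs/mediacloud}"

# The function body is a subshell, so none of the environment
# changes leak into the calling shell.
run_in_env() (
    # Initialize Perlbrew if its rc file exists
    if [ -f "$MC_PERLBREW_RC" ]; then
        . "$MC_PERLBREW_RC"
    fi

    # Put lib/ first on the Perl include path
    PERL5LIB="$MC_ROOT/lib${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB

    # Activate the Python virtualenv if it exists
    if [ -f "$MC_VENV/bin/activate" ]; then
        . "$MC_VENV/bin/activate"
    fi

    # Run whatever command was passed in; its exit status is returned
    "$@"
)
```

With this sketch, `run_in_env prove unit_test.t` would run prove with lib/ already on the include path, and `run_in_env bash` would start an interactive shell with the same environment.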


-- Hal Roberts Fellow Berkman Klein Center for Internet & Society Harvard University

rahulbot commented 7 years ago

This is a pain on the front end as well, as it pertains to JavaScript npm dependencies. Things were just changing too fast for us not to specify module versions. So we have our version lock file checked in and upgrade manually every month or two (we track this on a bug). Some modules are purposefully held back to older versions because newer ones would require extensive changes that don't give us much of a win. That's the world of JS right now...

pypt commented 7 years ago

Sorry for not being clear. I do like dependency version locking; I just don't like the tool that we use for it (Carton) because of the things listed in the above comment, and so I would like to replace it with some other option.

As far as I see it, the options are:

  1. Don't do any dependency version locking. Rely on the fact that there isn't much action on CPAN these days, and if something changes significantly, the testing is likely to catch it. Pros: easy to do, already works. Cons: technically unsafe.
  2. Hardcode versions of direct dependencies, don't worry about the sub-dependencies that cpanm installs. We only care about interfaces of modules that we use directly and not so much about the modules (and their versions) that are used by those dependencies. Pros: still easy to do, slightly safer than doing nothing. Cons: not 100% safe, but then the testing should be able to catch those rare occurrences when a sub-sub-sub-dependency breaks?
  3. Maintain a DarkPAN mirror with a copy of the modules, at the specific versions, that we use. I've made one (https://github.com/berkmancenter/mediacloud/commit/096f41bc59f9063b4308ccf7a24f5bb7bc65c897) and it seems to work fine. Pros: super paranoid. Cons: while downloading all the CPAN modules and publishing them to S3 could be done with a simple script, maintaining such a mirror still involves some semi-manual work; adding a new CPAN dependency is no longer that trivial.
  4. Try to live with Carton and continue with semi-locked dependencies that we are able to install with the tool. Pros: somewhat tried and tested way. Cons: very hard to maintain, a big share of dependencies have to be installed separately with cpanm and without any version locking, massive cpanfile.snapshot file with constant commits to it.

I'd go with 2 or 3.
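As a rough illustration of options 2 and 3, the sketch below turns a list of pinned direct dependencies into a cpanm command line. It relies on cpanm's `Module@version` argument syntax and its `--mirror` / `--mirror-only` options; the function names, the pin-list format and the module versions in the usage example are made up for illustration, and the command is only printed, not run.

```shell
#!/bin/sh
# Turn "Module version" lines on stdin into cpanm's Module@version form.
# Blank lines and lines starting with '#' are skipped (assumed format).
pinned_args() {
    while read -r module version || [ -n "$module" ]; do
        case "$module" in
            ''|'#'*) continue ;;
        esac
        if [ -n "$version" ]; then
            printf '%s@%s\n' "$module" "$version"
        else
            printf '%s\n' "$module"
        fi
    done
}

# Print (rather than run) the cpanm command for a pin list on stdin.
# With a mirror URL as $1, cpanm is restricted to that mirror (option 3);
# without it, pinned versions come from the default mirror (option 2).
cpanm_command() {
    mirror=${1:-}
    args=$(pinned_args | tr '\n' ' ')
    if [ -n "$mirror" ]; then
        printf 'cpanm --mirror %s --mirror-only %s\n' "$mirror" "${args% }"
    else
        printf 'cpanm %s\n' "${args% }"
    fi
}
```

For example, `printf 'JSON::XS 3.04\nMoose 2.2006\n' | cpanm_command` prints `cpanm JSON::XS@3.04 Moose@2.2006`. Note that this pins only the modules listed; whatever sub-dependencies cpanm pulls in remain unpinned, which is exactly the trade-off described in option 2.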

pypt commented 7 years ago

Some more developments:

Main remaining task is to automatically deploy code and restart services using Ansible.

pypt commented 7 years ago

(Clicked "Comment" too early on previous comment, updated it since.)

pypt commented 6 years ago

Ansible branch merged and deployed. The summary of this Brave New World:

Let me know if something doesn't work or if you have questions.