ucberkeley / bce

Berkeley Common Environment provides a common Linux computational environment for classwork and research.
Apache License 2.0

build on cfncluster 16.04 LTS release once it is available for AWS #78

Open · aculich opened this issue 8 years ago

aculich commented 8 years ago

I've filed a feature request on the cfncluster issue tracker asking them to release a 16.04 LTS-based AMI. From that AMI we could generate a custom build of our AWS image using their cookbook scripts with Packer (rather than their other AMI-snapshot method of customization).

paciorek commented 8 years ago

Note: BCE-2016-spring.json now supports building BCE with a CFN Ubuntu image as the base image; this currently works with the CFN 14.04 LTS image. So we don't need to use their cookbook scripts; we can just build on top of their AMIs.
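For illustration, layering on a base AMI with Packer mostly comes down to pointing an `amazon-ebs` builder's `source_ami` at the CFN image. The sketch below generates a minimal legacy-JSON Packer template under that assumption. It is not the contents of BCE-2016-spring.json: the instance type, the AMI name, and the provisioning script `provision-bce.sh` are hypothetical placeholders, and `us-west-2` is simply the Oregon region.

```python
import json


def packer_template(base_ami: str, region: str = "us-west-2",
                    instance_type: str = "t2.medium",
                    ami_name: str = "bce-on-cfncluster-{{timestamp}}") -> dict:
    """Return a minimal Packer (legacy JSON) template that builds on top of
    an existing AMI, such as a CfnCluster Ubuntu base image.

    All values are placeholders; the real BCE template has its own
    settings and provisioners.
    """
    return {
        "builders": [{
            "type": "amazon-ebs",
            "region": region,
            # AMI ID of the CfnCluster Ubuntu image for the chosen region.
            "source_ami": base_ami,
            "instance_type": instance_type,
            # Default login user on Ubuntu AMIs.
            "ssh_username": "ubuntu",
            # Packer expands {{timestamp}} so each build gets a unique name.
            "ami_name": ami_name,
        }],
        "provisioners": [{
            "type": "shell",
            # Hypothetical script that installs the BCE software stack.
            "script": "provision-bce.sh",
        }],
    }


if __name__ == "__main__":
    # Print the template; save it to a file and pass it to `packer build`.
    print(json.dumps(packer_template("ami-xxxxxxxx"), indent=2))
```

Swapping the 14.04 image for the 16.04 one would then only mean changing the `base_ami` argument.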

aculich commented 8 years ago

@dougalb says, "[16.04 LTS cfncluster AMI] has been released as part of cfncluster-1.3".

paciorek commented 7 years ago

I've just built and tested this as BCE-2016-fall-cfncluster-preview, which should (shortly) be public on AWS in the Oregon region. This was done using our Packer build system, building on top of the CFN Ubuntu 16.04 base image (as done previously for 14.04).

I verified that simple multi-node submissions via either SGE or SLURM work and invoke auto-scaling if needed.
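A check like this can be scripted. The sketch below covers only the SLURM side and assumes nothing beyond standard `sbatch` behavior: it builds a batch script that runs `hostname` once per node and submits it through `sbatch` on stdin. The job name and node count are hypothetical choices. The SGE equivalent depends on the cluster's parallel-environment setup and is not shown.

```python
import shutil
import subprocess


def slurm_smoke_test_script(nodes: int = 2) -> str:
    """Build a batch script that runs `hostname` once on each of `nodes` nodes.

    Requesting more nodes than are currently running should make the
    cluster's auto-scaling add compute nodes before the job can start.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=1",
        "#SBATCH --job-name=bce-smoke-test",  # hypothetical job name
        # Each task prints the host it ran on, so distinct hostnames in
        # the job output confirm that the job spanned several nodes.
        "srun hostname",
        "",
    ])


def submit(script: str) -> str:
    """Submit the script with sbatch, which reads it from stdin."""
    if shutil.which("sbatch") is None:
        raise RuntimeError("sbatch not found; run this on the cluster master node")
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    # sbatch typically reports "Submitted batch job <id>".
    return result.stdout.strip()
```

On the master node, `submit(slurm_smoke_test_script(3))` queues the job. Watching `squeue` should then show it pending until enough compute nodes have come up.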

The only major issue was that I had to prevent installation of Ubuntu's lightdm package, because it was pulling in the upstart package. The presence of /sbin/start caused a problem when a virtual cluster was instantiated via 'cfncluster create': a cfncluster Chef recipe tried to start the gmetad service and somehow attempted to do so via upstart. For future reference, the error message from /var/log/cfn-init.log (on the instantiated master node) is given below.
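The comment doesn't say how lightdm was excluded from the build. One possible mechanism, assumed here rather than taken from the BCE scripts, is an apt preferences file that gives the package a negative pin priority, which tells apt never to install it. The sketch generates such a file. The target path `/etc/apt/preferences.d/no-lightdm` is a hypothetical name, and writing it requires root, for example from a Packer shell provisioner.

```python
from pathlib import Path


def apt_block_preferences(packages=("lightdm",)) -> str:
    """Return apt preferences text that blocks installation of `packages`.

    A negative Pin-Priority tells apt never to install any version of the
    matching package, so it (and, here, the upstart package it drags in)
    cannot be pulled in as a dependency.
    """
    stanzas = [
        f"Package: {name}\nPin: release a=*\nPin-Priority: -1\n"
        for name in packages
    ]
    # apt preferences stanzas are separated by blank lines.
    return "\n".join(stanzas)


def write_preferences(text: str,
                      path: str = "/etc/apt/preferences.d/no-lightdm") -> None:
    """Write the preferences file (hypothetical file name; needs root)."""
    Path(path).write_text(text)
```

Passing `("lightdm", "upstart")` would also block upstart directly. The trade-off is that any other package that hard-depends on either one would then fail to install instead of silently succeeding.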

On the branch cfn_fall_2016 (which can probably be safely merged into master), I updated some of the R/Python packaging (e.g., the R packages and the Miniconda version) but did not update the Python packages, since that involves some manual labor. It's worth discussing whether we are going to release a new general BCE version in the near future and, if so, how to coordinate the CFNcluster version with it.

service[gmetad] action enable (up to date)
  * service[gmetad] action restart

    ================================================================================
    Error executing action `restart` on resource 'service[gmetad]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of /sbin/start gmetad ----
    STDOUT:
    STDERR: start: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
    ---- End output of /sbin/start gmetad ----
    Ran /sbin/start gmetad returned 1
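When checking future builds, the same failure can be spotted without reading the whole log. The hypothetical helper below scans cfn-init.log text for Chef's "Error executing action ... on resource ..." banner and the following "Ran ... returned N" line. Both patterns are copied from the excerpt above. Other Chef versions may word these lines differently, so treat the regular expressions as an assumption.

```python
import re

# Chef's failure banner, e.g.
#   Error executing action `restart` on resource 'service[gmetad]'
ERROR_RE = re.compile(r"Error executing action `(?P<action>[^`]+)` "
                      r"on resource '(?P<resource>[^']+)'")
# The command Chef ran and its exit status, e.g.
#   Ran /sbin/start gmetad returned 1
RAN_RE = re.compile(r"Ran (?P<command>.+) returned (?P<status>\d+)")


def chef_failures(log_text: str) -> list[dict]:
    """Collect failed Chef actions from cfn-init.log-style text."""
    failures = []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = ERROR_RE.search(line)
        if m:
            failures.append({"action": m["action"], "resource": m["resource"]})
            continue
        m = RAN_RE.search(line)
        if m and failures:
            # Attach the failing command to the most recent banner.
            failures[-1]["command"] = m["command"]
            failures[-1]["status"] = int(m["status"])
    return failures
```

For the log above, the result would be a single entry recording the `restart` action on `service[gmetad]`, with the command `/sbin/start gmetad` and exit status 1. The `/sbin/start` in that entry is the telltale sign that Chef chose the upstart provider.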