metanorma / metanorma-cli

CLI (Command Line Interface) for Metanorma
BSD 2-Clause "Simplified" License

Giving access to Metanorma through a web function #1

Open ronaldtse opened 6 years ago

ronaldtse commented 6 years ago

Recently we were asked to deploy Metanorma to a web location so that people can directly upload an adoc file (or a zipped directory) to generate Metanorma output, rather than having to install Ruby and the toolchain via metanorma-macos-setup.

We wish to use AWS Lambda via Terraform to achieve this. For example, we can provide a https://app.metanorma.com endpoint that links to this Lambda Metanorma function. This Lambda function will have to run Ruby and run the Metanorma-CLI.

Since AWS Lambda does not natively support Ruby, we will need to use a precompiled Ruby that runs on Lambda. AWS's page on Ruby demonstrates how it works using Traveling Ruby.

However, Phusion's Traveling Ruby is locked to Ruby 2.2 and seems like abandonware. But Homebrew has a Portable Ruby (https://homebrew.bintray.com/bottles-portable/) that gives Ruby 2.3. I suspect it is possible to run the full Metanorma toolchain with 2.3 (with some work, of course).

The Lambda code size (zipped) limit is 50MB, but we can always use S3 to store the package if it's too large (the gems and all), and make the Lambda function pull the package on demand (each Lambda invocation gets 500MB in /tmp).
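The pull-on-demand idea above can be sketched roughly as follows. This is a minimal sketch, not a tested Lambda deployment: `fetch`, `cache_dir`, and the archive name are hypothetical, and on Lambda `fetch` would presumably wrap an S3 download (for example boto3's `download_file`) with `cache_dir` set to `/tmp`.

```python
import os
import tarfile


def ensure_package(fetch, cache_dir, archive_name="metanorma-bundle.tar.gz"):
    """Download and unpack the bundle once per container, then reuse it.

    `fetch(dest_path)` is a caller-supplied callable that writes the archive
    to dest_path (hypothetical; e.g. a wrapper around an S3 download).
    """
    unpacked = os.path.join(cache_dir, "bundle")
    marker = os.path.join(unpacked, ".complete")
    if os.path.exists(marker):
        # Warm invocation: the bundle is already unpacked in the cache dir.
        return unpacked

    os.makedirs(unpacked, exist_ok=True)
    archive_path = os.path.join(cache_dir, archive_name)
    fetch(archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(unpacked)
    os.remove(archive_path)  # free scratch space once unpacked

    # Write the marker last so a half-finished unpack is never reused.
    with open(marker, "w") as fh:
        fh.write("ok")
    return unpacked
```

Only the first invocation in a given container pays the download cost; later invocations find the marker file and return immediately.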

@strogonoff would you have time to take on this challenge?

ronaldtse commented 6 years ago

There is a gem called Rumbda that wraps Ruby code with Traveling Ruby into a Lambda zip file, but since we're using the Homebrew Portable Ruby 2.3, you'll have to fork Rumbda to change the Ruby zip location, and possibly apply the patches from this fork: https://github.com/jdanielian/rumbda .

erikbor commented 6 years ago

Maybe Exodus can help out.

Exodus is a tool that makes it easy to successfully relocate Linux ELF binaries from one system to another. This is useful in situations where you don't have root access on a machine or where a package simply isn't available for a given Linux distribution. For example, CentOS 6.X and Amazon Linux don't have packages for Google Chrome or aria2. Server-oriented distributions tend to have more limited and outdated packages than desktop distributions, so it's fairly common that one might have a piece of software installed on their laptop that they can't easily install on a remote machine.

ronaldtse commented 6 years ago

Thanks for the great tip @erikbor -- the only issue I ran into was the dependency on system libxml/libxslt.

The latest Portable Ruby 2.3.7 is available here: https://bintray.com/homebrew/bottles-portable-ruby/portable-ruby/2.3.7

Really easy to run (on macOS):

curl -sSL -o portable-ruby.tar.gz "https://bintray.com/homebrew/bottles-portable-ruby/download_file?file_path=portable-ruby-2.3.7.leopard_64.bottle.tar.gz"
tar -zxvf portable-ruby.tar.gz
portable-ruby/2.3.7/bin/ruby -v
=> ruby 2.3.7p456 (2018-03-28 revision 63024) [universal.x86_64-darwin9.0]

On Linux:

curl -sSL -o portable.tar.gz "https://bintray.com/homebrew/bottles-portable-ruby/download_file?file_path=portable-ruby-2.3.7.x86_64_linux.bottle.tar.gz"
tar -zxvf portable.tar.gz
export PATH="$PWD/portable-ruby/2.3.7/bin:${PATH}"
ruby -v
=> ruby 2.3.7p456 (2018-03-28 revision 63024) [x86_64-linux]

ronaldtse commented 6 years ago

The only issue is the installation of Nokogiri: it requires installing the following packages:

yum install -y libxslt-devel libxml2-devel make gcc 

And this command (or the equivalent bundle config):

gem install nokogiri -- --use-system-libraries

ronaldtse commented 6 years ago

And here's how you test Lambda functions locally: https://github.com/lambci/docker-lambda

strogonoff commented 6 years ago

I think making Metanorma easier to adopt is a valid goal in itself.

Letting users run a Metanorma build right from the landing page fits that goal perfectly; I added it to https://github.com/orgs/riboseinc/projects/3.

Not so tangentially, another feature with potential to speed up the adoption of Metanorma is offering the workflow in containerized form. Pros and cons of Docker aside, these days it seems to be the most popular way for projects to make themselves available with the least friction, and setting up Metanorma to run in production is somewhat complex in its current state.

I'm bringing up containerization not just because it removes a barrier to the project’s adoption, but also because we can run that image to provide the API for the metanorma.com web-based build in the first place. Running a Docker image on ECS or EC2 may be slightly less elegant than running a Lambda function, but IMO it is just as straightforward and possibly more pragmatic: it saves the Ruby porting effort and serves as a test for the Docker image offered by Metanorma.

The image as I envision it could provide a basic HTTP API and web UI, similar to what Browserless does with headless Chrome (but more minimal). Putting such an image together based on currently available installation instructions for Metanorma looks pretty straightforward.

ronaldtse commented 6 years ago

The fact that @opoudjis has just made the entire Metanorma toolchain compatible with Ruby 2.3 should make running with Lambda possible. I suspect that we could use Exodus to package the whole thing, and run it inside a Docker container.

Currently setting up a Docker container with Metanorma is quite simple.

On CentOS 7:

curl -L https://github.com/riboseinc/yum/raw/master/ribose.repo > /etc/yum.repos.d/ribose.repo
yum install -y git make libxml2-devel libxslt-devel java-1.8.0-openjdk
yum install -y rbenv ruby-build rbenv-ruby-2.4.3
. /etc/profile.d/rbenv.sh
rbenv shell 2.4.3
gem install bundler
gem install metanorma
gem install metanorma-cli

On Ubuntu 18.04:

sudo apt install git make ruby-bundler ruby-dev libxml2-dev libxslt-dev default-jre
sudo gem install bundler
sudo gem install nokogiri -v '1.8.4'
sudo gem install metanorma
sudo gem install metanorma-cli

Note: we still need to install PlantUML to support dynamic generation of UML diagrams.

strogonoff commented 6 years ago

@ronaldtse

> I suspect that we could use Exodus to package the whole thing, and run it inside a Docker container.

Should be possible to Exodus a ruby binary of requisite version and hopefully it’d be within the Lambda size limit. Not yet sure how to deal with project dependencies, since Exodus seems to be designed to work against a single binary. Naively using Exodus against metanorma executable after installing as prescribed wouldn’t work, since metanorma isn’t a binary but a Ruby script that references rbenv, which is also a Ruby script that does some dynamic magic and references a dylib. Maybe I’m missing something.

> The fact that @opoudjis has just made the entire Metanorma toolchain compatible with Ruby 2.3 should make running with Lambda possible

I believe it’s possible to run it on Lambda one way or another, just seems like a lot of headache—running it once, plus maintaining a repeatable procedure that can re-bundle it for Lambda when new versions are out. (Probably worth making this part of project release procedure, to ensure landing page web build offers the latest version of Metanorma.)

> Currently setting up a Docker container with Metanorma is quite simple.

Yep, it’s not that complex to bring up a container with Metanorma. Throwing in basic HTTP API as a facade to metanorma-cli looks straightforward too.
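To make the facade idea concrete, here is a minimal sketch of its core, leaving the HTTP layer out. Everything named here is an assumption for illustration: `build_archive` is a hypothetical helper, and the default command `metanorma -t iso` should be adjusted to whatever the installed metanorma-cli actually accepts.

```python
import io
import os
import subprocess
import tempfile
import zipfile

# Assumed invocation; adjust to whatever the installed metanorma-cli accepts.
DEFAULT_COMMAND = ["metanorma", "-t", "iso"]


def build_archive(upload, filename, command=DEFAULT_COMMAND, timeout=300):
    """Run the CLI on one uploaded file and return a zip of the results."""
    # Keep only the file name so a client cannot write outside the workdir.
    safe_name = os.path.basename(filename)
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, safe_name), "wb") as fh:
            fh.write(upload)

        # check=True raises CalledProcessError if the build fails.
        subprocess.run(
            list(command) + [safe_name],
            cwd=workdir,
            check=True,
            timeout=timeout,
        )

        # Zip everything left in the working directory, input included.
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for root, _dirs, files in os.walk(workdir):
                for name in files:
                    path = os.path.join(root, name)
                    archive.write(path, os.path.relpath(path, workdir))
        return buffer.getvalue()
```

An HTTP handler (Flask or otherwise) would then only need to pass the uploaded bytes and file name to this function and stream the returned zip back to the client.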

‘How Did I “Hack” AWS Lambda to Run Docker Containers’ claims it’s possible to use devel branch of udocker to run Docker containers on AWS Lambda.

Since Ribose uses TF, the exact mechanism through which a Docker image gets launched would be abstracted away, be it Lambda or ECS or EC2. Deploying onto Lambda does smell a bit but would make sense if we expect the function to be called rarely enough that cost savings would add up[0]. However, in that case we’ll also have the cold start issue, where Lambda would have to load the whole image if function doesn’t get called for some time.

[0] (EDIT) I believe with ECS the costs for running a container could be made minimal if there’re other parts of architecture also using ECS, as under ECS a single EC2 instance can host multiple containers.

strogonoff commented 6 years ago

Making sure the rationale is clear and this is indeed what we want before a significant amount of time is spent on R&D to figure out how to run that on Lambda nicely and have at least somewhat repeatable bundling flow for it.

I think either Traveling Ruby (with Node wrapper) or Lambda + Docker (through udocker devel branch) should work. Couldn’t see how Exodus would work so far.

Native Ruby support would be best; AWS may be working on it now that Go support is out. Ruby was the runner-up in their poll for native language support in Lambda. If there’re immediate savings to be had by running this on Lambda, though, it may not be worth waiting for it.

ronaldtse commented 6 years ago

I already tried Exodus, but it only works for ELF binaries. udocker seems much slower than native Lambda. The Traveling / Portable Ruby approach should work.

Technically, the lambda archive can be built in these steps:

  1. Run the amazonlinux docker container
  2. Install all Metanorma dependencies
  3. Extract the differences between the new container and the original amazonlinux container via docker diff and docker export.
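As a rough illustration of step 3, the output of `docker diff` (one line per path, prefixed with A for added, C for changed, or D for deleted) can be filtered down to the paths worth archiving. `changed_paths` is a hypothetical helper, and this sketch only parses text; it does not call Docker itself.

```python
def changed_paths(diff_output):
    """Return paths that `docker diff` marks as added (A) or changed (C)."""
    paths = []
    for line in diff_output.splitlines():
        # Each line looks like "A /usr/local/bin/metanorma".
        kind, _, path = line.strip().partition(" ")
        if kind in ("A", "C") and path:
            paths.append(path)
    return paths
```

Changed directories show up in the list as well, so a real bundling script would probably want to keep only regular files before building the archive.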

The problem is that the resulting file is far too large (we could move the archive to S3 to get around the size limit), and if we run this script on demand it becomes too slow.

The two options we have are:

  1. Lambda + S3 (probably fastest)
  2. AWS Fargate, which allows us to provide the container to run (has lead-time issues: https://www.reddit.com/r/aws/comments/7o5fc0/reducing_fargate_ecs_lead_time/)

In any case, we need:

  1. a web interface to upload stuff to the Metanorma application endpoint
  2. the Metanorma application endpoint

strogonoff commented 6 years ago

This comment reiterates the options available, calls for comments, and outlines one possible roadmap in some detail.

> The two options we have are:

I believe we also have the ECS on EC2 option, unless I’m missing some reason it was ruled out. Agreed with possible cold-start slowness when using Lambda or Fargate, but we can try and see.

To sum up, to implement these requirements (which I believe I got correctly)

> 1. a web interface to upload stuff to the Metanorma application endpoint
> 2. the Metanorma application endpoint

I believe the full list of options looks like this:

  1. Traveling Ruby on Lambda (with bundled deps)
  2. Docker container on Lambda as @ronaldtse described in preceding comment
  3. Short-lived Docker container managed by ECS, run each time to generate output and save it into a temporary S3 object
  4. Long-running Docker container on ECS (whether EC2 or Fargate) running an API endpoint within
  5. TF and/or Ansible provisioning straight-up onto a dedicated nano EC2 instance

I’d request @ronaldtse / @opoudjis feedback (should anyone else be mentioned?) on some considerations in favor of one option or another. My considerations:

I’m not emotionally attached to any option, but 1–3 involve more uncharted territory from my vantage point.

Cold start may be an issue with options 1–2, possibly 3, possibly 4 if Fargate is used.

In 1–2, the HTTP API would supposedly be implemented in Lambda function; in 3–4 a minimal webapp can be bundled using e.g. Flask.

Docker roadmap

If we rule out (1) and (5) above, I believe the following can be tackled right now since we’re likely to use the created image either way:

(1) Build a Docker image with latest Metanorma CLI

The following seems to make sense if we choose to proceed with a long-running container on ECS:

(2) Within the image, add an HTTP API facade accepting an upload, calling metanorma-cli and returning archived output. Possibly throw in a web UI for good measure

(3) Implement container deployment to ECS. ECS on EC2 would be my pick, but Fargate is on the table

(4) If warranted, implement a Lambda function proxying calls to the container

(5) Update Metanorma.com to add directory/file input that calls the API with the selection and passes the response to the user as a download. (May or may not involve a short-lived S3 object, not sure about the details yet.)

with further steps:

(6) Add CORS headers as an attempt to somewhat limit potential abuse of the API (although that’d be “a good problem to have”)

(7) Implement procedures for maintaining the Docker image, make it an official way to get Metanorma workflow up and running for new users

(8) Implement an option to select output format
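For step (6), a minimal sketch of the header logic might look like this. `cors_headers` and the origin allowlist are hypothetical, and CORS only constrains browsers, so this limits casual abuse rather than preventing direct calls to the API.

```python
def cors_headers(origin, allowed_origins):
    """Return CORS response headers for an allowed origin, or none at all."""
    if origin in allowed_origins:
        return {
            # Echo the specific origin instead of using a wildcard.
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Methods": "POST, OPTIONS",
            # Tell caches that the response depends on the Origin header.
            "Vary": "Origin",
        }
    # Unknown origins get no CORS headers, so browsers block the response.
    return {}
```

The API facade would merge the returned dictionary into each response, passing in the request's `Origin` header value.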