graph-genome / graph_summarization

A browser for graph genomes, built with VG and based on graph summarization to provide semantic zoom: as a user zooms in on a graph genome, the displayed topology becomes more complex. It visualizes variation within a species of plant or animal and is designed to scale up to thousands of specimens while still providing useful visualizations.

Building a VG and SequenceTubeMap Docker Container #2

Closed by josiahseaman 5 years ago

josiahseaman commented 5 years ago

Currently, VG is built automatically in a Docker container. SequenceTubeMap's instructions say its Docker container is out of date and recommend building with npm or yarn instead. Goal: create a new Docker container with both VG and SequenceTubeMap that can serve a demo with custom data and be deployed on Google Cloud services.

/vg/sequenceTubeMap# node --version
v4.2.6

Progress So Far

subwaystation commented 5 years ago

Could you please post the Dockerfile somewhere, so we can help you out? :)

josiahseaman commented 5 years ago

I'd like to do that. I'm a complete noob when it comes to Docker. It looks like I have a basic SequenceTubeMap running, though without port forwarding yet. So I'll post the file and see what setup still needs to be done. I'd like to put this up on the cloud so we can load our own data for display.

josiahseaman commented 5 years ago

Any guidance on how to grab the file in Windows and share it with you? I have a working server, but the files seem to be tucked away inside a virtual hard disk that I can't access: "C:\Users\Public\Documents\Hyper-V\Virtual hard disks\MobyLinuxVM.vhdx"

subwaystation commented 5 years ago

I do not have any experience with a Linux VM in Windows. Can't you mount your Windows file system into the VM and then write the file somewhere there? Just a wild suggestion.

josiahseaman commented 5 years ago

Here's the image archive, created with docker save -o tubemap_server.tar tubemap_server: https://drive.google.com/open?id=1I7yQw4hcvSRZA57vz9VB2tC8rxV2AkHl

You should be able to load it with sudo docker load --input tubemap_server.tar.

Once you load it, you'll likely need the commands listed above starting with docker run.

subwaystation commented 5 years ago

Thanks! What I meant was the Dockerfile you created the Docker image from, so I could build it on my own. Maybe your way was faster anyhow.

I got your setup to run as follows:

docker run -p 127.0.0.1:3000:3000 -it 9ac2d839e36f /bin/bash
cd sequenceTubeMap

Then either node src/server.js or yarn serve works for me. Just go to http://127.0.0.1:3000/ and the TubeMap is there ;)

subwaystation commented 5 years ago

So when you build the Docker image, I would recommend making sequenceTubeMap the working directory. Then you should be able to run the image as a daemon without having to navigate inside it:

docker run -d -p 127.0.0.1:3000:3000 9ac2d839e36f yarn serve
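A minimal sketch of that recommendation, assuming the hand-built image is still tagged tubemap_server (the name passed to docker save earlier in the thread) and that SequenceTubeMap lives at /vg/sequenceTubeMap. The script only writes a small Dockerfile that layers a working directory and a default command on top of that image; DOCKERFILE_DIR and BASE_IMAGE are hypothetical overrides, not anything from the thread.

```shell
#!/bin/sh
# Write a Dockerfile that layers a working directory and a default command
# on top of the hand-built image. Assumption: the image is tagged
# "tubemap_server" and SequenceTubeMap lives at /vg/sequenceTubeMap.
set -eu

BASE_IMAGE="${BASE_IMAGE:-tubemap_server}"
DOCKERFILE_DIR="${DOCKERFILE_DIR:-.}"

mkdir -p "$DOCKERFILE_DIR"
cat > "$DOCKERFILE_DIR/Dockerfile" <<EOF
FROM ${BASE_IMAGE}
# Start every container inside the SequenceTubeMap checkout
WORKDIR /vg/sequenceTubeMap
# Document the port the server listens on (3000 in this thread)
EXPOSE 3000
# Default command, so "docker run -d" needs no extra arguments
CMD ["yarn", "serve"]
EOF

echo "Wrote $DOCKERFILE_DIR/Dockerfile (base image: $BASE_IMAGE)"
```

After that, something like docker build -t tubemap_server:workdir . followed by docker run -d -p 127.0.0.1:3000:3000 tubemap_server:workdir should start the server without any cd (the workdir tag is just an example name).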

subwaystation commented 5 years ago

What I would like to have in the long run is Bioconda packages for both vg and sequenceTubeMap, so that we can build a Docker image very easily. That has been established as a sort of best practice for pipeline development in https://github.com/nf-core, where people already have a lot of experience with Docker.

josiahseaman commented 5 years ago

Would you like to make those changes and push the container to Docker Hub? Here are the instructions: https://stackoverflow.com/questions/28334706/how-to-package-a-docker-image-in-a-single-file

subwaystation commented 5 years ago

I can do that. It might just take some time, especially because vg is not so easy to compile. If you expect results within the next two weeks, I would not count on it. Bioconda: Bioconda GitHub

Dockerfile:

FROM continuumio/miniconda:4.5.4
LABEL authors="your@email.com" \
      description="Container image containing all requirements for tubeMap Server"

COPY environment.yml /
RUN conda env update -n root -f /environment.yml && conda clean -a

Possible environment.yml:

name: tube-map-server-1.0.0
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - vg=?.?.?
  - tubeMap=1.0.0

subwaystation commented 5 years ago

For a better starting point, could you please send me the Dockerfile for your image? Thanks ;)

josiahseaman commented 5 years ago

You did get this file right? https://drive.google.com/open?id=1I7yQw4hcvSRZA57vz9VB2tC8rxV2AkHl

The base here is the vg Docker image that is already built by the VG team. I just installed SequenceTubeMap inside the same container, following these instructions.

My initial command was:

docker pull quay.io/vgteam/vg:v1.15.0-208-gce79450f1-t311-run
docker image ls
docker run quay.io/vgteam/vg:v1.15.0-208-gce79450f1-t311-run vg version

subwaystation commented 5 years ago

You did get this file right? https://drive.google.com/open?id=1I7yQw4hcvSRZA57vz9VB2tC8rxV2AkHl

Yes :)

Ah, I see. So you directly manipulated the image, not the Dockerfile itself. Now I have a clear picture.

josiahseaman commented 5 years ago

This started as a one-day test to see whether I could serve SequenceTubeMap and VG out of a Docker image at all. It looks like it's doable, which means it could go back into a Dockerfile to be done properly. For my purposes, though, I only need to move the file once to get it onto a cloud server with the 1001 Genomes data loaded.

josiahseaman commented 5 years ago

I agree. In the short term, I just want a demo up and running with some of our data.

subwaystation commented 5 years ago

Which should be possible with your current Docker image, right? Or shall I push the current image to https://hub.docker.com/? As soon as vg and TubeMap are in Bioconda, I will create a clean Dockerfile and build the container again.

josiahseaman commented 5 years ago

@subwaystation, do you have any opinions on whether we should use one or more Docker containers for this project and for vg browser in general? It's a new concept to me, so I have no opinion. https://docker-curriculum.com/#multi-container-environments

subwaystation commented 5 years ago

@josiahseaman multi-container environments are a new concept for me. I will discuss this with a colleague of mine, who has a lot of experience with docker. Then I will make a suggestion.

josiahseaman commented 5 years ago

Here are my complete notes on the commands that worked. Note that at the beginning I relied on bash to edit the running instance. It would still be nice to write a Dockerfile with Docker Compose for updating to the latest VG and SequenceTubeMap versions. For now, though, interactively editing and committing the container is useful for packaging our demo data files directly into the Docker image.

docker pull quay.io/vgteam/vg:v1.15.0-208-gce79450f1-t311-run
docker image ls
docker run quay.io/vgteam/vg:v1.15.0-208-gce79450f1-t311-run vg version

Resume my edited instance:

docker start -a sleepy_booth

Using 1001G Data

From sixref_Chr4_stable_unzipped_prealigned_refined_realigned.gfa create XG file.

Copy the file into the container:
docker cp "E:\Google Drive\1001G (1)\sixref_Chr4_stable_unzipped_prealigned_refined_realigned.vg" sleepy_booth:/vg/

Launch sleepy_booth:

docker start sleepy_booth  
docker exec -it sleepy_booth bash

vg index -x sixref.xg -g sixref.gcsa sixref_Chr4_stable_unzipped_prealigned_refined_realigned.vg

This command produces sixref.xg but no GCSA file, and reports no errors.

IVG Error: Can only handle XG file up to 5MB in size.
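Since the browser rejects large XG files, a pre-flight size check can save a round trip through the GUI. This is a minimal sketch: the check_xg_size function is hypothetical, and reading "5MB" as 5 MiB (5,242,880 bytes) is an assumption, which is why the limit is an overridable variable.

```shell
#!/bin/sh
# Pre-flight check before loading an XG file in the browser.
# Assumption: the "5MB" in the IVG error is read here as 5 MiB
# (5 * 1024 * 1024 bytes); adjust MAX_XG_BYTES if the real limit differs.
MAX_XG_BYTES="${MAX_XG_BYTES:-5242880}"

# check_xg_size FILE: returns 0 if FILE is within the limit,
# 1 if it is too large, 2 if it does not exist.
check_xg_size() {
    xg_file="$1"
    if [ ! -f "$xg_file" ]; then
        echo "missing: $xg_file" >&2
        return 2
    fi
    # "wc -c < file" prints only the byte count; tr strips padding spaces
    size=$(wc -c < "$xg_file" | tr -d ' ')
    if [ "$size" -gt "$MAX_XG_BYTES" ]; then
        echo "too large for the browser: $xg_file ($size > $MAX_XG_BYTES bytes)" >&2
        return 1
    fi
    echo "ok: $xg_file ($size bytes)"
    return 0
}
```

For example, check_xg_size sixref.xg right after vg index would flag the whole-chromosome graph before it is copied anywhere.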

Start the server in the background (this should be made part of the container):

cd /vg/sequenceTubeMap && nohup yarn serve &

Pruning without reference

/vg# vg prune sixref_Chr4_stable_unzipped_prealigned_refined_realigned.vg > sixref.pruned.vg
Killed
Instead, I was able to place the files in sequenceTubeMap/exampleData/ and access them via "mounted files" in the GUI.
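The exampleData workaround can be wrapped in a tiny helper to run inside the container. This is a sketch only: stage_example_data is a hypothetical name, and the default EXAMPLE_DIR is an assumption based on the /vg/sequenceTubeMap layout used throughout these notes.

```shell
#!/bin/sh
# Copy graph files into SequenceTubeMap's exampleData directory so they
# appear under "mounted files" in the GUI. Run inside the container; the
# default path is an assumption based on the /vg/sequenceTubeMap layout.
set -eu

EXAMPLE_DIR="${EXAMPLE_DIR:-/vg/sequenceTubeMap/exampleData}"

# stage_example_data FILE...: copy each FILE into EXAMPLE_DIR
stage_example_data() {
    if [ "$#" -eq 0 ]; then
        echo "usage: stage_example_data FILE..." >&2
        return 1
    fi
    mkdir -p "$EXAMPLE_DIR"
    for f in "$@"; do
        cp -- "$f" "$EXAMPLE_DIR/"
        echo "staged $(basename "$f") -> $EXAMPLE_DIR/"
    done
}
```

For example, stage_example_data /vg/sixref.xg after indexing in /vg, then reload the GUI and pick the file under "mounted files".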

Pushing Docker

At this point you'll need to run docker login with your own Docker Hub credentials.

docker tag 4a2ed9251243 josiahseaman/sixref_server:latest
docker push josiahseaman/sixref_server:latest

docker run -p 3000:3000 -it josiahseaman/sixref_server bash
docker exec -it <name> bash
docker port <name>

Working Command from Scratch

docker run -p 2000:3000 --name=ivg_deploy josiahseaman/sixref_server:0.2 run_server

The key was to make sure the script was in /bin/ and started with #!/bin/bash.
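The thread does not show the body of run_server, only that it lived in /bin/ and began with #!/bin/bash. The sketch below generates one plausible version, assuming it simply repeats the earlier "cd sequenceTubeMap, yarn serve" steps; RUN_SERVER_OUT is a hypothetical override for where the file is written.

```shell
#!/bin/sh
# Generate a launcher like the thread's /bin/run_server. Assumption: the
# real script's body is unknown; this one replays the earlier manual steps.
set -eu

RUN_SERVER_OUT="${RUN_SERVER_OUT:-./run_server}"

cat > "$RUN_SERVER_OUT" <<'EOF'
#!/bin/bash
# Hypothetical /bin/run_server: start SequenceTubeMap in the foreground.
set -e
cd /vg/sequenceTubeMap
# exec replaces bash, so Docker's stop signals go straight to yarn
exec yarn serve
EOF

chmod +x "$RUN_SERVER_OUT"
echo "Wrote $RUN_SERVER_OUT; copy it to /bin/run_server inside the image"
```

Running the server in the foreground (rather than with nohup ... &) matters here because a container stops as soon as its main process exits. One way to get the file into the image is docker cp run_server sleepy_booth:/bin/run_server followed by docker commit.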

Dockerrun.aws.json

It runs in one command with Elastic Beanstalk on AWS. Follow the Docker tutorial.

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "josiahseaman/sixref_server:0.2",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "3000"
    }
  ],
  "Logging": "/var/log/nginx",
  "Entrypoint": "/bin/run_server"
}
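Each new image tag currently means hand-editing the "Name" field above. A minimal sketch that regenerates the same file for a given tag follows; the script name make_dockerrun.sh and the IMAGE_REPO override are hypothetical, and the default tag 0.2 is just the one used in this thread.

```shell
#!/bin/sh
# Regenerate Dockerrun.aws.json for a given image tag. The fields mirror
# the hand-written file; only the image tag (and optionally repo) vary.
# Usage (hypothetical script name): sh make_dockerrun.sh [TAG]
set -eu

IMAGE_REPO="${IMAGE_REPO:-josiahseaman/sixref_server}"
IMAGE_TAG="${1:-0.2}"

cat > Dockerrun.aws.json <<EOF
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "${IMAGE_REPO}:${IMAGE_TAG}",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "3000"
    }
  ],
  "Logging": "/var/log/nginx",
  "Entrypoint": "/bin/run_server"
}
EOF

echo "Wrote Dockerrun.aws.json for ${IMAGE_REPO}:${IMAGE_TAG}"
```

After pushing a newer tag, running the script with that tag and redeploying the directory (for example with eb deploy from the EB CLI) should point the environment at the new image.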

Working demo is up and running at http://sixrefserver-env.kpsmbhtnk2.us-east-2.elasticbeanstalk.com/