hammerlab / cytokit

Microscopy Image Cytometry Toolkit
Apache License 2.0

Document how to use cytokit on gcloud #8

Open armish opened 6 years ago

armish commented 6 years ago

Some preliminary notes:

Cytokit on gcloud

Spin up a machine on gcloud with 2 Nvidia Tesla K80 GPUs:

gcloud beta compute \
    --project=hammerlab-chs \
    instances create \
    cytokit \
    --zone=us-east1-c \
    --machine-type=n1-highmem-16 \
    --subnet=default \
    --network-tier=PREMIUM \
    --maintenance-policy=TERMINATE \
    --service-account=195534064580-compute@developer.gserviceaccount.com \
    --scopes="https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append" \
    --accelerator=type=nvidia-tesla-k80,count=2 \
    --tags=http-server,https-server \
    --image=ubuntu-1604-xenial-v20180627 \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=1000GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=cytokit

gcloud compute ssh --zone us-east1-c cytokit
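Later on you'll need the instance's external IP to reach the notebook from your browser; it can be looked up with gcloud's resource projection (a sketch, assuming the instance/zone names above):

```shell
# Print the instance's external (NAT) IP address
gcloud compute instances describe cytokit \
    --zone=us-east1-c \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
```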

Inspiration: https://medium.com/google-cloud/jupyter-tensorflow-nvidia-gpu-docker-google-compute-engine-4a146f085f17

# Install all the things as root to work around many issues
sudo su -

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi

# Sanity check (should see all GPUs listed here)
nvidia-smi 
#!/bin/bash
# install packages to allow apt to use a repository over HTTPS:
apt-get -y install \
apt-transport-https ca-certificates curl software-properties-common
# add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 
# set up the Docker stable repository.
add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
# update the apt package index:
apt-get -y update
# finally, install docker
apt-get -y install docker-ce
wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
dpkg -i nvidia-docker*.deb

# Sanity check (should run without any issues)
nvidia-docker run --rm nvidia/cuda nvidia-smi
exit # from root

# Sudoless docker setup:
sudo usermod -aG docker $USER
sudo systemctl restart docker

exit # logout completely
gcloud compute ssh ... # new login

# Sanity check (should run as a user)
docker run hello-world
nvidia-docker run --rm nvidia/cuda nvidia-smi

Setup is done. Let's pull in the relevant repositories and scripts:

cd $HOME
mkdir repos
cd repos
git clone https://github.com/hammerlab/cytokit.git && mv cytokit codex
git clone https://github.com/hammerlab/cvutils.git
git clone https://github.com/hammerlab/cell-image-analysis.git

cat << EOF >> ~/cytokit.env
export CODEX_DATA_DIR=$HOME/data
export CODEX_REPO_DIR=$HOME/repos/codex
export CVUTILS_REPO_DIR=$HOME/repos/cvutils
export CODEX_ANALYSIS_REPO_DIR=$HOME/repos/cell-image-analysis
EOF
source ~/cytokit.env
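The source line above only lasts for the current session; to have new logins pick the variables up automatically, one option (a sketch, made idempotent with a grep guard) is:

```shell
# Source cytokit.env from ~/.bashrc on every login;
# the grep guard avoids appending the same line twice.
grep -qxF 'source ~/cytokit.env' ~/.bashrc 2>/dev/null \
  || echo 'source ~/cytokit.env' >> ~/.bashrc
```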

mkdir -p $CODEX_DATA_DIR
cd $CODEX_DATA_DIR
gsutil cp -r gs://musc-codex/models .

cd $CODEX_DATA_DIR
mkdir 20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5
cd 20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5
gsutil -m cp -r gs://musc-codex/datasets/20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5 .
mv 20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5 raw

We now have all we need (scripts/data). Let's run the analysis:

cd $CODEX_REPO_DIR/docker
nvidia-docker build -t codex-analysis -f Dockerfile.dev .

nvidia-docker run -ti -p 8888:8888 -p 6006:6006 -p 8787:8787 -p 8050:8050 --rm \
-v $CODEX_DATA_DIR:/lab/data \
-v $CODEX_REPO_DIR:/lab/repos/codex \
-v $CODEX_ANALYSIS_REPO_DIR:/lab/repos/codex-analysis \
-v $CVUTILS_REPO_DIR:/lab/repos/cvutils \
-e CODEX_CYTOMETRY_2D_MODEL_PATH=/lab/data/models/r0.3/nuclei_model.h5 \
-e CODEX_CACHE_DIR=/lab/data/.codex/cache \
codex-analysis
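The Jupyter URL and token print to the container's stdout. If you instead start the container detached (swap -ti for -d and add --name, e.g. a made-up name like cytokit-lab), you can recover them from the logs afterwards:

```shell
# If the container was started detached (-d --name cytokit-lab),
# the Jupyter startup banner with the ?token=... URL is in its logs:
docker logs cytokit-lab 2>&1 | grep token
```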

You can now connect to the notebook server running on your gcloud instance via its public IP. Once connected, open a new terminal tab and run the following:
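If the notebook is unreachable from outside, the ports likely need a firewall rule — the http-server/https-server tags from the create command only open 80/443. A sketch (the rule name allow-cytokit-ports is made up):

```shell
# Open the Jupyter/TensorBoard/Dash ports to instances tagged http-server
gcloud compute firewall-rules create allow-cytokit-ports \
    --allow=tcp:8888,tcp:6006,tcp:8787,tcp:8050 \
    --target-tags=http-server
```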

#!/usr/bin/env bash

# Must match the dataset downloaded above (RepA, not RepB)
EXP_NAME="20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5"
CODEX_DATA_DIR=/lab/data
EXP_DIR=$CODEX_DATA_DIR/$EXP_NAME
CODEX_ANALYSIS_REPO_DIR=/lab/repos/codex-analysis/
EXP_CONF=$CODEX_ANALYSIS_REPO_DIR/config/experiment/$EXP_NAME/experiment.yaml
EXP_OUT=$EXP_DIR/output/v01

echo "Processing experiment $EXP_NAME"

cytokit processor run \
    --config-path=$EXP_CONF \
    --data-dir=$EXP_DIR/raw \
    --output-dir=$EXP_OUT \
    --run-drift-comp=False \
    --run-best-focus=True \
    --run-deconvolution=True \
    --gpus=[0,1] --py-log-level=info

cytokit operator \
extract \
  --config-path=$EXP_CONF \
  --data-dir=$EXP_OUT \
  --name='best_z_segm' \
  --channels=['proc_dapi','proc_cd4','proc_cd8','cyto_cell_boundary','cyto_nucleus_boundary'] - \
montage \
  --name='best_z_segm' \
  --extract-name='best_z_segm' 

cytokit analysis aggregate_cytometry_statistics \
  --config-path=$EXP_CONF \
  --data-dir=$EXP_OUT \
  --mode='best_z_plane'

The whole run should finish in under 30 minutes. Afterwards you can gsutil cp the results to a bucket and turn your gcloud box off.
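Concretely, that wrap-up might look like the following, run from the host (a sketch — the container's /lab/data is $CODEX_DATA_DIR on the host, and the destination bucket path gs://musc-codex/results is made up):

```shell
# Copy the processed output up to a bucket, then stop the instance
EXP_NAME="20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5"
gsutil -m cp -r $CODEX_DATA_DIR/$EXP_NAME/output/v01 gs://musc-codex/results/$EXP_NAME
gcloud compute instances stop cytokit --zone=us-east1-c
```

Stopping (rather than deleting) the instance keeps the boot disk, so the setup above survives for the next run.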

Need to clean this up a bit.