interactivereport / cellxgene_VIP

Enables cellxgene to generate violin, stacked violin, stacked bar, heatmap, volcano, embedding, dot, track, density, 2D density, sankey and dual-gene plots in high-resolution SVG/PNG format. It also performs differential gene expression analysis and provides a Command Line Interface (CLI) for advanced users to perform analysis using Python and R.
https://cellxgenevip-ms.bxgenomics.com
MIT License

Contribution: Dockerfile for cellxgene VIP #102

Open Neah-Ko opened 8 months ago

Neah-Ko commented 8 months ago

Hello VIP team,

I have been tasked by my organization, @bag-cnag, with getting the VIP plugin to work with cellxgene.

I am sharing the result here, and am offering to contribute it to the repo via a pull request: https://github.com/bag-cnag/cxg_on_k8/blob/main/docker/Dockerfile_cellxgene_VIP_slim

Notes:

Let me know what you think.

Best, Etienne

z5ouyang commented 8 months ago

Hi Etienne, thank you very much for considering/doing this. Currently we don't have the bandwidth to do (or maintain) your proposal, but it is a good resource. We (@baohongz) could put your repo on the list in case some users would like to use Docker.

Neah-Ko commented 8 months ago

Hello @z5ouyang, sure, half the point was to make it visible in case someone is looking for such a Dockerfile.

Best,

rohitrrj commented 7 months ago

@Neah-Ko Thanks for sharing the Dockerfile! It looks great and seems like a lot of effort, kudos. However, it looks like the build fails while installing the "rpy2" dependency. Do you have any pointers on resolving this issue?

Neah-Ko commented 7 months ago

Hi @rohitrrj,

> @Neah-Ko Thanks for sharing the Dockerfile! It looks great and seems like a lot of effort, kudos. However, it looks like the build fails while installing the "rpy2" dependency. Do you have any pointers on resolving this issue?

So yeah, it's an issue I encountered while crafting the Dockerfile, most likely due to rpy2 not finding the R install path. It wasn't occurring on my last runs, however; I guess the micromamba solver can be inconsistent with package ordering.

In principle you should set the R_HOME environment variable just above the environment creation part of the Dockerfile like this:

ENV R_HOME=/env/lib/R/
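
To check that this path really holds an R installation before relying on it, a small shell test along these lines could be run during the build or in a container shell. This is only a sketch: `check_r_home` is a made-up helper name, and it assumes that a valid R home contains an executable `bin/R`, which is the usual layout.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given directory looks like an R home,
# i.e. it contains an executable bin/R (assumed layout).
check_r_home() {
    [ -x "$1/bin/R" ]
}

# Use R_HOME if it is set, otherwise fall back to the path suggested above.
R_HOME_CANDIDATE="${R_HOME:-/env/lib/R}"
if check_r_home "$R_HOME_CANDIDATE"; then
    echo "R found under $R_HOME_CANDIDATE"
else
    echo "no R under $R_HOME_CANDIDATE; 'R RHOME' prints the real location"
fi
```

If the check fails, running `R RHOME` inside the environment prints the directory R itself considers its home.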

Could you please post a log of your failing build so that I can make sure?

Also, if you are building on macOS, I know it can introduce some side effects. In that case, let me know your chip model as well, as it can be important.

Best,

rohitrrj commented 7 months ago

@Neah-Ko Thanks for the suggestion. Unfortunately, that doesn't seem to solve the issue. The build still fails with the same error. I have attached my build log. I am building on macOS with an Intel chip. The following excerpt from the log file seems to be where the build breaks.

#25 379.9       In file included from build/temp.linux-x86_64-cpython-38/_rinterface_cffi_api.c:57:0:
#25 379.9       /env/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
#25 379.9        #include <crypt.h>
#25 379.9                 ^~~~~~~~~
#25 379.9       compilation terminated.
#25 379.9       error: command '/env/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
#25 379.9       [end of output]

cxgVIP_build.log

rohitrrj commented 7 months ago

@Neah-Ko I was able to resolve the above issue by following the documentation in the rpy2 repo described here. Along with your suggestion above, adding the following seems to have resolved it.

RUN export LD_PATH=$(python -m rpy2.situation LD_LIBRARY_PATH)
ENV LD_LIBRARY_PATH=$LD_PATH:${LD_LIBRARY_PATH}

The build does finish without errors. Most of the functions seem to work as expected. The only exception was the Single Gene Violin plot, which does not seem to populate the actual plot, although Get Data does seem to export the underlying matrix.
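
One detail in the two Dockerfile lines above may be worth double-checking: each `RUN` instruction runs in its own shell, so a variable exported there is gone by the time a later `ENV` line is evaluated, and `$LD_PATH` would then expand to nothing. The plain-shell sketch below mimics this with a subshell standing in for a `RUN` step; the `/env/lib/R/lib` value is only a placeholder, not real output from `rpy2.situation`.

```shell
#!/bin/sh
# A subshell stands in for one Dockerfile RUN instruction.
unset LD_PATH
( export LD_PATH=/env/lib/R/lib )   # like "RUN export LD_PATH=...": lost when that shell exits
after_run="${LD_PATH:-<unset>}"
echo "seen by a later step: $after_run"

# Computing and consuming the value inside the same shell keeps it available.
same_shell="$(LD_PATH=/env/lib/R/lib; echo "LD_LIBRARY_PATH=$LD_PATH")"
echo "seen in the same step: $same_shell"
```

If that is what happens here, hard-coding the path in an `ENV` line, or exporting it in the same `RUN` step that needs it, would be the usual ways around it.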

mohammed-hussain1259 commented 7 months ago

Hi @Neah-Ko

Thank you so much for all your work on this. I was just wondering whether, in this Docker container, you can also build cellxgene using the custom tiledb_version of cellxgene you built.

Neah-Ko commented 7 months ago

> Hi @Neah-Ko
>
> Thank you so much for all your work on this. I was just wondering whether, in this Docker container, you can also build cellxgene using the custom tiledb_version of cellxgene you built.

Hi @mohammed-hussain1259,

Yeah, so I've tried to craft a Dockerfile to get both CXG VIP and the TileDB backend. It is possible to build such an image; however, if the TileDB backend is used, it breaks VIP functionality.

The reason is simple: the VIP codebase retrieves data by referencing the AnnData object directly. I invite you to check out the createData function, which does this job:

https://github.com/interactivereport/cellxgene_VIP/blob/2a524cdf585287f5d3554f507119b20a11fe8342/VIPInterface.py#L195

This means that, to have a unified product, we would need to either:

bobermayer commented 5 months ago

Hi, thanks a lot for sharing. This looks great; however, I'm unable to build the Docker image (on Ubuntu 22), even when including the additional lines suggested by @rohitrrj. I'm still getting the same error when trying to build rpy2. Any suggestions greatly appreciated!

Neah-Ko commented 5 months ago

> Hi, thanks a lot for sharing. This looks great; however, I'm unable to build the Docker image (on Ubuntu 22), even when including the additional lines suggested by @rohitrrj. I'm still getting the same error when trying to build rpy2. Any suggestions greatly appreciated!

Hi @bobermayer ,

Here's what you could try:

  1. Copy the conda env file VIP_cnag.yml from my gist onto your local machine.
  2. Remove rpy2 from this file, save it, and change the Dockerfile to use your file instead.
  3. The following snippet is the one managing env creation:

https://github.com/bag-cnag/cxg_on_k8/blob/f4d66f50f7a8bc8eedc48a0a909cac1e12ca6b31/docker/Dockerfile_cellxgene_VIP_slim#L118-L123

Since you took rpy2 out of the env file, you want to install it manually in the env after it is created. Append the following lines like this:

RUN micromamba env create ...
    ...
    python3 -m pip install --no-deps /cellxgene*.whl && \
    [export R_HOME=/env/lib/R/ && \]
    python3 -m pip install rpy2==3.3.5

I'm not sure the R_HOME line is necessary, but you may try both versions.
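
For step 2, the rpy2 entry can also be filtered out on the command line instead of editing the file by hand. A minimal sketch, assuming the env file lists one dependency per line; the file written here is a stand-in with made-up contents, not the real VIP_cnag.yml:

```shell
#!/bin/sh
# Stand-in env file (hypothetical contents, for illustration only).
workdir="$(mktemp -d)"
cat > "$workdir/VIP_cnag.yml" <<'EOF'
name: VIP
dependencies:
  - python=3.8
  - rpy2=3.3.5
  - scanpy
EOF

# Drop every line mentioning rpy2 and write the result to a new file.
grep -v 'rpy2' "$workdir/VIP_cnag.yml" > "$workdir/VIP_cnag_no_rpy2.yml"

cat "$workdir/VIP_cnag_no_rpy2.yml"
```

Note that `grep -v` removes any line containing the string rpy2, so it is worth glancing at the result before building.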

Let me know if that worked.

Best,

bobermayer commented 5 months ago

Hi @Neah-Ko

Thanks for your message. None of that worked, but I found a workaround by explicitly installing libcrypt-dev and copying crypt.h to the expected location (see https://github.com/stanford-futuredata/ColBERT/issues/309).
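
In shell terms the workaround boils down to something like the sketch below. The `ensure_header` function name is made up for illustration; the paths in the usage comment are the ones I used in the image.

```shell
#!/bin/sh
# Hypothetical helper: copy a system header into a target include directory
# only if the target does not already provide it.
ensure_header() {
    src="$1"
    dest_dir="$2"
    name="$(basename "$src")"
    if [ -f "$dest_dir/$name" ]; then
        echo "$name already present in $dest_dir"
    else
        cp "$src" "$dest_dir/$name" && echo "copied $name to $dest_dir"
    fi
}

# Intended use inside the image, after `apt-get -y install libcrypt-dev`:
#   ensure_header /usr/include/crypt.h /env/include
```

The existence check only avoids overwriting a header that the conda env might already ship; a plain `cp` does the same job when the file is known to be missing.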

Dockerfile

```
ARG PYTHON__V=3.8
FROM mambaorg/micromamba:1.5.6-bookworm-slim as base

USER root
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ARG PYTHON__V

# ------------------------------------------------------------------------------
FROM base AS builder

ENV LLVM_CONFIG=/usr/lib/llvm14/bin/llvm-config

# Build dependencies
RUN apt-get update && \
    apt-get -y install bash && \
    apt-get -y install build-essential && \
    apt-get -y install jq && \
    apt-get -y install git && \
    apt-get -y install libhdf5-dev && \
    apt-get -y install python3-pkgconfig && \
    apt-get -y install python3-dev && \
    apt-get -y install python3-pip && \
    apt-get -y install python3-wheel && \
    apt-get -y install llvm-dev && \
    apt-get -y install libblas-dev && \
    apt-get -y install cpio

WORKDIR /
RUN mkdir cellxgene cellxgene_VIP

# Copies a single commit: lighter and fixes the version
WORKDIR /cellxgene_VIP
RUN git init && \
    git remote add origin https://github.com/interactivereport/cellxgene_VIP.git && \
    git fetch --depth 1 origin 6d4e496b94701e742d99fa0a0f0362ebea82814b && \
    git checkout FETCH_HEAD

WORKDIR /cellxgene
RUN git init && \
    git remote add origin https://github.com/chanzuckerberg/cellxgene.git && \
    git fetch --depth 1 origin ffcf6eb5d842972f2562c359cc2276a0fbbe77d5 && \
    git checkout FETCH_HEAD

# Applying cellxgene fixes:
#  - Upgrade: Flask, boto, s3fs, fsspec, numpy
#  - limit Werkzeug version as new update (3.0.0) breaks server
#  - np.bool deprecated since numpy 1.20 -> replace by bool
#  - Replace Flask.json.JSONEncoder by json.JSONEncoder in utils.py
#  - Sets (s3) region name to false in default_config.py
#  - Add --legacy-peer-deps and --openssl-legacy-provider flags to npm commands in makefiles
#  - Extra Makefile entry to build a wheel
RUN cp ./environment.default.json /environment.default.json
RUN sed -i 's/np.bool/bool/g' server/data_common/data_adaptor.py && \
    printf "\nWerkzeug<=2.3.7" >> server/requirements.txt && \
    sed -i '/^boto3>/ s/=.*/=1.27.47/' server/requirements.txt && \
    sed -i '/^anndata/ s/==.*$/==0.9.2/' server/requirements.txt && \
    sed -i '/^Flask>/ s/,.*$/,<3.0.0/' server/requirements.txt && \
    sed -i '/^numpy>/ s/=.*$/=1.24.4/' server/requirements.txt && \
    sed -i '/^fsspec>/ s/,.*$//' server/requirements.txt && \
    sed -i '/^s3fs==/ s/==.*$/==2023.9.0/' server/requirements.txt && \
    sed -i '10s/^/from json import JSONEncoder\n/' server/common/utils/utils.py && \
    sed -i 's/json.JSONEncoder/JSONEncoder/g' server/common/utils/utils.py && \
    sed -i '/region_name/ s/:.*$/: false/' server/default_config.py && \
    sed -i 's/npm ci/npm ci --legacy-peer-deps/' client/Makefile && \
    sed -i '6s/^/WHEELBUILD := $(BUILDDIR)\/lib\/server\n/' Makefile && \
    printf '\n\
build_wheel: build \n\
	$(call copy_client_assets,$(CLIENTBUILD),$(WHEELBUILD)) \n\
pywheel: \n\
	NODE_OPTIONS=--openssl-legacy-provider $(MAKE) build_wheel \n\
	python3 setup.py bdist_wheel -d wheel\n' >> Makefile

RUN cp /cellxgene_VIP/index_template.insert ./index_template.insert

# Patch from cellxgene_VIP/config.sh: update cellxgene client source code for VIP
RUN echo -e "\nwindow.store = store;" >> client/src/reducers/index.js && \
    sed -i "s|<div id=\"root\"></div>|$(sed -e 's/[&\\/]/\\&/g; s/|/\\|/g; s/$/\\/;' -e '$s/\\$//' index_template.insert)\n&|" client/index_template.html && \
    sed -i "s|logoRelatedPadding = 50|logoRelatedPadding = 60|" client/src/components/leftSidebar/index.js && \
    sed -i "s|title=\"cellxgene\"|title=\"cellxgene VIP\"|" client/src/components/app.js && \
    sed -i "s|const *scaleMax *= *[0-9\.]\+|const scaleMax = 50000|; s|const *scaleMin *= *[0-9\.]\+|const scaleMin = 0.1|; s|const *panBound *= *[0-9\.]\+|const panBound = 80|" client/src/util/camera.js && \
    printf '\n\
from server.app.VIPInterface import route\n\
@webbp.route("/VIP", methods=["POST"])\n\
def VIP():\n\
    return route(request.data, current_app.app_config)\n' >> server/app/app.py && \
    sed -i '/^-e/d' ./client/src/reducers/index.js

##
# Build cellxgene wheel in node env for next stage
RUN micromamba create -yn node18 'nodejs>=18,<19' -c conda-forge && \
    micromamba run -n "node18" \
    make pywheel

# ------------------------------------------------------------------------------
FROM base AS final

ARG PYTHON__V

# Get wheel and VIP sources
COPY --from=builder /cellxgene/wheel/cellxgene*.whl /
COPY --from=builder /cellxgene_VIP /cellxgene_VIP
COPY --from=builder /cellxgene/test/decode_fbs.py /cellxgene/test/decode_fbs.py

# Conda runs with bash
SHELL ["/bin/bash", "-c"]

WORKDIR /tmp

# Get env file
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    wget https://gist.githubusercontent.com/Neah-Ko/d260316d77a42c5e7a698a766d8404a0/raw/6196bd8342350d01452500541151fd7e81e66443/VIP_cnag.yml

# remove rpy2 from the yml file
RUN cat VIP_cnag.yml | grep -v rpy2 > VIP_cnag_no_rpy2.yml

# Create env and install cellxgene and ipykernel in it
RUN micromamba env create -p /env -y --file VIP_cnag_no_rpy2.yml && \
    eval "$(micromamba shell hook --shell bash)" && \
    micromamba activate -p /env && \
    python3 -m ipykernel install --display-name "Python (/env)" --sys-prefix && \
    python3 -m pip install --no-deps /cellxgene*.whl

# install rpy2 separately and first hack crypt.h into the expected location (see https://github.com/stanford-futuredata/ColBERT/issues/309)
RUN apt-get update && \
    apt-get -y install libcrypt-dev && \
    cp /usr/include/crypt.h /env/include/crypt.h && \
    eval "$(micromamba shell hook --shell bash)" && \
    micromamba activate -p /env && \
    python3 -m pip install rpy2==3.3.5

ENV PYTHONPATH=/env/lib/python${PYTHON__V}/site-packages
ENV APPPATH=${PYTHONPATH}/server/app

# Patch from cellxgene_VIP/update.VIPInterface.sh
WORKDIR /cellxgene_VIP
RUN mkdir ${APPPATH}/gsea && \
    sed -i "s|MAX_LAYOUTS *= *[0-9]\+|MAX_LAYOUTS = 300|" ${PYTHONPATH}/server/common/constants.py && \
    # To display notebook results:
    sed -i 's| $("#CLIresize").html(filteredRes);| $("#CLIresize").html(filteredRes + res);|' ./interface.html && \
    cp ./interface.html ${PYTHONPATH}/server/common/web/static/ && \
    cp ./gsea/*.gmt ${APPPATH}/gsea/ && \
    cp ./VIPInterface.py ${APPPATH} && \
    cp ./fgsea.R ${APPPATH} && \
    cp ./complexHeatmap.R ${APPPATH} && \
    cp ./volcano.R ${APPPATH} && \
    cp ./Density2D.R ${APPPATH} && \
    cp ./bubbleMap.R ${APPPATH} && \
    cp ./bubbleMap.R ${APPPATH} && \
    cp ./violin.R ${APPPATH} && \
    cp ./volcano.R ${APPPATH} && \
    cp ./browserPlot.R ${APPPATH} && \
    cp ./complexHeatmap.R ${APPPATH} && \
    cp ./proteinatlas_protein_class.csv ${APPPATH} && \
    cp ./complex_vlnplot_multiple.R ${APPPATH} && \
    cp /cellxgene/test/decode_fbs.py ${APPPATH}

##
# Some R packages need to be installed from sources
RUN apt-get update && \
    apt-get install -y --no-install-recommends libfreetype6-dev libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev xfonts-base && \
    ln -s /usr/include/freetype2/freetype /env/include/freetype && \
    ln -s /usr/include/freetype2/ft2build.h /env/include/ft2build.h && \
    eval "$(micromamba shell hook --shell bash)" && \
    micromamba activate -p /env && \
    R -q -e 'if(!require(ggrastr)) \
        devtools::install_version("ggrastr", version="0.2.1", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    R -q -e 'if(!require(hexbin)) \
        devtools::install_version("hexbin", version="1.28.2", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    R -q -e 'if(!require(dbplyr)) \
        devtools::install_version("dbplyr", version="1.0.2", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    apt-get remove -y libfreetype6-dev libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev && \
    apt-get -y autoremove && \
    micromamba clean --all --yes

# Clean env from now unnecessary stuff
RUN find /env -name '*.a' | xargs rm -rf && \
    find /env -type d -name '__pycache__' | xargs rm -rf && \
    find /env -type d -name 'tests' -not -path *site-packages/tables* | xargs rm -rf && \
    find /env -name 'x86_64-conda*' | xargs rm -rf && \
    rm -rf /env/share/doc /env/share/gtk-doc /env/conda-meta /env/compiler_compat && \
    rm -rf /env/etc/conda /env/lib/gcc /env/lib/cmake /env/lib/ldscripts

# ------------------------------------------------------------------------------
# Needs a shell
FROM debian:bookworm-slim

ARG PYTHON__V

# Keep only the env & drop intermediate layers
COPY --from=final /env /env

# Set syspaths
ENV PYTHONPATH=/env/lib/python${PYTHON__V}/site-packages
ENV PATH /env/bin:$PATH

# Needed at runtime
RUN apt-get update && \
    apt-get install -y --no-install-recommends xfonts-base && \
    apt-get clean && \
    rm -rf /var/cache/apt/* /var/cache/debconf/* /var/lib/apt/lists/*

# Add user: cellxgeneuser, -> gives ownership over /data
ARG UID=1000
ARG GID=1000
RUN mkdir /data && \
    addgroup --gid "${GID}" cellxgeneuser && \
    adduser --no-create-home \
            --disabled-password \
            --uid "${UID}" --gid "${GID}" \
            cellxgeneuser && \
    chown -R cellxgeneuser:cellxgeneuser /data

# Ensures that users have permissions over /tmp
USER root
RUN chmod 1777 /tmp

USER cellxgeneuser

# Sets temporary directories for (numba | matplotlib)
ENV NUMBA_CACHE_DIR=/tmp
ENV MPLCONFIGDIR=/tmp

ENTRYPOINT ["/env/bin/cellxgene"]
CMD ["launch", "--help"]
```

Neah-Ko commented 5 months ago

Hi @bobermayer, thanks for your input.

Interestingly, the build also failed with a missing crypt.h on my machine. I guess something somewhere has changed since I designed the Dockerfile.

I have added libxcrypt=4.4.36 to the conda env file and updated the Dockerfile to pull the latest version. This seems to have fixed the issue.

Best,