papermerge / papermerge-core

In this repository is the source code of Papermerge DMS backend core, REST API server, and frontend UI
https://papermerge.com
Apache License 2.0

Not an Issue but a general question #138

Closed arminzaugg closed 1 year ago

arminzaugg commented 1 year ago

Dear @ciur and community

On my journey to better understand containers, I am trying to deploy the papermerge-core image on a serverless service, namely Cloud Run on GCP. This would also be the most cost-efficient solution for me personally.

My understanding is that it is technically possible to run Papermerge on Cloud Run when using its second-generation execution environment. I am able to run the papermerge-core image on Cloud Run without issues.

However, to make this work for real use, we need persistent storage and a database. I am now stuck on the storage part, trying to integrate gcsfuse. I did the following:

  1. Clone the papermerge repo.
  2. Adjust the Dockerfile so that gcsfuse is installed and a folder to mount the storage to is created:
FROM papermerge/base:1.1.1 as build

### STEP 1 - pull all python dependencies in virtual env
ENV IN_DOCKER=1
ENV POETRY_VIRTUALENVS_CREATE=false
ENV UWSGI_PROFILE=gevent
ENV VIRTUAL_ENV=/venv

RUN apt-get install -y --no-install-recommends \
    build-essential \
    python3-dev \
    tesseract-ocr \
    tesseract-ocr-deu \
    imagemagick \
    gcc

RUN pip install --upgrade poetry
RUN python -m venv /venv

ENV PATH="/venv/bin:$PATH"

COPY poetry.lock pyproject.toml /
RUN poetry install --no-root --no-dev -vvv

## STEP 2 - use slim base image
FROM python:3.10-slim

ENV PATH="/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV VIRTUAL_ENV=/venv

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    python3-dev \
    postgresql-client \
    tesseract-ocr \
    tesseract-ocr-deu \
    imagemagick \
    poppler-utils \
    git \
    libmagic1 \
    ghostscript \
    file \
    gcc \
    wget

# install gcsfuse
RUN wget "https://github.com/GoogleCloudPlatform/gcsfuse/releases/download/v0.42.4/gcsfuse_0.42.4_amd64.deb" && \
    apt-get install -y ./gcsfuse_0.42.4_amd64.deb && \
    rm gcsfuse_0.42.4_amd64.deb

# create dir for gcsfuse mount
RUN mkdir -p /gcs

COPY docker/prod/uwsgi.ini /etc/uwsgi/papermerge.ini
COPY docker/prod/scripts /
RUN chmod +x /run.bash

COPY --from=build /venv /venv

WORKDIR app

# sources
COPY papermerge/ ./papermerge/
COPY docker/prod/config/ ./config/
COPY docker/prod/manage.py ./

EXPOSE 8000

ENTRYPOINT ["/run.bash"]
CMD ["server"]
  3. Adjust the run.bash startup script so that gcsfuse is executed at init and the storage is mounted:

    #!/bin/bash

    export PATH="/venv/bin:${PATH}"

    CMD="$1"
    PYTHON="/venv/bin/python"
    MANAGE="${PYTHON} manage.py"
    BUCKET="pm-6cf6b091c755272a74369f8a2a90b796"
    MNT_DIR="/gcs"

    if [ -z "${DJANGO_SETTINGS_MODULE}" ]; then
      # default value for DJANGO_SETTINGS_MODULE environment variable
      export DJANGO_SETTINGS_MODULE=config.settings
    fi

    if [ -z "${DJANGO_SUPERUSER_USERNAME}" ]; then
      # default value for DJANGO_SUPERUSER_USERNAME environment variable
      export DJANGO_SUPERUSER_USERNAME=admin
    fi

    if [ -z "${DJANGO_SUPERUSER_EMAIL}" ]; then
      # default value for DJANGO_SUPERUSER_EMAIL environment variable
      export DJANGO_SUPERUSER_EMAIL=admin@example.com
    fi

    if [ -z "$CMD" ]; then
      echo "No command specified"
      exit 1
    fi

    exec_server() {
      exec uwsgi --ini /etc/uwsgi/papermerge.ini
    }

    exec_ws_server() {
      exec daphne -b 0.0.0.0 --port 8000 config.asgi:application
    }

    exec_collectstatic() {
      $MANAGE collectstatic --noinput
    }

    exec_migrate() {
      # run migrations
      $MANAGE migrate --no-input
    }

    exec_update_index() {
      # Create/Update search index (runs in the background)
      $MANAGE update_index &
    }

    exec_createsuperuser() {
      # user environment variables:
      #   (1) DJANGO_SUPERUSER_USERNAME
      #   (2) DJANGO_SUPERUSER_EMAIL
      #   (3) DJANGO_SUPERUSER_PASSWORD
      # to create superuser if (1) and (2) are set
      if [ -n "${DJANGO_SUPERUSER_USERNAME}" ] && [ -n "${DJANGO_SUPERUSER_EMAIL}" ]; then
        echo "Creating superuser username=${DJANGO_SUPERUSER_USERNAME}"
        $MANAGE createsuperuser --noinput \
          --username ${DJANGO_SUPERUSER_USERNAME} \
          --email ${DJANGO_SUPERUSER_EMAIL} || true
      fi
    }

    exec_worker() {
      exec celery --app config worker \
        -n "worker-node-${HOSTNAME}@papermerge" ${PAPERMERGE__WORKER__ARGS}
    }

    exec_gcsfuse() {
      echo "Mounting GCS Fuse."
      gcsfuse --debug_gcs --debug_fuse "$BUCKET" "$MNT_DIR"
      echo "Mounting completed."
    }

    exec_init() {
      exec_collectstatic
      exec_migrate
      exec_createsuperuser
      exec_update_index
      exec_gcsfuse
    }

    case $CMD in
      init)
        exec_init
        ;;
      migrate)
        exec_migrate
        ;;
      collectstatic)
        exec_collectstatic
        ;;
      createsuperuser)
        exec_createsuperuser
        ;;
      server)
        # starts REST API webserver
        exec_init
        exec_server
        ;;
      ws_server)
        # start websockets server
        exec_init
        exec_ws_server
        ;;
      worker)
        exec_worker
        ;;
      *)
        $MANAGE $@
        ;;
    esac
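One thing the script above does not do is confirm that the bucket is actually mounted before the server starts. A minimal sketch of such a check follows, assuming the `mountpoint` utility (from util-linux) is present in the image; `wait_until` is a hypothetical helper, not part of the Papermerge scripts or of gcsfuse.

```shell
#!/bin/bash

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 if COMMAND succeeded, 1 if the timeout was reached first.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Possible use in run.bash, right after the gcsfuse call (an assumption,
# not taken from the original script):
#
#   wait_until 30 mountpoint -q "$MNT_DIR" || {
#     echo "GCS bucket was not mounted at $MNT_DIR" >&2
#     exit 1
#   }
```

Failing fast like this would at least make a mount problem show up as an explicit log line instead of an unexplained shutdown. Whether it is related to the SIGTERM in this case is not known.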



The storage mount seems to work fine, but when I build this image and deploy it, I am not able to access the app. Comparing the logs, I always receive a SIGTERM event before the uWSGI worker comes up on port 7000, which is the point the working logs reach with the standard image.
Snippet from the working example using the ready-made image:
![image](https://user-images.githubusercontent.com/5075572/234204048-b75912e2-2097-4191-8653-1e888dabbaea.png)

Snippet from the log output when using my image with gcsfuse. A log export is available [here](https://drive.google.com/file/d/1BmV5JB4bbSYmurID0IkFgqgQUhT-yVtj/view?usp=sharing).
![image](https://user-images.githubusercontent.com/5075572/234195151-3e159054-9429-417f-bcdb-42a07bc6aa39.png)
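To narrow down which init step runs last before the SIGTERM, each step could be wrapped so that its start, end, and exit status appear in the Cloud Run logs. This is only a sketch: `run_step` is a hypothetical helper, and the step names in the usage comment are the functions from the modified run.bash.

```shell
#!/bin/bash

# run_step NAME COMMAND [ARGS...]
# Hypothetical helper: logs when a step starts and finishes, including its
# exit status, and returns that status to the caller.
run_step() {
  local name="$1"
  shift
  local status=0
  echo "[init] starting: ${name}"
  "$@" || status=$?
  echo "[init] finished: ${name} (exit status ${status})"
  return "$status"
}

# Possible use inside exec_init (an assumption, not the original code):
#
#   run_step collectstatic exec_collectstatic
#   run_step migrate exec_migrate
#   run_step gcsfuse exec_gcsfuse
```

With this in place, the last `[init] starting:` line without a matching `[init] finished:` line would point at the step that was running when the container received SIGTERM.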

I would highly appreciate any feedback that points me in the right direction so that I can resolve this.
arminzaugg commented 1 year ago

Moved into Discussions. Issue closed.