wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License

Revise developer documentation #39

Open zerolab opened 7 months ago

zerolab commented 7 months ago

From @nimasmi in https://github.com/wagtail/wagtail-ai/issues/43#issuecomment-1860660722

I had trouble running this at first. A lot of it stems from working with tox and flit. It's fine to use new tools, but in an agency that has an established stack and toolchain, expect to have to provide extra documentation for unfamiliar workflows.

Aside: I appreciate that the current package architecture is not the fault of the current project team, and that it is a legacy of https://github.com/wagtail/cookiecutter-wagtail-package.

Setup:

Python 3.12

None of the tox tests run for me in the py312 environments. The error I got was:

...
File "/home/nick/projects/wagtail-vector-index/.tox/py312-django4.2-wagtail5.2-postgres/lib/python3.12/site-packages/pip/_internal/utils/misc.py", line 148, in rmtree
    shutil.rmtree(dir, onexc=handler)
TypeError: rmtree() got an unexpected keyword argument 'onexc'
py312-django4.2-wagtail5.2-postgres: exit 1 (29.08 seconds) /home/nick/projects/wagtail-vector-index> python -I -m pip install 'Django<4.3,>=4.2' 'psycopg2-binary>=2.9' 'wagtail<6.0,>=5.2' '.[pgvector]' '.[testing]' pid=3402450
  py311-django4.2-wagtail5.2-sqlite: OK (18.47=setup[0.08]+cmd[18.40] seconds)
  py311-django4.2-wagtail5.2-postgres: FAIL code 1 (7.05=setup[0.02]+cmd[7.03] seconds)
  py312-django4.2-wagtail5.2-sqlite: FAIL code 1 (29.66 seconds)
  py312-django4.2-wagtail5.2-postgres: FAIL code 1 (29.09 seconds)
  evaluation failed :( (84.40 seconds)
➜  wagtail-vector-index git:(main) ✗ 

but since 3.12 is in alpha, I sort of ignored this one.
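For context, the traceback comes from pip's own cleanup code. On Python 3.12+ it calls shutil.rmtree(..., onexc=handler), and onexc is a keyword added in Python 3.12. An interpreter that reports itself as 3.12 but whose shutil lacks onexc (presumably an early 3.12 pre-release) raises exactly this TypeError. The sketch below is a hypothetical helper, not pip's code: it picks onexc or onerror by inspecting the function signature instead of trusting the version number.

```python
import inspect
import os
import shutil
import stat
import tempfile


def _make_writable_and_retry(func, path, _exc):
    # Handler in the style of the shutil docs: clear the read-only bit, then
    # retry the operation that failed. The third argument is an exception
    # (onexc) or an exc_info tuple (onerror); it is ignored either way.
    os.chmod(path, stat.S_IWRITE)
    func(path)


def rmtree_compat(path):
    """Remove a directory tree, choosing onexc or onerror by feature detection.

    Hypothetical helper for illustration only: checking the signature means a
    pre-release interpreter that lacks onexc still takes the onerror path.
    """
    params = inspect.signature(shutil.rmtree).parameters
    if "onexc" in params:
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_writable_and_retry)


if __name__ == "__main__":
    # Usage: create a throwaway tree and remove it.
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "example.txt"), "w").close()
    rmtree_compat(scratch)
    print(os.path.exists(scratch))  # False
```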

Changing code and running tests

Firstly, the package needed to be installed with flit install -s (symlink mode), so that local code changes are picked up without reinstalling. This documentation issue was also fixed in https://github.com/wagtail/wagtail-vector-index/pull/15.

When I modify the code, I need to run tox -e py311-django4.2-wagtail5.2-postgres -r (see note below) so that the environment is rebuilt with the changed code. This takes 1 min 20 s, which is tedious when you're checking a one-line change. If you are only modifying the tests, the -r (recreate) flag is not necessary, but even then a test run takes 20 s. Is there a way to run the tests more quickly?
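One way to keep the two cases apart is a small wrapper that only adds -r when the package code itself changed. This is a hypothetical helper, not part of the project. -e and -r (--recreate) are standard tox options, but forwarding a narrower test selection after -- only works if the project's tox.ini passes {posargs} to its test command, which is an assumption here.

```python
import shlex


def tox_command(env, recreate=False, test_args=()):
    """Build an argv list for running a single tox environment.

    Hypothetical helper. "-e" selects the environment; "-r" rebuilds it so a
    fresh copy of the package is installed. Arguments after "--" reach the
    test command only if tox.ini references {posargs} (assumption).
    """
    argv = ["tox", "-e", env]
    if recreate:
        argv.append("-r")  # needed after changing package code, not tests
    if test_args:
        argv.append("--")
        argv.extend(test_args)
    return argv


if __name__ == "__main__":
    env = "py311-django4.2-wagtail5.2-postgres"
    # After a package change: rebuild the environment.
    print(shlex.join(tox_command(env, recreate=True)))
    # -> tox -e py311-django4.2-wagtail5.2-postgres -r
    # After a test-only change: reuse the environment and (if {posargs} is
    # wired up) run one hypothetical test module.
    print(shlex.join(tox_command(env, test_args=["tests.test_example"])))
```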

Note: I stuck with the 3.11 environment because of the 3.12 failures, and the SQLite issue below, but also because one environment is quicker to test than four.
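The four environments are the product of two Python versions and two databases, with Django 4.2 and Wagtail 5.2 fixed. The sketch below reconstructs the names seen in the tox log; the factor lists are read off that log, not taken from the project's tox.ini, so treat them as an assumption.

```python
from itertools import product

# Factors as they appear in the environment names in the tox log.
PYTHONS = ["py311", "py312"]
DJANGO = ["django4.2"]
WAGTAIL = ["wagtail5.2"]
DATABASES = ["sqlite", "postgres"]


def tox_envs(pythons=PYTHONS, databases=DATABASES):
    """Return environment names built by joining one value from each factor."""
    return ["-".join(parts) for parts in product(pythons, DJANGO, WAGTAIL, databases)]


if __name__ == "__main__":
    for name in tox_envs():
        print(name)
    # Narrowing to one Python and one database leaves a single environment.
    print(tox_envs(pythons=["py311"], databases=["postgres"]))
```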

Databases and containers

I ran the tests on PostgreSQL because, due to the flit -s and tox -r issues, I initially believed the SQLite tests were being skipped or returning false positives. I don't have Postgres installed on my machine, and always run it in containers or VMs when necessary.
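When Postgres only runs in a container, a quick pre-flight check can save a slow tox run that would fail on connection. This is a hypothetical helper, assuming the container publishes the default port 5432 on localhost. It only proves that something is listening, not that it is Postgres or that the test credentials work.

```python
import socket


def postgres_reachable(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical pre-flight check: a refused connection or a timeout (both
    OSError subclasses) means nothing is listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if postgres_reachable():
        print("Postgres port is open; the *-postgres tox envs can run.")
    else:
        print("Nothing on localhost:5432; start the Postgres container first.")
```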

I therefore started trying to add a Dockerfile for this project. I got part of the way there, aiming for a Docker container that runs the whole project and contains multiple Pythons, so that tox could do its thing inside that container. At that point I found out that the tox project recommends a different approach: as I understand it, run a tox container and let it manage the Python environments. I'm not sure how you would then manage other dependencies, such as a Postgres container.
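Whichever image is used, tox can only build an environment if the matching interpreter exists inside the container. The sketch below lists which pythonX.Y executables a set of environments needs but PATH lacks. It is a hypothetical helper that approximates, but does not replicate, how tox discovers interpreters; the which parameter is injectable so the check can be tested without touching the real PATH.

```python
import shutil


def missing_interpreters(env_names, which=shutil.which):
    """Return pythonX.Y executables required by env_names but not found.

    Hypothetical helper: maps a leading "py311"-style factor to "python3.11"
    and looks it up with `which`. Non-Python factors are ignored.
    """
    missing = []
    for env in env_names:
        factor = env.split("-", 1)[0]  # e.g. "py312"
        if not factor.startswith("py3"):
            continue
        exe = f"python3.{factor[3:]}"  # "py312" -> "python3.12"
        if which(exe) is None and exe not in missing:
            missing.append(exe)
    return missing


if __name__ == "__main__":
    envs = [
        "py311-django4.2-wagtail5.2-sqlite",
        "py312-django4.2-wagtail5.2-sqlite",
    ]
    print(missing_interpreters(envs))
```

In a python:3.11-bookworm container, for example, this would presumably report python3.12 as missing, which is why a single-Python image cannot run the whole matrix.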

Eventually @tm-kn pointed me at https://github.com/wagtail/wagtail-vector-index/issues/17, which was sufficient, so I didn't persist with getting the Dockerfiles working. I will post two Dockerfile examples in a separate update, just for discussion.

In general, though, so much configuration is necessary that Docker might be handy, or at least we could do with better guidance on how to test locally.


zerolab commented 7 months ago

Follow up from @nimasmi in https://github.com/wagtail/wagtail-ai/issues/43#issuecomment-1860677558

Docker

Python Dockerfile

This is the Dockerfile written the way I naively approached it. It's a work in progress, but may save some time if we want to continue with this approach.

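FROM python:3.11-bookworm

# Install dependencies in a virtualenv owned by a non-root user
ENV VIRTUAL_ENV=/venv

RUN useradd vector --create-home && mkdir /app $VIRTUAL_ENV && chown vector /app $VIRTUAL_ENV
WORKDIR /app
ENV PATH=$VIRTUAL_ENV/bin:$PATH \
    PYTHONUNBUFFERED=1
# Copy the source as the vector user so flit and tox can write into /app
COPY --chown=vector . .
USER vector
RUN python -m venv $VIRTUAL_ENV
RUN pip install flit tox
RUN python -m flit install
# Run the tests when the container starts rather than at build time.
# Only Python 3.11 is in this image and no PostgreSQL server is running,
# so the py312-* and *-postgres environments cannot pass here as written.
CMD ["tox"]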

Tox project Dockerfile

On the other hand, the Tox project recommends starting from the 31z4/tox image. See Running within a Docker Container.

FROM 31z4/tox

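# Switch to root to install system packages
USER root

RUN set -eux; \
    apt-get update; \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -y --no-install-recommends \
        python3.12 \
        postgresql; \
    rm -rf /var/lib/apt/lists/*

# Note: installing postgresql does not start a server; the tests still need one running.
# Copy the project in, owned by the image's tox user, and drop root privileges
COPY --chown=tox . .
USER tox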