pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Create a Helm chart to make it easier to set up a Linux cloud VM for testing pipedag with Docker #87

Open windiana42 opened 1 year ago

windiana42 commented 1 year ago

Something like this just worked on Debian 11 (the MSSQL driver is probably still missing, and the pytest suite should also run):

sudo apt-get install docker-compose git
sudo apt-get remove docker-compose  # looks strange, but the Docker engine (docker.io) pulled in alongside it stays installed and works
# consider adding a special user with `adduser <username>` (don't use useradd)
sudo usermod -a -G docker $USER
# log out of the machine and log back in so the new group membership (docker) takes effect
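# Optional check after logging back in (a minimal sketch; in_group is a
# hypothetical helper, not part of pipedag): warn if the docker group added
# by the usermod step above is not yet active in this session.
in_group() { id -nG | tr ' ' '\n' | grep -qxF -- "$1"; }
in_group docker || echo "docker group not active yet: log out and back in" >&2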
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod a+x Miniconda3-latest-Linux-x86_64.sh 
./Miniconda3-latest-Linux-x86_64.sh 
# accept the typical defaults, including "yes" to running conda init
bash  # start a new shell so the changes made by conda init take effect
conda install mamba -c conda-forge
mamba create -n poetry -c conda-forge poetry compilers cmake make psycopg2 docker-compose
mkdir code
cd code
git clone https://github.com/pydiverse/pydiverse.pipedag
cd pydiverse.pipedag/
docker run -h db2server --name db2server --restart=always --detach --privileged=true -p 50000:50000 --env-file docker_db2.env_list -v /Docker:/database ibmcom/db2
conda activate poetry
nohup docker-compose up &
poetry install
cd example
export POSTGRES_USERNAME=sa
export POSTGRES_PASSWORD=Pydiverse23
poetry run python run_pipeline.py
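`nohup docker-compose up &` returns immediately, so the database containers may still be starting when `run_pipeline.py` tries to connect. A minimal sketch of a wait step using bash's built-in `/dev/tcp` redirection follows; the helper name `wait_for_port` is hypothetical, and port 5432 in the usage line is only PostgreSQL's default, so check which ports the repository's docker-compose file actually publishes.

```bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until HOST:PORT accepts TCP
# connections, using bash's /dev/tcp pseudo-device (no nc needed).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60}
  local waited=0
  # The subshell opens a TCP connection on fd 3 and closes it on exit;
  # its stderr is discarded so "connection refused" messages stay hidden.
  while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${host}:${port} is accepting connections"
}
```

Usage, assuming PostgreSQL is published on localhost:5432: `wait_for_port localhost 5432 120 && poetry run python run_pipeline.py`. An open port only shows that the container is listening; the database inside may need a few more seconds before it accepts logins.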

Install the MSSQL ODBC driver:

sudo sh -c 'curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list'
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EB3E94ADBE1229CF
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18  # set ACCEPT_EULA after sudo, since sudo resets the caller's environment by default
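To confirm the install, one can check that the driver was registered with unixODBC (`odbcinst -q -d` lists the registered drivers). A minimal sketch, assuming the package registers itself in `/etc/odbcinst.ini` under the section name `ODBC Driver 18 for SQL Server`; `has_odbc_driver` is a hypothetical helper.

```bash
# has_odbc_driver NAME [INI_FILE]
# Hypothetical helper: succeed if INI_FILE (default /etc/odbcinst.ini)
# is readable and has a line that is exactly the section header [NAME].
has_odbc_driver() {
  local name=$1 ini=${2:-/etc/odbcinst.ini}
  [[ -r $ini ]] && grep -qxF -- "[$name]" "$ini"
}

if has_odbc_driver "ODBC Driver 18 for SQL Server"; then
  echo "MSSQL ODBC driver is registered"
else
  echo "MSSQL ODBC driver not found in /etc/odbcinst.ini" >&2
fi
```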
windiana42 commented 1 year ago

These commands are a hack that lets a second user on the same machine run the pipedag pytest suite (don't commit the changes!):

for i in $(find -iname "pipedag*.yaml" -or -name "test_local_table_cache.py"); do sed -i 's|/tmp/pipedag/|/tmp/pipedag_mt/|g' "$i"; done
for i in $(find -name "test_lock_manager.py"); do sed -i 's|/ "pipedag" /|/ "pipedag_mt" /|g' "$i"; done
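The same hack can be wrapped in a function so that each additional user picks their own suffix instead of the fixed `pipedag_mt`. A minimal sketch: `isolate_pipedag_tmp` is a hypothetical name, it assumes GNU `sed -i` (as on Debian) and a suffix without `|`, `&` or `\` (sed would interpret those), and like the one-liners above it edits files in place, so the result must not be committed.

```bash
# isolate_pipedag_tmp [REPO_DIR] [SUFFIX]
# Hypothetical helper: rewrite the hard-coded pipedag temp paths in a local
# checkout so tests use /tmp/pipedag_<SUFFIX>/ instead of /tmp/pipedag/.
isolate_pipedag_tmp() {
  local repo=${1:-.} suffix=${2:-$(id -un)}
  # Config files and the local table cache test contain the literal path.
  find "$repo" -type f \( -iname 'pipedag*.yaml' -o -name 'test_local_table_cache.py' \) \
    -exec sed -i "s|/tmp/pipedag/|/tmp/pipedag_${suffix}/|g" {} +
  # The lock manager test appears to join the path with the / operator
  # (e.g. pathlib), so match the quoted "pipedag" component instead.
  find "$repo" -type f -name 'test_lock_manager.py' \
    -exec sed -i "s|/ \"pipedag\" /|/ \"pipedag_${suffix}\" /|g" {} +
}
```

Usage from the repository root: `isolate_pipedag_tmp . "$(id -un)"`. Afterwards `git diff --stat` shows the touched files, and `git checkout -- .` discards the edits again (along with any other uncommitted changes).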