toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.16k stars 180 forks source link

pgsync container getting exit due to invalid schema.json #579

Open hrik01 opened 2 hours ago

hrik01 commented 2 hours ago

PGSync version: latest

Postgres version: 16

Elasticsearch/OpenSearch version: 2.17.1

Redis version: 7.4.0

Python version: 3.10.12

Problem Description: i am creating a setup on my machine locally , and i have made the containers of PGSync, opensearch, redis, postgres under the same network and updated the runserver.sh file acc to my hostand host ports but my pgsync container is getting exited again and again showing the logs that schema.json not valid where i have made it validate from different online jason validator sites

Error Message (if any): wait-for-it.sh: waiting 60 seconds for strapiDB:5432 wait-for-it.sh: strapiDB:5432 is available after 0 seconds PostgreSQL is up wait-for-it.sh: waiting 60 seconds for 192.168.1.2:9400 wait-for-it.sh: 192.168.1.2:9400 is available after 0 seconds OpenSearch is up wait-for-it.sh: waiting 60 seconds for redis:6379 wait-for-it.sh: redis:6379 is available after 0 seconds Redis is up Error: ./schema.json is not valid JSON. Exiting. wait-for-it.sh: waiting 60 seconds for strapiDB:5432 wait-for-it.sh: strapiDB:5432 is available after 0 seconds PostgreSQL is up wait-for-it.sh: waiting 60 seconds for 192.168.1.2:9400 wait-for-it.sh: 192.168.1.2:9400 is available after 0 seconds OpenSearch is up wait-for-it.sh: waiting 60 seconds for redis:6379 wait-for-it.sh: redis:6379 is available after 0 seconds Redis is up Error: ./schema.json is not valid JSON. Exiting.

I am providing the commands that i used till yet for the setup.

Setup:

Go to the github repository of PGSync and download the files of pgsync for reference.

Run the following commands to setup Strapi, Postgres, PGSync, Opensearch, Redis:

Commands :

Create Network for the container for easy communication $ sudo docker network create Strapi

Create Strapi container $ sudo docker run -d \ --name strapi \ --network Strapi \ -p 1337:1337 \ -e HOST=0.0.0.0 \ -e PORT=1337 \ -e APP_KEYS=8a45kNmvx0DsPv5CrdJH595+1Snn444SqEuq3mTQ844=,X8U6WyFa2kC+2P+I79F2QSy82xGG+HwirI5iF6wXYZk= \ -e API_TOKEN_SALT=AbsqpmZ171iyl9394ZbYCg== \ -e ADMIN_JWT_SECRET=4UXCd43bus3S0vwbEIa4sA== \ -e TRANSFER_TOKEN_SALT=suRgGs3Kpn4+3POvtwsIFQ== \ -e DATABASE_CLIENT=postgres \ -e DATABASE_HOST=strapiDB \ -e DATABASE_PORT=5432 \ -e DATABASE_NAME=npi \ -e DATABASE_USERNAME=pgsync \ -e DATABASE_PASSWORD=redhat \ -e DATABASE_SSL=false \ -e DATABASE_FILENAME=.tmp/data.db \ -e JWT_SECRET=PJi80mJC2xfK6qRH81dSyg== \ -e NODE_ENV=production \ -v /home/hriritk/strapi/my-project/package.json:/opt/package.json:rw \ -v /home/hriritk/strapi/my-project/config:/opt/app/config:rw \ -v /home/hriritk/strapi/my-project/src:/opt/app/src:rw \ -v /home/hriritk/strapi/my-project/public/uploads:/opt/app/public/uploads:rw \ -v /home/hriritk/strapi/my-project/yarn.lock:/opt/yarn.lock:rw \ -w /opt/app \ strapi:latest \ yarn develop Create Postgres container $ docker run -d \ --name strapiDB \ --network Strapi \ --hostname postgres \ -p 5432:5432 \ -v /home/hriritk/strapi/postgres-data:/var/lib/postgresql/data:rw \ -e POSTGRES_USER=pgsync \ -e POSTGRES_PASSWORD=redhat \ -e POSTGRES_DB=npi \ Postgres:latest

Create container of Redis $ sudo docker run -d \ --name redis \ --network Strapi \ redis \ redis-server --requirepass redhat

Create container of Opensearch $ sudo docker run -d \ --name opensearch \ --network Strapi \ -p 9400:9200 -p 9600:9600 \ -e discovery.type=single-node \ -e bootstrap.memory_lock=true \ -e DISABLE_INSTALL_DEMO_CONFIG=true \ -e DISABLE_SECURITY_PLUGIN=true \ opensearchproject/opensearch:latest

Create container of PGSync Create Dockerfile for the image of PGSync

Base image

FROM python:3.9

Define the working directory

ARG WORKDIR=/code RUN mkdir -p $WORKDIR WORKDIR $WORKDIR

Install PGSync from the GitHub repository

RUN pip install git+https://github.com/toluaina/pgsync.git

Copy the wait-for-it script (for waiting on dependencies like PostgreSQL)

COPY ./wait-for-it.sh wait-for-it.sh

COPY ./schema.json schema.json

Make sure the wait-for-it script is executable

RUN chmod +x wait-for-it.sh

Add the runserver script (your custom script to start the pgsync server)

COPY ./runserver.sh runserver.sh

Make sure the runserver script is executable

RUN chmod +x runserver.sh

Expose any required ports (if applicable)

EXPOSE 5000

Command to run the server script (it will wait for PostgreSQL and OpenSearch)

CMD ["./runserver.sh"]

Before building the image of PGSync we need to make 2 scripts for pgsync container that are going to be copied in the container of postgresql

Runserver.sh $ vim runserver.sh

!/bin/sh

set -u

Wait for PostgreSQL to be ready

./wait-for-it.sh strapiDB:5432 -t 60 --strict -- echo "PostgreSQL is up"

Wait for OpenSearch to be ready

./wait-for-it.sh 192.168.1.8:9400 -t 60 --strict -- echo "OpenSearch is up"

Wait for Redis to be ready

./wait-for-it.sh my-project_redis_1:6379 -t 60 --strict -- echo "Redis is up"

Define the configuration file path for pgsync

CONFIG_FILE="./schema.json"

./schema.json

Run schema setup or any other initialization (if needed)

Ensure your schema setup is already handled. If not, include commands here.

If schema files like schema.py or schema.json are not relevant, these can be omitted.

Run PGSync with the configuration file

pgsync --config $CONFIG_FILE --daemon

Optional: Any other bootstrap tasks or custom logic can be added here.

!/bin/sh

!/bin/sh

set -u

Wait for PostgreSQL to be ready

./wait-for-it.sh strapiDB:5432 -t 60 --strict -- echo "PostgreSQL is up"

Wait for OpenSearch to be ready

./wait-for-it.sh 192.168.1.8:9400 -t 60 --strict -- echo "OpenSearch is up"

Wait for Redis to be ready

./wait-for-it.sh my-project_redis_1:6379 -t 60 --strict -- echo "Redis is up"

Define the configuration file path for pgsync

CONFIG_FILE="./schema.json"

Check if the schema.json file exists

if [ ! -f "$CONFIG_FILE" ]; then echo "Error: $CONFIG_FILE not found. Exiting." exit 1 fi

Validate schema.json is valid JSON format

if ! jq empty "$CONFIG_FILE" > /dev/null 2>&1; then echo "Error: $CONFIG_FILE is not valid JSON. Exiting." exit 1 fi

Run PGSync with the configuration file

pgsync --config "$CONFIG_FILE" --daemon

Optional: Any other bootstrap tasks or custom logic can be added here.

wait-for-it.sh $ vim wait-for-it.sh

!/usr/bin/env bash

Use this script to test if a given TCP host/port are available

cmdname=$(basename $0)

echoerr() { if [[ $QUIET -ne 1 ]]; then echo "$@" 1>&2; fi }

usage() { cat << USAGE >&2 Usage: $cmdname host:port [-s] [-t timeout] [-- command args] -h HOST | --host=HOST Host or IP under test -p PORT | --port=PORT TCP port under test Alternatively, you specify the host and port as host:port -s | --strict Only execute subcommand if the test succeeds -q | --quiet Don't output any status messages -t TIMEOUT | --timeout=TIMEOUT Timeout in seconds, zero for no timeout -- COMMAND ARGS Execute command with args after the test finishes USAGE exit 1 }

wait_for() { if [[ $TIMEOUT -gt 0 ]]; then echoerr "$cmdname: waiting $TIMEOUT seconds for $HOST:$PORT" else echoerr "$cmdname: waiting for $HOST:$PORT without a timeout" fi start_ts=$(date +%s) while : do if [[ $ISBUSY -eq 1 ]]; then nc -z $HOST $PORT result=$? else (echo > /dev/tcp/$HOST/$PORT) >/dev/null 2>&1 result=$? fi if [[ $result -eq 0 ]]; then end_ts=$(date +%s) echoerr "$cmdname: $HOST:$PORT is available after $((end_ts - start_ts)) seconds" break fi sleep 1 done return $result }

wait_for_wrapper() { if [[ $QUIET -eq 1 ]]; then timeout $BUSYTIMEFLAG $TIMEOUT $0 --quiet --child --host=$HOST --port=$PORT --timeout=$TIMEOUT & else timeout $BUSYTIMEFLAG $TIMEOUT $0 --child --host=$HOST --port=$PORT --timeout=$TIMEOUT & fi PID=$! trap "kill -INT -$PID" INT wait $PID RESULT=$? if [[ $RESULT -ne 0 ]]; then echoerr "$cmdname: timeout occurred after waiting $TIMEOUT seconds for $HOST:$PORT" fi return $RESULT }

process arguments

while [[ $# -gt 0 ]] do case "$1" in : ) hostport=(${1//:/ }) HOST=${hostport[0]} PORT=${hostport[1]} shift 1 ;; --child) CHILD=1 shift 1 ;; -q | --quiet) QUIET=1 shift 1 ;; -s | --strict) STRICT=1 shift 1 ;; -h) HOST="$2" if [[ $HOST == "" ]]; then break; fi shift 2 ;; --host=) HOST="${1#=}" shift 1 ;; -p) PORT="$2" if [[ $PORT == "" ]]; then break; fi shift 2 ;; --port=) PORT="${1#=}" shift 1 ;; -t) TIMEOUT="$2" if [[ $TIMEOUT == "" ]]; then break; fi shift 2 ;; --timeout=) TIMEOUT="${1#=}" shift 1 ;; --) shift CLI=("$@") break ;; --help) usage ;; *) echoerr "Unknown argument: $1" usage ;; esac done

if [[ "$HOST" == "" || "$PORT" == "" ]]; then echoerr "Error: you need to provide a host and port to test." usage fi

TIMEOUT=${TIMEOUT:-15} STRICT=${STRICT:-0} CHILD=${CHILD:-0} QUIET=${QUIET:-0}

check to see if timeout is from busybox?

TIMEOUT_PATH=$(realpath $(which timeout)) if [[ $TIMEOUT_PATH =~ "busybox" ]]; then ISBUSY=1 BUSYTIMEFLAG="-t" else ISBUSY=0 BUSYTIMEFLAG="" fi

if [[ $CHILD -gt 0 ]]; then wait_for RESULT=$? exit $RESULT else if [[ $TIMEOUT -gt 0 ]]; then wait_for_wrapper RESULT=$? else wait_for RESULT=$? fi fi

if [[ $CLI != "" ]]; then if [[ $RESULT -ne 0 && $STRICT -eq 1 ]]; then echoerr "$cmdname: strict mode, refusing to execute subprocess" exit $RESULT fi exec "${CLI[@]}" else exit $RESULT fi

Create schema.json in the same directory that dockerfile is going to copy it in the container $ vim schema.json [ { "database": "npi", "index": "calendar_holidays_index", "nodes": { "table": "calendar_holidays", "schema": "public", "columns": [ "id", "review_email", "alias", "title", "date", "holiday_type", "review", "created_at", "updated_at", "published_at", "created_by_id", "updated_by_id", "day", "keywords", "calendar_year", "weight" ], "indexes": [ { "name": "calendar_holidays_pkey", "columns": ["id"], "type": "btree", "primary": true }, { "name": "calendar_holidays_created_by_id_fk", "columns": ["created_by_id"], "type": "btree" }, { "name": "calendar_holidays_updated_by_id_fk", "columns": ["updated_by_id"], "type": "btree" } ], "children": [] } } ]

Built the image of PGSync $ sudo docker built -t pgsync-image .

Create the container of pgsync using the image we created $ sudo docker run -d \ --name pgsync \ --network Strapi \ --sysctl net.ipv4.tcp_keepalive_time=200 \ --sysctl net.ipv4.tcp_keepalive_intvl=200 \ --sysctl net.ipv4.tcp_keepalive_probes=5 \ -e PG_USER=pgsync \ -e PG_HOST=strapiDB \ -e PG_PORT=5432 \ -e PG_PASSWORD=redhat \ -e LOG_LEVEL=INFO \ -e ELASTICSEARCH_PORT=9400 \ -e ELASTICSEARCH_SCHEME=http \ -e ELASTICSEARCH_HOST=opensearch \ -e REDIS_HOST=redis \ -e REDIS_PORT=6379 \ -e REDIS_AUTH=redhat \ -e OPENSEARCH=true \ -p 5000:5000 \ pgsync-image ./runserver.sh

After creating the containers go into the Postgresql Container and navigate to the Postgres.conf file and make some changes in the file.

$ sudo docker exec -it StrapiDB /bin/bash $ cd /var/lib/postgres/ $ vim postgres.config

Enable logical decoding in postgres.conf wal_level = logical

Ensure there is at least one replication slot defined in postgres.conf max_replication_slots = 1

Add pgdump into the postgres for syncing data

Download the all_data.sql dump file into your machine

Import that dump file to postgres container $ docker cp /home/hriritk/Downloads/alldata.sql strapiDB:/alldata.sql

Restore the alldata dump to the npi database: After copying the file $ docker exec -it strapiDB psql -U pgsync -d npi -f /all_data.sql Move the calender_holidays table from strapi_test to npi: First, dump the calender_holidays table from the strapi_test database:

$ docker exec -it strapiDB pg_dump -U pgsync -d strapi_test -t calender_holidays > calender_holidays.sql

Then, restore it into the npi database:

$ docker exec -i strapiDB psql -U pgsync -d npi < calender_holidays.sql

* facing an error in creating a pgsync container as it is getting exited again and again due to invalid schema.json. **

hrik01 commented 2 hours ago

Please help as this is a very urgent requirement !!!