transferwise / pipelinewise-tap-postgres

Singer.io Tap for PostgreSQL - PipelineWise compatible
https://transferwise.github.io/pipelinewise/
GNU Affero General Public License v3.0

Shouldn't fail with ERROR "could not find pg_class entry for ..." after dropping table of unrelated schema #151

Open atsu85 opened 2 years ago

atsu85 commented 2 years ago

Describe the bug

When the database contains two schemas (for example "public" and "pglogical"), but only one of them is configured for tap-postgres:

type: "tap-postgres"
schemas:
  - source_schema: "public"
...
    tables:
      - table_name: "test_table"
        replication_method: "LOG_BASED"

and the other schema (pglogical) was deleted after the last PipelineWise run, the next PipelineWise run fails with:

logger_name=tap_postgres file=/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/sync_strategies/logical_replication.py:619 log_level=ERROR message=could not find pg_class entry for 7222930

logger_name=tap_postgres file=/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py:434 log_level=CRITICAL message=could not find pg_class entry for 7222930

Traceback (most recent call last):
  File "/app/.virtualenvs/tap-postgres/bin/tap-postgres", line 8, in <module>
    sys.exit(main())
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py", line 435, in main
    raise exc
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py", line 432, in main
    main_impl()
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py", line 421, in main_impl
    do_sync(conn_config, args.catalog.to_dict() if args.catalog else args.properties,
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py", line 322, in do_sync
    state = sync_logical_streams(conn_config, list(streams), state, end_lsn, state_file)
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/__init__.py", line 221, in sync_logical_streams
    state = logical_replication.sync_tables(conn_config, logical_streams, state, end_lsn, state_file)
  File "/app/.virtualenvs/tap-postgres/lib/python3.8/site-packages/tap_postgres/sync_strategies/logical_replication.py", line 617, in sync_tables
    msg = cur.read_message()
psycopg2.errors.InternalError_: could not find pg_class entry for 7222930

This exception is caught and re-raised from this line.

The missing pg_class entry pointed to a table from the deleted schema.
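To check which relation an error like this refers to, the OID can be pulled out of the message text. The following is only a minimal sketch: the helper name `extract_missing_oid` and the `LOOKUP_SQL` constant are hypothetical and are not part of tap-postgres. Note that once the table has been dropped the lookup returns no rows, so the OID has to be resolved before the drop (or from an older catalog snapshot).

```python
import re
from typing import Optional

# Matches the message text seen in the log above (assumed to be stable).
_PG_CLASS_ERROR = re.compile(r"could not find pg_class entry for (\d+)")

# Catalog query that resolves a relation OID to schema and table name.
# It returns no rows if the relation no longer exists.
LOOKUP_SQL = (
    "SELECT n.nspname, c.relname "
    "FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace "
    "WHERE c.oid = %s"
)


def extract_missing_oid(message: str) -> Optional[int]:
    """Return the OID named in the error message, or None if it does not match."""
    match = _PG_CLASS_ERROR.search(message)
    return int(match.group(1)) if match else None
```

With a psycopg2 cursor, the result could be passed as `cur.execute(LOOKUP_SQL, (oid,))`.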

To Reproduce

Steps to reproduce the behavior:

  1. Prepare a database with two schemas (at least one table in each of them)
  2. Include only a table from one of those schemas in the tap-postgres configuration (with the LOG_BASED replication_method).
  3. Start PipelineWise; it should succeed
  4. Drop the other schema
  5. Start PipelineWise again; it should fail with the error mentioned above and report:

    TAP RUN SUMMARY

    Status : FAILED
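The database side of these steps (1 and 4) could be scripted roughly as below. This is a sketch under assumptions: `other_schema`, `other_table`, the column layout, and the function names are made up for illustration, and `conn` is assumed to be an open psycopg2 connection. The PipelineWise runs in steps 3 and 5 still have to be started separately.

```python
from typing import List


def setup_statements(kept_schema: str = "public",
                     dropped_schema: str = "other_schema") -> List[str]:
    """Step 1: two schemas with one table each (names are illustrative)."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {kept_schema}",
        f"CREATE TABLE IF NOT EXISTS {kept_schema}.test_table "
        "(id serial PRIMARY KEY, value text)",
        f"CREATE SCHEMA IF NOT EXISTS {dropped_schema}",
        f"CREATE TABLE IF NOT EXISTS {dropped_schema}.other_table "
        "(id serial PRIMARY KEY, value text)",
    ]


def drop_statements(dropped_schema: str = "other_schema") -> List[str]:
    """Step 4: drop the schema that tap-postgres is not configured for."""
    return [f"DROP SCHEMA {dropped_schema} CASCADE"]


def run(conn, statements: List[str]) -> None:
    """Execute the statements on a DB-API connection and commit."""
    with conn.cursor() as cur:
        for statement in statements:
            cur.execute(statement)
    conn.commit()
```

Usage would be `run(conn, setup_statements())` before the first PipelineWise run and `run(conn, drop_statements())` before the second. The schema names are interpolated directly, so they should only ever be trusted constants.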

Expected behavior

Dropping tables from schemas not configured for PipelineWise shouldn't cause issues, and the tap run should succeed despite the deleted table:

TAP RUN SUMMARY

Status : SUCCESS

Your environment

atsu85 commented 2 years ago

I guess it could be related to the replication slot not being active when the table (or whole schema) is dropped.
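One way to test this guess would be to look at the `active` flag in the `pg_replication_slots` view just before dropping the schema. A minimal sketch, assuming `conn` is an open psycopg2 connection; the function name `inactive_logical_slots` is hypothetical and this only helps check the guess, it does not confirm the cause:

```python
from typing import List

# pg_replication_slots is a standard PostgreSQL system view;
# "active" is true while a consumer is connected to the slot.
SLOT_STATUS_SQL = (
    "SELECT slot_name, active "
    "FROM pg_replication_slots "
    "WHERE slot_type = 'logical'"
)


def inactive_logical_slots(conn) -> List[str]:
    """Return the names of logical replication slots with no active consumer."""
    with conn.cursor() as cur:
        cur.execute(SLOT_STATUS_SQL)
        return [slot_name for slot_name, active in cur.fetchall() if not active]
```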