toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License

Pgsync tries to sync a removed column #405

keniobats opened this issue 1 year ago

keniobats commented 1 year ago

PGSync version: 2.3.3

Postgres version: 13-master(postgis)

Elasticsearch version: 8.4

Redis version: 7.0.5

Python version: 3.7.15

Problem Description: I removed the pgsync and Redis containers along with their images, rebuilt the image, and tried a full re-index. PGSync still picks up an old column from another table, even though my schema.json now defines a completely different table.

Error Message (if any):

wait-for-it.sh: waiting 60 seconds for database:5432
wait-for-it.sh: database:5432 is available after 0 seconds
wait-for-it.sh: waiting 60 seconds for es01:9200
wait-for-it.sh: es01:9200 is available after 46 seconds
wait-for-it.sh: waiting 60 seconds for redis:6379
wait-for-it.sh: redis:6379 is available after 0 seconds
/usr/local/lib/python3.7/site-packages/pgsync/base.py:175: SAWarning: Did not recognize type 'geometry' of column 'ubicacion'
  metadata.reflect(self.engine, views=True)
 - public.narcomenudeo_denuncias
Traceback (most recent call last):
  File "/usr/local/bin/bootstrap", line 69, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/bin/bootstrap", line 59, in main
    document, verbose=verbose, repl_slots=False, **kwargs
  File "/usr/local/lib/python3.7/site-packages/pgsync/singleton.py", line 18, in __call__
    *args, **kwargs
  File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 101, in __init__
    self.validate(repl_slots=repl_slots)
  File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 181, in validate
    f"Required materialized view columns not present on "
RuntimeError: Required materialized view columns not present on _view. Please re-run bootstrap.
/usr/local/lib/python3.7/site-packages/pgsync/base.py:175: SAWarning: Did not recognize type 'geometry' of column 'ubicacion'
  metadata.reflect(self.engine, views=True)
 0:00:01.988392 (1.99 sec)
Traceback (most recent call last):
  File "/usr/local/bin/pgsync", line 7, in <module>
    sync.main()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 1385, in main
    sync: Sync = Sync(document, verbose=verbose, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pgsync/singleton.py", line 18, in __call__
    *args, **kwargs
  File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 101, in __init__
    self.validate(repl_slots=repl_slots)
  File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 154, in validate
    f'Replication slot "{self.__name}" does not exist.\n'
RuntimeError: Replication slot "bigeye_bigeye_narcomenudeo" does not exist.
Make sure you have run the "bootstrap" command.
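The second traceback reports that the replication slot `bigeye_bigeye_narcomenudeo` does not exist. That name appears to be the schema's `database` value (`bigeye`) joined to its `index` value (`bigeye_narcomenudeo`) with an underscore. A minimal sketch, assuming that naming rule (inferred from this log, not taken from PGSync's source), can check for the slot before starting `pgsync`. `expected_slot_name` and `missing_slot` are hypothetical helpers, and the slot names would come from Postgres's `pg_replication_slots` view.

```python
import re
from typing import List, Optional

# Catalog query you could run with any Postgres driver to list existing slots.
SLOT_QUERY = "SELECT slot_name FROM pg_replication_slots"


def expected_slot_name(database: str, index: str) -> str:
    """Return the slot name this sketch assumes PGSync looks for.

    Assumption: "<database>_<index>", inferred from the log above, where
    database "bigeye" and index "bigeye_narcomenudeo" gave
    "bigeye_bigeye_narcomenudeo". Postgres slot names may only contain
    lowercase letters, digits and underscores, so anything else is dropped.
    """
    return re.sub(r"[^a-z0-9_]+", "", f"{database}_{index}".lower())


def missing_slot(database: str, index: str, existing: List[str]) -> Optional[str]:
    """Return the expected slot name if it is absent from `existing`, else None."""
    name = expected_slot_name(database, index)
    return None if name in existing else name


if __name__ == "__main__":
    # Hypothetical result of SLOT_QUERY after the containers were recreated.
    existing_slots: List[str] = []
    print(missing_slot("bigeye", "bigeye_narcomenudeo", existing_slots))
    # prints: bigeye_bigeye_narcomenudeo
```

If the function returns a name, the slot has not been created for the current schema, which matches the traceback's advice to run `bootstrap` first.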

schema.json:

[   
    {
        "database": "bigeye",
        "index": "bigeye_narcomenudeo",
        "nodes": {
            "table": "narcomenudeo_denuncias",
            "transform": {
                "concat": {
                    "columns": ["lat", "lon"],
                    "destination": "lugar",
                    "delimiter": ","
                },
                "mapping": {
                    "lugar": {
                        "type" : "geo_point"
                    },
                    "lat": {
                        "type":"text"
                    },
                    "lon": {
                        "type":"text"
                    },
                    "relato": {
                        "type":"text"
                    },
                    "armas": {
                        "type":"keyword"
                    },
                    "direccion": {
                        "type":"text"
                    },
                    "fechaSuceso": {
                        "type":"text"
                    },
                    "menorEnRiesgo": {
                        "type":"keyword"
                    },
                    "nroCaso": {
                        "type":"text"
                    },
                    "nroReferencia": {
                        "type":"text"
                    },
                    "proviene911": {
                        "type":"keyword"
                    },
                    "regional": {
                        "type":"keyword"
                    },
                    "violenciaDomesticaGenero": {
                        "type":"keyword"
                    },
                    "averiguacionPrevia": {
                        "type":"keyword"
                    },
                    "tipoDenuncia": {
                        "type":"keyword"
                    }
                }
            }
        }
    }

]
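In this schema, `concat` combines `lat` and `lon` into a new `lugar` field separated by a comma, and `mapping` types `lugar` as a `geo_point`. Elasticsearch accepts a geo_point written as a `"lat,lon"` string. The sketch below imitates that on a single row and wraps the `mapping` entries in an Elasticsearch-style `mappings.properties` body. It illustrates what the configuration asks for, not PGSync's internals: `apply_concat` and `mapping_properties` are hypothetical names, and keeping the source columns next to the destination field is an assumption.

```python
import json
from typing import Any, Dict


def apply_concat(row: Dict[str, Any], concat: Dict[str, Any]) -> Dict[str, Any]:
    """Join the configured columns into the destination field.

    Assumptions: every listed column is present and non-null, and the
    source columns are kept in the output alongside the destination.
    """
    out = dict(row)
    parts = [str(row[c]) for c in concat["columns"]]
    out[concat["destination"]] = concat.get("delimiter", "").join(parts)
    return out


def mapping_properties(mapping: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Wrap per-field types in an Elasticsearch-style mappings body."""
    return {"mappings": {"properties": {k: dict(v) for k, v in mapping.items()}}}


# Trimmed copy of the "transform" block from schema.json above.
transform = {
    "concat": {"columns": ["lat", "lon"], "destination": "lugar", "delimiter": ","},
    "mapping": {"lugar": {"type": "geo_point"}, "relato": {"type": "text"}},
}

if __name__ == "__main__":
    row = {"lat": 19.43, "lon": -99.13, "relato": "sample text"}
    print(apply_concat(row, transform["concat"])["lugar"])  # prints: 19.43,-99.13
    print(json.dumps(mapping_properties(transform["mapping"]), indent=2))
```

The order of `columns` matters here: listing `lat` before `lon` is what makes the joined string a valid `"lat,lon"` geo_point.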
toluaina commented 1 year ago

If you remove a column from your schema, you should probably perform a full re-index, because the underlying structure (the mapping) of your Elasticsearch index has changed and Elasticsearch is quite strict about this. This is noted as a caveat in the main documentation.
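The reason a schema change calls for a full re-index is that Elasticsearch does not let you change the type of an existing field in place, and there is no way to drop a field from an existing index's mapping; only new fields can be added. The sketch below compares an old and a new `transform.mapping` block and flags changes that a fresh index would be needed for. It also prints a dry-run re-index sequence using the `bootstrap` and `pgsync` commands seen in the tracebacks. The `--config` and `--teardown` options are assumptions about the PGSync CLI and should be confirmed with `bootstrap --help` for your version. Deleting the old Elasticsearch index (for example, with the delete index API) is left out. `mapping_diff`, `needs_reindex` and `reindex_commands` are hypothetical helpers.

```python
from typing import Dict, List

Mapping = Dict[str, Dict[str, str]]


def mapping_diff(old: Mapping, new: Mapping) -> Dict[str, List[str]]:
    """Classify field changes between two transform.mapping blocks."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(
        f for f in set(old) & set(new) if old[f].get("type") != new[f].get("type")
    )
    return {"added": added, "removed": removed, "changed": changed}


def needs_reindex(old: Mapping, new: Mapping) -> bool:
    # New fields can be added to an existing mapping; a changed type cannot,
    # and a removed field would linger in the old index's mapping.
    diff = mapping_diff(old, new)
    return bool(diff["removed"] or diff["changed"])


def reindex_commands(config: str = "schema.json") -> List[List[str]]:
    """Dry-run command list for a full re-index (CLI flags are assumptions)."""
    return [
        ["bootstrap", "--teardown", "--config", config],  # remove previous PGSync setup
        ["bootstrap", "--config", config],  # recreate it for the current schema
        ["pgsync", "--config", config],  # sync into the (new) index
    ]


if __name__ == "__main__":
    old = {"lugar": {"type": "geo_point"}, "nroCaso": {"type": "text"}}
    new = {"lugar": {"type": "geo_point"}, "nroCaso": {"type": "keyword"}}
    print(mapping_diff(old, new))  # {'added': [], 'removed': [], 'changed': ['nroCaso']}
    print(needs_reindex(old, new))  # True
    for cmd in reindex_commands():
        print(" ".join(cmd))
```

The commands are only built, not executed, so the list can be reviewed (or passed to `subprocess.run` one at a time) once the flags have been checked against your installed version.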

toluaina commented 1 year ago

Hoping to close this if you feel it's been addressed.

toluaina commented 1 year ago

Closing as resolved. Please feel free to reopen if you feel otherwise.