toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.19k stars 182 forks source link

Sqlalchemy error #264

Open niraj-ads opened 2 years ago

niraj-ads commented 2 years ago

PGSync version: 2.2.1

Postgres version: 14 or 12

Elasticsearch version: 7.10.2 ( aws opensearch 1.1)

Redis version: 6.2.6

Python version: 3.7.12

Problem Description: when you setup --daemon, the sync breaks when you have a relationship with through_tables and upon inserting data in entry_author, it throws an error and the pgsync container dies.

The schema.json

{
        "database":"db",
        "index":"index1",
        "nodes":{
            "table":"entry",
            "columns":[
                "id",
                "price"
            ],
            "transform":{
                "mapping":{
                    "price":{
                        "type":"float"
                    }
                }
            },
            "children":[
                {
                    "table":"org",
                    "columns":[
                        "id"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_one",
                        "foreign_key":{
                            "child":[
                                "id"
                            ],
                            "parent":[
                                "org_id"
                            ]
                        }
                    }
                },
                {
                    "table":"fruits",
                    "columns":[
                        "id",
                        "apple_id",
                        "city_id",
                        "region_id",
                        "country_id"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_many",
                        "foreign_key":{
                            "child":[
                                "org_id"
                            ],
                            "parent":[
                                "org_id"
                            ]
                        }
                    },
                      "children":[
                        {
                            "table": "apple_city",
                            "columns":[
                                "name"
                            ],
                            "relationship": {
                                "variant": "object",
                                "type": "one_to_one",
                                "foreign_key": {
                                    "child":[
                                        "id"
                                    ],
                                    "parent":[
                                        "city_id"
                                    ]
                                }
                            }
                        },
                        {
                            "table": "apple_country",
                            "columns":[
                                "name",
                                "slug"
                            ],
                            "relationship": {
                                "variant": "object",
                                "type": "one_to_one",
                                "foreign_key": {
                                    "child":[
                                        "id"
                                    ],
                                    "parent":[
                                        "country_id"
                                    ]
                                }
                            }
                        },
                          {
                            "table": "apple_region",
                            "columns":[
                                "name"
                            ],
                            "relationship": {
                                "variant": "object",
                                "type": "one_to_one",
                                "foreign_key": {
                                    "child":[
                                        "id"
                                    ],
                                    "parent":[
                                        "region_id"
                                    ]
                                }
                            }
                        }
                    ]
                },
                {
                    "table":"test",
                    "columns":[
                        "id"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_one",
                        "foreign_key":{
                            "child":[
                                "id"
                            ],
                            "parent":[
                                "parent_id"
                            ]
                        }
                    }
                },
                {
                    "table":"author",
                    "columns":[
                        "id",
                        "name"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_many",
                        "through_tables":[
                            "entry_author_options"
                        ]
                    }
                }
            ]
        }
    }

Error Message (if any): image

ERROR:pgsync.elastichelper: Exception Select statement '<sqlalchemy.sql.selectable.Select object at 0x40447c0f50>' returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
toluaina commented 2 years ago
andrewdw commented 2 years ago

I have the same issue. It's only present when "through_tables" is defined in my schema. When that relationship is removed everything functions as expected.

niraj-ads commented 2 years ago

@toluaina

So whenever I add a record in entry_author_options and retrieve that record, I get the sqlalchemy error and the pgsync container dies but when I start it again, it works correctly.

 class Entry(sampleInfoModel):
     profile = models.ForeignKey(to=Profile, on_delete=models.CASCADE)
     author_options = models.ManyToManyField( to="AuthorOptions", blank=True)
class AuthorOptions(sampleInfoModel):
    author = models.ForeignKey(to=Author, on_delete=models.CASCADE)
class Author(sampleInfoModel):
    profile = models.ForeignKey(to=Profile, on_delete=models.CASCADE)
toluaina commented 2 years ago

@andrewdw @niraj-ads

Do you by any chance have a complete schema or dump I can work with. I'm still unable to reproduce this. Ideally I would like a db dump schema only and pgsync schema.json. Feel free to send to me directly if you are worried about exposing anything sensitive

niraj-ads commented 2 years ago

@toluaina, are you using your toluaina@hotmail.com email? if yes, you should have received an email from me!

niraj-ads commented 2 years ago

@toluaina, any update regarding this?

toluaina commented 2 years ago

The bug report ended up being a little confusing. The schema here is different from what you sent to my email also I don't see the full trace.

Can you reproduce this with the Entry, AuthorOptions and Author model you described above?

niraj-ads commented 2 years ago

@toluaina, were you able to reproduce with the DB dump schema and pgsync schema.json I've sent you via email?

niraj-ads commented 2 years ago

@toluaina, If this trace can help

pgsync_1  | Syncing aaa Xlog: [184] => Db: [19] => Redis: [total = 19 pending = 0] => Elastic: [19,193] ...
pgsync_1  | Syncing aaa Xlog: [108] => Db: [19] => Redis: [total = 19 pending = 0] => Elastic: [5,800] ...
pgsync_1  | 2022-03-17 14:07:14.669:ERROR:pgsync.elastichelper: Exception Select statement '<sqlalchemy.sql.selectable.Select object at 0x7f381a95af90>' returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
pgsync_1  | Traceback (most recent call last):
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/pgsync/elastichelper.py", line 117, in bulk
pgsync_1  |     raise_on_error=raise_on_error,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/pgsync/elastichelper.py", line 185, in _bulk
pgsync_1  |     ignore_status=ignore_status,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 484, in parallel_bulk
pgsync_1  |     actions, chunk_size, max_chunk_bytes, client.transport.serializer
pgsync_1  |   File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
pgsync_1  |     raise value
pgsync_1  |   File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
pgsync_1  |     result = (True, func(*args, **kwds))
pgsync_1  |   File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 140, in _helper_reraises_exception
pgsync_1  |     raise ex
pgsync_1  |   File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 292, in _guarded_task_generation
pgsync_1  |     for i, x in enumerate(iterable):
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
pgsync_1  |     for action, data in actions:
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 788, in _payloads
pgsync_1  |     yield from self.sync(filters=filters, extra=extra)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/pgsync/sync.py", line 903, in sync
pgsync_1  |     count: int = self.fetchcount(node._subquery)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/pgsync/base.py", line 841, in fetchcount
pgsync_1  |     ).order_by(None)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
pgsync_1  |     return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 326, in _execute_on_connection
pgsync_1  |     self, multiparams, params, execution_options
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1479, in _execute_clauseelement
pgsync_1  |     linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 529, in _compile_w_cache
pgsync_1  |     **kw
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 558, in _compiler
pgsync_1  |     return dialect.statement_compiler(dialect, self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 776, in __init__
pgsync_1  |     Compiled.__init__(self, dialect, statement, **kwargs)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 451, in __init__
pgsync_1  |     self.string = self.process(self.statement, **compile_kwargs)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 486, in process
pgsync_1  |     return obj._compiler_dispatch(self, **kwargs)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3371, in visit_select
pgsync_1  |     kwargs,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3514, in _compose_select_body
pgsync_1  |     for f in froms
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3514, in <listcomp>
pgsync_1  |     for f in froms
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3754, in visit_join
pgsync_1  |     + join.onclause._compiler_dispatch(
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3754, in visit_join
pgsync_1  |     + join.onclause._compiler_dispatch(
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2881, in visit_lateral
pgsync_1  |     return "LATERAL %s" % self.visit_alias(lateral_, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2837, in visit_alias
pgsync_1  |     self, asfrom=True, lateral=lateral, **kwargs
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2877, in visit_subquery
pgsync_1  |     return self.visit_alias(subquery, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2816, in visit_alias
pgsync_1  |     **kwargs
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3371, in visit_select
pgsync_1  |     kwargs,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3514, in _compose_select_body
pgsync_1  |     for f in froms
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3514, in <listcomp>
pgsync_1  |     for f in froms
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3754, in visit_join
pgsync_1  |     + join.onclause._compiler_dispatch(
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2881, in visit_lateral
pgsync_1  |     return "LATERAL %s" % self.visit_alias(lateral_, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2837, in visit_alias
pgsync_1  |     self, asfrom=True, lateral=lateral, **kwargs
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2877, in visit_subquery
pgsync_1  |     return self.visit_alias(subquery, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2816, in visit_alias
pgsync_1  |     **kwargs
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
pgsync_1  |     return meth(self, **kw)
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3275, in visit_select
pgsync_1  |     select_stmt, compile_state, entry, asfrom, lateral, compound_index
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 3451, in _setup_select_stack
pgsync_1  |     implicit_correlate_froms=asfrom_froms,
pgsync_1  |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/selectable.py", line 4557, in _get_display_froms
pgsync_1  |     "manually." % self.statement
pgsync_1  | sqlalchemy.exc.InvalidRequestError: Select statement '<sqlalchemy.sql.selectable.Select object at 0x7f381a95af90>' returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
current_pgsync_1 exited with code 255
toluaina commented 2 years ago

Sorry @niraj-ads I'm totally unable to reproduce this error. Might I suggest if you have discord you can hit me up taina#4505 and we can schedule a session to go through this

goktugbati commented 2 years ago

I opened https://github.com/toluaina/pgsync/issues/321 because I didn't see this one. It is the same issue and it works as expected when I remove through_tables in my schema

niraj-ads commented 2 years ago

@goktugbati, when you say it works as expected you mean you did this? for e.g from

          {
                    "table":"author",
                    "columns":[
                        "id",
                        "name"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_many",
                        "through_tables":[
                            "entry_author_options"
                        ]
                    }
            }

to

  {
                    "table":"author",
                    "columns":[
                        "id",
                        "name"
                    ],
                    "relationship":{
                        "variant":"object",
                        "type":"one_to_many",
                    }
            }

and the data is inserted without any errors, or do you mean you removed the whole object?

goktugbati commented 2 years ago

@niraj-ads No, I mean when I removed the child that has through_tables relationship