medzin / beam-postgres

Light IO transforms for Postgres read/write in Apache Beam pipelines.
Apache License 2.0
12 stars 3 forks source link

Surprised dataclass params are coerced to tuple rather than dict #6

Closed MattOates closed 3 months ago

MattOates commented 3 months ago

So I have a code base making heavy use of SQLAlchemy and wanted to generate the insert statement from the ORM, and with SQLAlchemy 2.0 you can also have the models present as normal Python dataclasses. This was all working out super well until I realised your code internally assumes unnamed placeholders in the statement and always coerces to tuple. Surely it would be a lot safer and better for the dataclass case to use named placeholders? As well as support dicts too as its very common for people to use named placeholders rather than positional.

So the code at https://github.com/medzin/beam-postgres/blob/85bf582d3e7d8e37d9b719c37fb66b43e452202d/beam_postgres/io/postgres.py#L202

I'd suggest a change like:

def _execute_batch(self, batch: List[Any]) -> Optional[Tuple[Any, psycopg.Error]]:
        with self._pg_conn.cursor() as cur:
            for row in batch:
                if is_dataclass(row):
                    params = asdict(row)
                elif isinstance(row, tuple) or isinstance(row, dict):
                    params = row
                else:
                    raise TypeError("element must be a tuple, dict or dataclass")

                try:
                    cur.execute(self._statement, params, prepare=True)
                except psycopg.Error as err:
                    self._pg_conn.rollback()
                    return (row, err)
        self._pg_conn.commit()
        return None

Should I raise a PR around the above? Or do you want something like is_named passed through to decide if its a dict or tuple coerce? Either way I'd have to hard fork as Im dealing with a load of partition tables and magic from an ORM and having to deal with your code as is is super painful given I cant rely on statement generation or the types I have for the params.

medzin commented 3 months ago

Hi, @MattOates! The proposed code looks backward compatible, so there is no need to provide an extra flag to control the behavior. If you make a PR, I will merge it without any problems :)

medzin commented 3 months ago

Resolved in https://github.com/medzin/beam-postgres/commit/0b159767e6b1b693cf6419ef24f11cd6bd73e328

MattOates commented 2 months ago

@medzin only just noticing your response now, this is great thank you 🙏