gtoonstra / etl-with-airflow

ETL best practices with airflow, with examples

PostgresToPostgresOperator causing UnicodeEncodeError #5


thenaturalist commented 7 years ago

Hi,

I can't get your operator to work, due to what seems to be a mismatch between SQLAlchemy and psycopg2/Postgres (see this Stack Overflow thread as well). The use case is a transfer between two Postgres databases on Heroku.

Log excerpt from testing the Operator with a task:

[2017-06-23 14:43:23,814] {models.py:1342} INFO - Executing <Task(PostgresToPostgresOperator): test_transfer> on 2017-06-23 00:00:00
[2017-06-23 14:43:23,846] {postgres_to_postgres_operator.py:51} INFO - Executing: SELECT a,b,c FROM public.test LIMIT 1000;
[2017-06-23 14:43:23,846] {postgres_to_postgres_operator.py:55} INFO - Transferring Postgres query results into other Postgres database.
[2017-06-23 14:43:23,847] {base_hook.py:67} INFO - Using connection to: src_pg_conn_id
[2017-06-23 14:43:23,867] {postgres_to_postgres_operator.py:64} INFO - Inserting rows into Postgres
[2017-06-23 14:43:23,868] {base_hook.py:67} INFO - Using connection to: dest_pg_conn_id
[2017-06-23 14:43:23,886] {models.py:1417} ERROR - 'ascii' codec can't encode character u'\xfc' in position 177: ordinal not in range(128)
Traceback (most recent call last):
  File "/app/.heroku/miniconda/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
    result = task_copy.execute(context=context)
  File "/app/plugins/postgres_to_postgres_operator.py", line 66, in execute
    dest_pg.insert_rows(table=self.pg_table, rows=cursor)
  File "/app/.heroku/miniconda/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py", line 220, in insert_rows
    ",".join(values))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 177: ordinal not in range(128)
[2017-06-23 14:43:23,887] {models.py:1441} INFO - Marking task as FAILED.

Did you come across this during development? Any tips how to make the operator work?
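For reference, the failure can be reproduced outside Airflow. This is a hypothetical sketch, not the hook's actual code: under Python 2, `insert_rows` effectively pushed unicode values through the default `'ascii'` codec, which cannot represent `u'\xfc'` ("ü").

```python
# -*- coding: utf-8 -*-
# Minimal sketch (assumption: this mimics, but is not, the dbapi_hook code)
# of why ",".join(values) blows up under Python 2's default ascii codec.
values = ["abc", "\xfc"]  # "\xfc" is "ü", the character from the log

try:
    # Roughly what the implicit ascii conversion amounted to:
    b",".join(v.encode("ascii") for v in values)
except UnicodeEncodeError as exc:
    print("fails as in the log:", exc)

# Encoding the values explicitly as UTF-8 before joining avoids the error:
joined = b",".join(v.encode("utf-8") for v in values)
print(joined)  # b'abc,\xc3\xbc'
```

This suggests the fix belongs on the Python side (explicit UTF-8 encoding of the row values) rather than in the databases themselves.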

gtoonstra commented 7 years ago

Hi there,

So if you're running this on Heroku, you're not running it in a Docker container, which is the easiest way to get this working.

Airflow also hasn't been tested much against Python 3. Are you running Python 3 by any chance?

I'd try specifying the client encoding explicitly in airflow.cfg, and from there evaluate whether there are better ways to deal with this, for example setting client_encoding in postgresql.conf (which is actually the server config file, where it sets the default encoding for connecting clients).
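Another possible workaround along the same lines, assuming the hook connects through psycopg2 (which is built on libpq): libpq reads the standard PGCLIENTENCODING environment variable, so exporting it before starting the scheduler/worker forces UTF-8 on the client side of every connection. This is a sketch of the idea, not a verified fix for this particular traceback:

```shell
# libpq (and therefore psycopg2) honors PGCLIENTENCODING; export it in the
# environment of the Airflow process before it opens any Postgres connection.
export PGCLIENTENCODING=UTF8
echo "$PGCLIENTENCODING"   # sanity check: should print UTF8
```

On Heroku this could be set as a config var so it applies to every dyno.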

It does seem to be defaulting to ASCII. Unfortunately, there's not a lot I can do from this end to figure it out.

I updated the project today to improve fact processing. It was using subselects, but now uses joins, which are roughly 100x faster.

Can you let me know your progress with this issue?