sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish Airflow.
Apache License 2.0

Ruby logging with utf-16 and Docker expecting utf-8 #503

Closed: jacobthill closed this issue 2 months ago

jacobthill commented 2 months ago

Several UCLA collections log the following warning:

{logging_mixin.py:149} WARNING - UnicodeEncodeError: 'utf-8' codec can't encode character '\udcbf' in position 54: surrogates not allowed

https://dlme-airflow-dev.stanford.edu/log?dag_id=ucla_armenia_collections&task_id=armenia_collections_etl.transform_ucla_armenia_collections&execution_date=2024-06-16T07%3A00%3A00%2B00%3A00

It looks like Ruby might be logging with UTF-16 encoding while Docker (Python) expects UTF-8. This causes encoding errors in the Airflow UI when Airflow tries to render the logs. It doesn't fail the tasks, so it's a minor issue, but it can clog the logs with many error messages, which can be confusing.
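For context, the '\udcbf' in the warning is a lone surrogate, which Python cannot encode as strict UTF-8. One plausible way such a character appears (an assumption, not a confirmed diagnosis of this issue) is Python decoding subprocess output with the "surrogateescape" error handler, which turns each byte that is not valid UTF-8 into a surrogate. The sketch below reproduces the error with a hypothetical byte string (`RAW_OUTPUT`) and hypothetical helper names; only the standard `str.encode`/`bytes.decode` error handlers are real APIs.

```python
# Sketch (an assumption, not the confirmed cause): how a lone surrogate such as
# '\udcbf' can end up in a Python string and then break strict UTF-8 encoding.

# Hypothetical subprocess output containing one byte (0xBF) that is not valid UTF-8.
RAW_OUTPUT = b"transform log: caf\xbf"


def decode_like_python_io(raw: bytes) -> str:
    # The "surrogateescape" handler maps each undecodable byte 0xXX to U+DCXX
    # instead of raising, so 0xBF becomes '\udcbf'.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_strict(text: str) -> bytes:
    # A strict UTF-8 encode, as a log handler or file writer does by default.
    return text.encode("utf-8")


def encode_for_display(text: str) -> str:
    # "backslashreplace" writes the surrogate out as a visible escape instead of raising.
    return text.encode("utf-8", errors="backslashreplace").decode("utf-8")


if __name__ == "__main__":
    text = decode_like_python_io(RAW_OUTPUT)
    try:
        encode_strict(text)
    except UnicodeEncodeError as exc:
        print(exc)  # the same "surrogates not allowed" error seen in the Airflow log
    print(encode_for_display(text))
    # Encoding with surrogateescape restores the original bytes exactly.
    print(text.encode("utf-8", errors="surrogateescape") == RAW_OUTPUT)
```

If this is the mechanism, the warning points at bytes in the Ruby output that are not valid UTF-8 rather than proving the output is UTF-16; checking the raw bytes the transform writes would confirm which encoding is actually in use.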

jacobthill commented 2 months ago

https://github.com/sul-dlss/dlme-transform/pull/1109 removed this logging from both the Airflow and local terminal output, so we no longer see the issue, though it may resurface in other logging down the road.