pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Add some unicode tests #147

Closed windiana42 closed 5 months ago

windiana42 commented 5 months ago

This pull request tests special character handling with database targets. It is not comprehensive, but these are the first tests that use more than ASCII. They already showed that Microsoft SQL Server does not handle some Unicode characters well, such as λ (U+03BB).

Unicode came under suspicion as a potential source of problems, but the actual cause turned out to be a bug in user code. If you load SQL files into pipedag, beware that Path(path).read_text() without an encoding argument decodes the file using the platform's locale encoding, so it does not correctly read UTF-8 files on some operating systems. Explicitly specifying the encoding parameter is highly recommended (see https://github.com/pydiverse/pydiverse.pipedag/blob/main/tests/test_flows/test_raw_sql_pipeline.py#L32).
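A minimal sketch of the recommendation above, using only the standard library. The helper name `read_sql_file` and the file `query.sql` are hypothetical and not part of pipedag; the cp1252 decode merely simulates what a non-UTF-8 locale default could do to the same bytes.

```python
import tempfile
from pathlib import Path


def read_sql_file(path):
    """Read a SQL file as UTF-8, independent of the platform's locale encoding.

    Hypothetical helper for illustration; not part of the pipedag API.
    """
    return Path(path).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp_dir:
    sql_path = Path(tmp_dir) / "query.sql"

    # Write a query containing a non-ASCII character, explicitly as UTF-8.
    sql_path.write_text("SELECT 'λ' AS greek_letter", encoding="utf-8")

    # Reading with an explicit encoding round-trips the character correctly.
    sql = read_sql_file(sql_path)

    # Simulate a locale default that is not UTF-8 (cp1252 is an assumption
    # chosen for illustration): the same bytes decode to different text.
    mis_decoded = sql_path.read_bytes().decode("cp1252")
```

Passing `encoding="utf-8"` on both the write and the read side keeps the result the same across operating systems, whereas the default depends on the machine the code runs on.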

Checklist