jwills / target-duckdb

A Singer.io target for DuckDB
Other
17 stars 12 forks source link

fix: set line terminator chars to correspond with csv writer #35

Closed srpwnd closed 6 months ago

srpwnd commented 6 months ago

Problem

Data with certain new line characters throws duckdb.duckdb.InvalidInputException: Invalid Input Error: Wrong NewLine Identifier. Expecting \r\n as described in #32.

Proposed changes

The Pythons csv.writer by default sets it's parameter of lineterminator to '\r\n' when writing the CSV files. This is not correctly reflected in DuckDB when trying to load these CSV files as it automatically assumes the new line character by contents of the file which is incorrect under certain circumstances. Setting the new_line argument in DuckDBs CSV COPY manually prevents this wrong assumption.

Types of changes

What types of changes does your code introduce to PipelineWise?

srpwnd commented 6 months ago

I can also edit this to make the line terminator characters configurable by user in both csv.writer and COPY so that everyone can chose between '\r\n', '\n', and '\r' based on their use case.

jwills commented 6 months ago

Ah this is great Pavel, thank you! Tests are failing b/c MD doesn't yet support 0.10.0, taking a look

jwills commented 6 months ago

tested this locally and verified it worked-- thank you @srpwnd !

srpwnd commented 6 months ago

@jwills Thanks for quickly testing and merging this! 🙏

Will this be reflected in the PyPi/Meltano catalog straight away or do you need to make a release first?

jwills commented 6 months ago

I think I need to make a release first, though of course you can use it right away by pointing at the Github main branch from Meltano