srpwnd closed 6 months ago
I can also edit this to make the line terminator characters configurable by the user in both csv.writer and COPY, so that everyone can choose between '\r\n', '\n', and '\r' based on their use case.
Ah this is great Pavel, thank you! Tests are failing b/c MD doesn't yet support 0.10.0, taking a look
tested this locally and verified it worked-- thank you @srpwnd !
@jwills Thanks for quickly testing and merging this! 🙏
Will this be reflected in the PyPI/Meltano catalog straight away, or do you need to make a release first?
I think I need to make a release first, though of course you can use it right away by pointing at the GitHub main branch from Meltano
Problem
Data with certain new line characters throws
duckdb.duckdb.InvalidInputException: Invalid Input Error: Wrong NewLine Identifier. Expecting \r\n
as described in #32.
Proposed changes
Python's csv.writer sets its lineterminator parameter to '\r\n' by default when writing CSV files. DuckDB does not reflect this correctly when loading these files: it infers the newline character from the contents of the file, and that inference is incorrect under certain circumstances. Setting the new_line argument in DuckDB's CSV COPY manually prevents this wrong assumption.
Types of changes
What types of changes does your code introduce to PipelineWise?
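For reference, the mismatch described under Proposed changes can be reproduced on the writer side with the standard library alone. The sketch below is illustrative and is not the code from this PR: write_rows and copy_statement are hypothetical helper names, and the exact spelling of the COPY option (NEW_LINE '\r\n') is an assumption based on the new_line argument mentioned above. It shows one plausible way a file ends up with mixed newline characters: a field containing '\n' while rows end in '\r\n'.

```python
import csv
import io


def write_rows(rows, lineterminator="\r\n"):
    """Serialize rows to CSV text using an explicit row terminator.

    "\r\n" is the csv.writer default; passing it explicitly just makes
    the choice visible.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator=lineterminator)
    writer.writerows(rows)
    return buffer.getvalue()


def copy_statement(table, path, new_line=r"\r\n"):
    """Build a COPY statement that states the row terminator explicitly.

    Hypothetical helper: the option spelling is an assumption, not taken
    from the PR diff. The statement is only built here, never executed.
    """
    return f"COPY {table} FROM '{path}' (FORMAT CSV, NEW_LINE '{new_line}')"


if __name__ == "__main__":
    # A field with an embedded "\n" gets quoted, while rows end in "\r\n",
    # so the resulting file contains both kinds of newline characters.
    text = write_rows([["1", "line one\nline two"], ["2", "plain"]])
    print(repr(text))
    print(copy_statement("my_table", "batch.csv"))
```

Passing the same terminator to both the writer and the loader keeps the two sides in agreement instead of relying on detection from the file contents.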