jwills / target-duckdb

A Singer.io target for DuckDB

Handle newlines in temp csv file #30

Closed jaypeedevlin closed 10 months ago

jaypeedevlin commented 10 months ago

Describe the bug: Writing data to the temp CSV file doesn't seem resilient to text fields containing line breaks.

I am getting this error:

duckdb.duckdb.InvalidInputException: Invalid Input Error: CSV File not supported for multithreading. This can be a problematic line in your CSV File or that this CSV can't be read in Parallel. Please, inspect if the line 4260 is correct. If so, please run single-threaded CSV Reading by setting parallel=false in the read_csv call.

When I check the temp CSV file, I see something like this:

{""type"": ""unlimitedDate"", ""date"": ""2023-04-28T17:16:22-0400""}]","Collecting X-MEN: FIRST CLASS #6-10.
<br>Rated A ...$14.99
<br>ISBN: 978-0-7851-2599-0
<br>",SEP082458,41635,9780785 125990 51499,

Expected behavior: Newlines are written as \n in the file instead of as literal newlines.
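As a rough illustration of the expected behavior, here is a minimal sketch using only Python's standard `csv` module. It is not target-duckdb's actual code: `escape_newlines` and `write_rows` are hypothetical helpers, and the sample row is loosely adapted from the excerpt above. It contrasts escaping line breaks before writing with leaving them in place, where the writer wraps the field in quotes and the record spans several physical lines.

```python
import csv
import io


def escape_newlines(value):
    """Replace literal line breaks in strings with the two-character sequence \\n.

    Hypothetical helper for illustration; non-string values pass through unchanged.
    """
    if isinstance(value, str):
        return value.replace("\r\n", "\\n").replace("\n", "\\n").replace("\r", "\\n")
    return value


def write_rows(rows, escape=True):
    """Write rows to an in-memory CSV and return the resulting text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in rows:
        writer.writerow([escape_newlines(v) if escape else v for v in row])
    return buffer.getvalue()


# Sample row loosely based on the excerpt in this issue.
rows = [
    ["SEP082458", "Collecting X-MEN: FIRST CLASS #6-10.\n<br>Rated A ...$14.99\n<br>", 41635],
]

escaped = write_rows(rows, escape=True)   # line breaks stored as the characters \ and n
raw = write_rows(rows, escape=False)      # line breaks kept; the field is quoted instead

print(len(escaped.splitlines()))  # physical lines used by the escaped record
print(len(raw.splitlines()))      # physical lines used by the unescaped record
print(list(csv.reader(io.StringIO(raw))))  # a quote-aware reader still sees one record
```

With escaping, each record occupies exactly one physical line, so a reader that splits the file by line never lands in the middle of a field. Without escaping, the file is still valid quoted CSV, but the reader has to track quotes across lines. Whether that was the actual cause of the error reported here was never confirmed in this thread.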


jaypeedevlin commented 10 months ago

I'm no longer entirely convinced that the issue is these line breaks; it may be something else in the raw data. I'll continue to dig in, but I'm closing this for now.

swerbo commented 9 months ago

@jaypeedevlin did you ever discover what the issue was? I am running into this as well, and it is either line breaks or perhaps something to do with JSON and empty data.

jaypeedevlin commented 9 months ago

@swerbo I can't remember why, but I became convinced that the problem was odd source data and not the handling of newlines in that data. I worked around the original issue, as it wasn't material to the work I was doing. Sorry I couldn't be of more help.