observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License

DuckDB to import columns as strings as a fallback #341

Closed: libbey-observable closed this 1 year ago

libbey-observable commented 1 year ago

Partially resolves https://github.com/observablehq/observablehq/issues/9857

The two cases it resolves are:

  1. @Fil's case with thousands of rows of "F", which DuckDB interpreted as boolean and then threw an error when it encountered an "M".
  2. Allison's case, where ejecting to SQL failed silently due to a type mismatch.

Case 2 (and likely case 1) occurred because the mismatch first appeared after row 10240, which is the (default?) number of rows DuckDB samples when inferring types.
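To make the failure mode concrete, here is a toy sketch of sample-based type inference. It is not DuckDB's actual algorithm: `inferColumnType`, `convertColumn`, the set of boolean literals, and the sample size of 3 are all made up for illustration.

```javascript
// Toy model of sample-based type inference (NOT DuckDB's real implementation).
// The set of accepted boolean literals is an assumption for this sketch.
const BOOLEAN_LITERALS = new Set(["T", "F", "TRUE", "FALSE"]);

// Guess a column type by looking only at the first `sampleSize` values.
function inferColumnType(values, sampleSize) {
  const sample = values.slice(0, sampleSize);
  const allBoolean = sample.every((v) => BOOLEAN_LITERALS.has(v.toUpperCase()));
  return allBoolean ? "boolean" : "string";
}

// Convert every value to the inferred type, throwing on the first mismatch.
function convertColumn(values, type) {
  if (type === "string") return values;
  return values.map((v, i) => {
    const upper = v.toUpperCase();
    if (!BOOLEAN_LITERALS.has(upper)) {
      throw new Error(`Could not convert "${v}" to boolean at row ${i}`);
    }
    return upper === "T" || upper === "TRUE";
  });
}

const column = ["F", "F", "F", "F", "M"]; // the mismatch sits past the sample
const type = inferColumnType(column, 3); // only the first 3 rows are inspected
```

Here `type` comes back as boolean because the sample never sees the "M", so a later call to `convertColumn(column, type)` throws. Sampling the whole column would have inferred a string column instead.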

Now, if insertCSVFromPath fails, we catch it, check whether it failed due to a conversion error, and if so, try again with all columns as strings. Only CSV and TSV files are affected by this change.
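A minimal sketch of that retry pattern, for readers who want the control flow at a glance. It is not the code in this PR: `insertWithStringFallback`, `isConversionError`, and the two callbacks are hypothetical names. The typed callback stands in for the `insertCSVFromPath` call and the other for whatever re-imports the file with every column as a string.

```javascript
// Sketch of the retry-on-conversion-error pattern (names are hypothetical).

// Detect a conversion failure by looking for the hard-coded message fragment.
function isConversionError(error) {
  return String(error).includes("Could not convert");
}

// `insertTyped`: async callback that attempts the normal, type-inferred insert.
// `insertAllStrings`: async callback that retries with all columns as strings.
async function insertWithStringFallback(insertTyped, insertAllStrings) {
  try {
    return await insertTyped();
  } catch (error) {
    if (isConversionError(error)) {
      // Type inference guessed wrong somewhere past the sample: retry untyped.
      return await insertAllStrings();
    }
    // Any other failure (missing file, bad syntax, ...) is not ours to hide.
    throw error;
  }
}
```

Rethrowing everything that is not a conversion error keeps the fallback narrow, so unrelated failures still surface to the caller.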

The error for case 1 (before): [screenshot: Screen Shot 2023-01-11 at 4 21 40 PM]

Case 1 fixed (after): [screenshot: Screen Shot 2023-01-11 at 4 21 55 PM]

Video showing before and after for case 2:

https://user-images.githubusercontent.com/111310561/211949879-3281b4bf-047a-4866-a65c-358e1814eb44.mov

libbey-observable commented 1 year ago

> Regarding the check for "Could not convert", if it's not complicated I'd rather have it, not so much for performance but to clarify the code path.

I agree, and added a check for "Could not convert". It does feel a bit brittle to check for a hard-coded string, but the clarity it adds seems worth it.

libbey-observable commented 1 year ago

@mbostock It's much simpler now, no need for any config/options anywhere. Thanks for your feedback!