rilldata / rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
https://www.rilldata.com
Apache License 2.0

Improve error messages to include file name having ingestion issues #5411

Closed nishantmonu51 closed 1 week ago

nishantmonu51 commented 1 month ago

If a malformed CSV file gets added to a directory, it can cause data ingestion to fail. In such a case, the error currently doesn't include the name of the file causing the issue. Add the corrupt file's name to the returned error. Optionally, also add the ability to skip corrupted files.
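To make the request concrete, here is a minimal sketch of the desired behavior in plain Python, not Rill's or DuckDB's actual code. `IngestionError`, `read_rows`, `ingest_directory`, and the `skip_corrupt` flag are all hypothetical names chosen for illustration. It assumes a file counts as corrupt when a timestamp column fails to parse.

```python
import csv
from datetime import datetime
from pathlib import Path


class IngestionError(Exception):
    """Hypothetical error type that carries the offending file and line."""

    def __init__(self, path, line, reason):
        super().__init__(f"{path}: line {line}: {reason}")
        self.path = path
        self.line = line


def read_rows(path, timestamp_column):
    """Parse one CSV file, checking that timestamp_column holds ISO timestamps."""
    rows = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            try:
                datetime.fromisoformat(row[timestamp_column])
            except (KeyError, TypeError, ValueError) as exc:
                # reader.line_num is the physical line that was just read.
                raise IngestionError(str(path), reader.line_num, str(exc)) from exc
            rows.append(row)
    return rows


def ingest_directory(directory, timestamp_column, skip_corrupt=False):
    """Ingest every *.csv in directory and return (rows, skipped_errors)."""
    rows, skipped = [], []
    for path in sorted(Path(directory).glob("*.csv")):
        try:
            rows.extend(read_rows(path, timestamp_column))
        except IngestionError as err:
            if not skip_corrupt:
                raise  # the error message names the failing file
            skipped.append(err)  # remember what was skipped and why
    return rows, skipped
```

With `skip_corrupt=False` the first bad file aborts ingestion with an error that names it. With `skip_corrupt=True` the bad file is dropped whole and reported in the second return value.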

begelundmuller commented 3 weeks ago

If this is caused by DuckDB not showing the file name, we should consider just raising the issue in their issue tracker instead.

k-anshul commented 3 weeks ago

The error message from DuckDB is very helpful on main but not on 1.0.0. Error message from main:

Conversion Error: CSV Error on Line: 1670846
Original Line: B00310,HELLO,2022-07-20 07:02:30,,242.0,,B03404
Error when converting column "pickup_datetime". Could not convert string "HELLO" to 'TIMESTAMP'

Column pickup_datetime is being converted as type TIMESTAMP
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g. types={'pickup_datetime': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g. sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.

  file=data_22.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding=0
  sample_size=20480
  ignore_errors=false
  all_varchar=0
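If a caller wanted to surface just the file name and line number from a message shaped like the sample above, a small parser along these lines could work. This is only a sketch: `extract_location` is a hypothetical helper, and the `file=` and `CSV Error on Line:` patterns are assumptions taken from this one sample, so they may not hold across DuckDB versions.

```python
import re

# Patterns inferred from the sample message above; the layout is an assumption.
LINE_RE = re.compile(r"CSV Error on Line:\s*(\d+)")
FILE_RE = re.compile(r"^[ \t]*file[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_location(message):
    """Return (file_name, line_number); either is None when not found."""
    file_match = FILE_RE.search(message)
    line_match = LINE_RE.search(message)
    file_name = file_match.group(1) if file_match else None
    line_number = int(line_match.group(1)) if line_match else None
    return file_name, line_number


sample = """Conversion Error: CSV Error on Line: 1670846
Original Line: B00310,HELLO,2022-07-20 07:02:30,,242.0,,B03404

  file=data_22.csv
  delimiter = , (Auto-Detected)
"""
print(extract_location(sample))  # ('data_22.csv', 1670846)
```

Parsing error text is brittle, so this suits display purposes more than control flow.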

I will pick this up once DuckDB 1.1.0 is released, which is scheduled for 2024-09-02.

k-anshul commented 1 week ago

Nothing needs to be done on our side. The file name will already be part of the error messages. Sample below. However, the file name is trimmed.

[image: sample error message]

k-anshul commented 1 week ago

The trimmed file name will be handled in a separate issue: https://github.com/rilldata/rill/issues/5604