turbot / steampipe-plugin-csv

Use SQL to instantly query data from CSV files. Open source CLI. No DB required.
https://hub.steampipe.io/plugins/turbot/csv
Apache License 2.0

UTF-8 encoded CSVs can't leverage Column A #38

Closed jchrisfarris closed 1 year ago

jchrisfarris commented 1 year ago

Describe the bug CSV files saved as UTF-8 with a byte order mark (BOM) begin with hex EF BB BF. The CSV plugin doesn't strip or skip these three bytes, which makes the first column impossible to query by name. MS Excel's UTF-8 CSV export produces files like this.

Two sample files run through the macOS xxd hex viewer:

chris$ xxd bad.csv  | head -2
00000000: efbb bf63 6f6e 7472 6f6c 2c70 6574 5f6e  ...control,pet_n
00000010: 616d 652c 6578 7065 6374 6564 5f72 6573  ame,expected_res

Note the three dots and the leading ef bb bf bytes above. A CSV that Steampipe parses correctly looks like:

chris$ xxd good.csv  | head -2
00000000: 636f 6e74 726f 6c2c 7065 745f 6e61 6d65  control,pet_name
00000010: 2c65 7870 6563 7465 645f 7265 7375 6c74  ,expected_result
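
The same check can be scripted instead of read off a hex dump. The following is a minimal Python sketch, not part of the plugin; the helper name `has_utf8_bom` is made up for illustration, and the byte strings are the first 16 bytes of each file as shown in the xxd output above.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# First 16 bytes of each sample file, copied from the xxd output.
bad_head = bytes.fromhex("efbbbf636f6e74726f6c2c7065745f6e")
good_head = bytes.fromhex("636f6e74726f6c2c7065745f6e616d65")

print(has_utf8_bom(bad_head))   # True
print(has_utf8_bom(good_head))  # False
```

For a file on disk, reading the first three bytes in binary mode and passing them to the helper would be enough.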

Steampipe version (steampipe -v) steampipe version 0.16.4

Plugin version (steampipe plugin list)

steampipe plugin list | grep csv
| hub.steampipe.io/plugins/turbot/csv@latest    | 0.4.0   | csv         |

To reproduce See attached files for examples:

> select count(control) from bad;
Error: column "control" does not exist (SQLSTATE 42703)
> select count(control) from good;
+-------+
| count |
+-------+
| 7     |
+-------+
> select count(pet_name) from bad;
+-------+
| count |
+-------+
| 7     |
+-------+
> select count(pet_name) from good;
+-------+
| count |
+-------+
| 7     |
+-------+
>

Expected behavior The CSV plugin should detect and skip the three-byte BOM at the start of a UTF-8 CSV file, so that the first column can be queried by its header name.
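
To illustrate the requested behavior, here is a small Python sketch. It is not the plugin's code (the plugin is not written in Python), and the `header` helper and the data row are invented for the example; only the header names come from the hex dumps above. Decoding with `utf-8` leaves the BOM attached to the first header name, while Python's `utf-8-sig` codec drops a leading BOM if one is present and otherwise behaves like `utf-8`.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM, the header from the sample
# files, and one made-up data row.
raw = b"\xef\xbb\xbfcontrol,pet_name,expected_result\nc1,rex,pass\n"


def header(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given encoding and return the CSV header row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


naive = header(raw, "utf-8")      # BOM stays glued to the first column name
clean = header(raw, "utf-8-sig")  # leading BOM is skipped during decoding

print(repr(naive[0]))  # '\ufeffcontrol'
print(repr(clean[0]))  # 'control'
```

With the naive decoding the first column is named U+FEFF followed by `control`, which would explain why `control` is reported as missing while `pet_name` works.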