jwills / target-duckdb

A Singer.io target for DuckDB
Other
17 stars 12 forks source link

Columns in source tables with only non-alphanumeric characters cannot be loaded to target #15

Closed matsonj closed 1 year ago

matsonj commented 1 year ago

Describe the bug Columns in source tables with non-alphanumeric characters cannot be loaded to target. Example: CSV column header "%" or "+/-". Interestingly, target-parquet handles this fine when using tap-spreadsheets-anywhere.

To Reproduce Steps to reproduce the behavior:

  1. add both tap-spreadsheets-anywhere and target-duckdb to your meltano project.
  2. configure a source in the tap, i.e. /path/to/file with the a CSV file with the following rows: "name","%","grade" "jacob","99%","A" "josh","65%","D"
  3. run meltano run tap-spreadsheets-anywhere target-duckdb
  4. you will get the following error:
    • Parser Error: zero-length delimited identifier at or near """"
    • this is because it tries to generate this table: CREATE TABLE IF NOT EXISTS . ("" varchar, "_smart_source_bucket" varchar, "_smart_source_file" varchar, "_smart_source_lineno" hugeint, "name" varchar, "grade" varchar)

Expected behavior Should allow creation of columns even with non-alphanumeric column names

matsonj commented 1 year ago

If you are looking to borrow solutions, target-parquet assigns the column name 'C0'. I'm looking for the code that controls this behavior, and I can't find it.

jwills commented 1 year ago

Ack I'm sorry I forgot about this bud, just been swamped lately and I don't have a team of folks to fix stuff for me on this repo like I do for dbt-duckdb :/

matsonj commented 1 year ago

All good. I don’t think this is critical path at the moment given parquet seems to be a better fit (for now).

jwills commented 1 year ago

Okay I figured out what was going wrong here and fixed it in that PR ^

matsonj commented 1 year ago

amazing thank you! will validate and close this weekend.

jwills commented 1 year ago

No worries, I need to merge and push the new release, will do that shortly

matsonj commented 1 year ago

bad news - this merge broke the loader entirely. It looks like something is going on with what it expects for the 'key'

Traceback (most recent call last): cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.056753Z [info ] File "/workspaces/fbb_analyzer/.meltano/loaders/target-duckdb/venv/bin/target-duckdb", line 5, in <module> cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.057332Z [info ] from target_duckdb import main cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.057740Z [info ] File "/workspaces/fbb_analyzer/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 19, in <module> cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.058197Z [info ] LOGGER = get_logger("target_duckdb") cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.058706Z [info ] File "/workspaces/fbb_analyzer/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/logger.py", line 16, in get_logger cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.059355Z [info ] logging.config.fileConfig(path, disable_existing_loggers=False) cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.059850Z [info ] File "/usr/local/lib/python3.9/logging/config.py", line 71, in fileConfig cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.060708Z [info ] formatters = _create_formatters(cp) cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.061243Z [info ] File "/usr/local/lib/python3.9/logging/config.py", line 104, in _create_formatters cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.061956Z [info ] flist = cp["formatters"]["keys"] cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.062556Z [info ] File "/usr/local/lib/python3.9/configparser.py", line 963, in __getitem__ cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.063057Z [info ] raise KeyError(key) cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.063641Z [info ] KeyError: 'formatters' cmd_type=elb consumer=True name=target-duckdb producer=False stdio=stderr string_id=target-duckdb 2022-10-31T04:28:23.141962Z [error ] Loader failed

jwills commented 1 year ago

Huh-- any change you have a LOGGING_CONF_FILE var set in your environment? https://github.com/jwills/target-duckdb/blob/main/target_duckdb/logger.py

Or maybe Meltano does?

matsonj commented 1 year ago

If one is set, its set by default somewhere. I'm going to rebuild the container. Parquet loader still works so its isolated to the duckdb loader (i.e. not a change to meltano in general)

jwills commented 1 year ago

yeah I remembered having that error before I added that logging.conf file to the PR, which fixed it; so I'm trying to grok how it came back

jwills commented 1 year ago

Ack, okay-- found and fixed the problem, published 0.4.3.