Closed matsonj closed 1 year ago
example tap config:
plugins:
extractors:
- name: tap-spreadsheets-anywhere
variant: ets
pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
config:
tables:
- path: file:///path/to/file
name: test_load
pattern: .*
start_date: '2001-01-01T00:00:00Z'
key_properties: [key]
After running 'meltano elt tap-spreadsheets-anywhere target-duckdb' it will have 8 rows (should have 4). In my experience with identical config, this same behavior does not replicate with the postgres target. But I have not tried this explicitly.
it is possible that this has to do with "key" enforced as required in target-duckdb. target-postgres will accept a blank key_property i.e.
key_properties: []
What happens if you give target-duckdb an empty key_properties
? I bet this is related to this limitation https://github.com/jwills/target-duckdb/blob/main/target_duckdb/db_sync.py#L426 wherein I cannot modify primary key columns even if the modification is a no-op.
Okay I think I see how to fix this but it'll be a little tricky to do it
Snippet here Primary key is set to mandatory but not defined in the [test_load] stream
full error below
2022-10-02T19:25:42.623374Z [info ] Running extract & load... name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb
2022-10-02T19:25:42.785014Z [warning ] No state was found, complete import.
2022-10-02T19:25:44.914039Z [info ] INFO Using supplied catalog /home/jacob/meltano-projects/nba-monte-carlo/.meltano/run/elt/2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb/eb647e6b-b171-4f97-8986-b36e250bd1db/tap.properties.json. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.915002Z [info ] INFO Processing 3 selected streams from Catalog cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.915567Z [info ] INFO Syncing stream:test_load cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.916169Z [info ] INFO Walking /mnt/c/Users/matso/OneDrive/Documents/test_data/test. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.918400Z [info ] time=2022-10-02 12:25:44 name=target_duckdb level=CRITICAL message=Primary key is set to mandatory but not defined in the [test_load] stream cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.919461Z [info ] Traceback (most recent call last): cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.920324Z [info ] File "/home/jacob/meltano-projects/nba-monte-carlo/.meltano/loaders/target-duckdb/venv/bin/target-duckdb", line 8, in <module> cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.921014Z [info ] sys.exit(main()) cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.921725Z [info ] File "/home/jacob/meltano-projects/nba-monte-carlo/.meltano/loaders/target-duckdb/venv/lib/python3.8/site-packages/target_duckdb/__init__.py", line 424, in main cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.922548Z [info ] persist_lines(connection, config, singer_messages) cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.923898Z [info ] File "/home/jacob/meltano-projects/nba-monte-carlo/.meltano/loaders/target-duckdb/venv/lib/python3.8/site-packages/target_duckdb/__init__.py", line 279, in persist_lines cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.925466Z [info ] raise Exception("key_properties field is required") cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.926490Z [info ] Exception: key_properties field is required cmd_type=loader name=target-duckdb run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.928226Z [info ] INFO Found 1 files. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.942420Z [info ] INFO Checking 1 resolved objects for any that match regular expression ".*" and were modified since 2001-01-01 00:00:00+00:00 cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.943156Z [info ] INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.943833Z [info ] INFO Syncing file "test_load.csv". cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.960331Z [info ] INFO Wrote 4 records for stream "test_load". cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.961139Z [info ] INFO Syncing stream:schedule cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.962463Z [info ] INFO Walking /mnt/c/Users/matso/OneDrive/Documents/test_data/schedule. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.975121Z [info ] INFO Found 1 files. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.994512Z [info ] INFO Checking 1 resolved objects for any that match regular expression ".*" and were modified since 2001-01-01 00:00:00+00:00 cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.995376Z [info ] INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details. cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:44.996258Z [info ] INFO Syncing file "nba_schedule_2023.csv". cmd_type=extractor name=tap-spreadsheets-anywhere run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:45.021431Z [error ] Loading failed code=1 message=Exception: key_properties field is required name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb
2022-10-02T19:25:45.022673Z [info ] ELT could not be completed: Loader failed. cmd_type=elt name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:45.023162Z [info ] For more detailed log messages re-run the command using 'meltano --log-level=debug ...' CLI flag. cmd_type=elt name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:45.024320Z [info ] Note that you can also check the generated log file at '/home/jacob/meltano-projects/nba-monte-carlo/.meltano/logs/elt/2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb/eb647e6b-b171-4f97-8986-b36e250bd1db/elt.log'. cmd_type=elt name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
2022-10-02T19:25:45.025489Z [info ] For more information on debugging and logging: https://docs.meltano.com/reference/command-line-interface#debugging cmd_type=elt name=meltano run_id=eb647e6b-b171-4f97-8986-b36e250bd1db state_id=2022-10-02T192539--tap-spreadsheets-anywhere--target-duckdb stdio=stderr
Ah okay- so it sounds like the upstream requires a key
Err, it's really this, b/c the ingest is config'd to be incremental: https://github.com/jwills/target-duckdb/blob/main/target_duckdb/__init__.py#L265
Okay I think I have a quick fix here, PR coming shortly
Describe the bug When you execute 'meltano elt SOURCE target-duckdb', instead of only detecting incremental changes. It appears this is because "_smart_source_lineno" is getting populated with null.
To Reproduce Steps to reproduce the behavior:
Expected behavior only new records are added on step 3 when you run elt again. also, the column list should not change (meltano is running ALTER table to modify column names even when the source is unchanged in order to handle 'duplicate columns' for _smart_source_lineno".
Screenshots
Your environment