jwills / target-duckdb

A Singer.io target for DuckDB
Other
17 stars 12 forks source link

Double primary key from tap-mysql causes a temp table loading failure #36

Open shippy opened 6 months ago

shippy commented 6 months ago

Describe the bug I'm not 100% sure what is happening here. A table in tap-mysql has two columns, each of which is a primary key. target-duckdb creates a corresponding table with two primary keys, successfully creates a temp table with the full contents in main schema, then fails when copying it into the target schema.

The failure is duckdb.duckdb.ParserException: Parser Error: syntax error at or near "FROM", but I'm unable to get at the exact expression being executed. The full traceback follows:

Removing the primary keys from the MySQL table of origin is an effective workaround for the issue, which makes me think that the primary keys are to blame.

To Reproduce Steps to reproduce the behavior:

  1. Prepare the data for mysql5.7 database:
DROP TABLE IF EXISTS `soutez_teze`;
CREATE TABLE `soutez_teze` (
  `soutez_ID` int(10) unsigned NOT NULL DEFAULT 0,
  `teze_ID` int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`soutez_ID`,`teze_ID`),
  KEY `soutez_ID` (`soutez_ID`),
  KEY `teze_ID` (`teze_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;
INSERT INTO `soutez_teze` (`soutez_ID`, `teze_ID`) VALUES
(1, 4),
(1, 5),
(2, 6);
  1. Set up standard target-duckdb deployment.
  2. Run the command run tap-mysql target-duckdb
  3. See error

Expected behavior A table is successfully replicated in DuckDB, with or without primary keys.

Screenshots If applicable, add screenshots to help explain your problem.

Your environment

Additional context

meltano-1  | 2024-05-01T04:04:31.935825Z [info     ] time=2024-05-01 04:04:31 name=target_duckdb level=INFO message=Loading 222 rows into 'adk_wrapped_full.debatovanicz."soutez_teze"' cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.938590Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.938742Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/bin/target-duckdb", line 8, in <module> cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.938902Z [info     ]     sys.exit(main())           cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939013Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 427, in main cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939113Z [info     ]     persist_lines(connection, config, singer_messages) cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939249Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 314, in persist_lines cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939368Z [info     ]     flushed_state = flush_streams( cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939516Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 351, in flush_streams cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939660Z [info     ]     load_stream_batch(         cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939792Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 392, in load_stream_batch cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.939907Z [info     ]     flush_records(stream, records_to_load, row_count[stream], db_sync, temp_dir) cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.940020Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/__init__.py", line 409, in flush_records cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.940107Z [info     ]     db_sync.load_rows(records_to_load.values(), row_count, temp_dir) cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.940236Z [info     ]   File "/project/.meltano/loaders/target-duckdb/venv/lib/python3.9/site-packages/target_duckdb/db_sync.py", line 390, in load_rows cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.940352Z [info     ]     cur.execute(self.update_from_temp_table(temp_table)) cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
meltano-1  | 2024-05-01T04:04:31.940442Z [info     ] duckdb.duckdb.ParserException: Parser Error: syntax error at or near "FROM" cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-duckdb name=target-duckdb producer=False run_id=b3d7d89d-612d-48c5-9e09-e9c1b8ce180d stdio=stderr string_id=target-duckdb
shippy commented 6 months ago

The good folks from Meltano Slack have helped me narrow down the issue to this section of the code - if I'm reading this right, the code fails not because there are two primary keys, but because there are zero non-primary keys?

jwills commented 6 months ago

Ah that makes sense? What would be the right thing to do in that case?

shippy commented 6 months ago

Still write the new primary keys, I guess? I haven't actually eval'd the code to make sure that the resulting SQL is broken, though.