zendesk / maxwell

Maxwell's daemon, a mysql-to-json kafka producer
https://maxwells-daemon.io/
Other
4.04k stars 1.01k forks source link

Exception: Couldn't find table #223

Closed jpeyper closed 8 years ago

jpeyper commented 8 years ago

One of our more active testing environments is getting this error.

com.zendesk.maxwell.schema.ddl.SchemaSyncError: Couldn't find table: cgt.20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000
    at com.zendesk.maxwell.schema.ddl.TableAlter.apply(TableAlter.java:44)
    at com.zendesk.maxwell.MaxwellReplicator.processQueryEvent(MaxwellReplicator.java:296)
    at com.zendesk.maxwell.MaxwellReplicator.getRow(MaxwellReplicator.java:268)
    at com.zendesk.maxwell.MaxwellReplicator.work(MaxwellReplicator.java:89)
    at com.zendesk.maxwell.RunLoopProcess.runLoop(RunLoopProcess.java:31)
    at com.zendesk.maxwell.Maxwell.run(Maxwell.java:91)
    at com.zendesk.maxwell.Maxwell.main(Maxwell.java:97)
2015-12-23T13:50:13,450 INFO  com.zendesk.maxwell.schema.SchemaPosition: Storing final position: null

My theory was that this table is very short lived (should probably be a temporary table but that cant be helped).

I tried reproducing this by running these two commands quickly

CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (`id` INT (11));

DROP TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000`;

but this didn't cause any issues so I guess I'm wrong.

osheroff commented 8 years ago

two commands quickly wouldn't do much; maxwell processes commands sequentially and shouldn't miss any DDL. There's a possibility of a race condition when maxwell first captures schema; it doesn't take any locks, so if you create tables around the same time as the first boot of maxwell, that could maybe cause problems. Or there could be something else. Is there anything in the maxwell.tables table that looks like it might have screwed up the processing of the table creation?

jpeyper commented 8 years ago

the maxwell.positions tables looks like this

"server_id" "binlog_file"   "binlog_position"
"11543" "sthx11543-bin.000007"  "669960990"

the query SHOW BINLOG EVENTS IN "sthx11543-bin.000007" FROM 669957378 LIMIT 10; (a few positions before 669960990) returns this

"Log_name"  "Pos"   "Event_type"    "Server_id" "End_log_pos"   "Info"
"sthx11543-bin.000007"  "669957378" "Query" "11543" "669957450" "BEGIN"
"sthx11543-bin.000007"  "669957450" "Query" "11543" "669960990" "use `tspa`; CREATE TABLE `cgt`.`20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000` (
  `id` int(11) NOT NULL DEFAULT '0',
  `trnnum` int(11) DEFAULT NULL,
  `seq` int(1) NOT NULL DEFAULT '0',
  `trnnumrev` bigint(11) NOT NULL DEFAULT '0',
  `memberno` varchar(8) DEFAULT NULL,
  `imnum` int(11) DEFAULT NULL,
  `memclass` char(1) DEFAULT NULL,
  `type` char(1) DEFAULT NULL,
  `imdispcode` varchar(13) DEFAULT '0',
  `imname` varchar(40) DEFAULT NULL,
  `divname` varchar(40) DEFAULT NULL,
  `dispname` varchar(60) DEFAULT NULL,
  `batch` int(11) DEFAULT NULL,
  `effdate` date NOT NULL DEFAULT '0000-00-00',
  `duedate` date DEFAULT NULL,
  `trntype` char(1) DEFAULT NULL,
  `trnsubtype` char(1) DEFAULT NULL,
  `short` varchar(14) DEFAULT NULL,
  `cgtax` decimal(13,2) DEFAULT NULL,
  `gross` decimal(13,2) DEFAULT NULL,
  `costs` decimal(13,2) DEFAULT NULL,
  `tax` decimal(13,2) DEFAULT NULL,
  `net` decimal(13,2) DEFAULT NULL,
  `assetvalue` decimal(17,9) DEFAULT NULL,
  `holding` decimal(17,6) DEFAULT NULL,
  `units` decimal(17,6) DEFAULT NULL,
  `units_before` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `units_after` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `units_review` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `review_status` int(2) NOT NULL DEFAULT '0',
  `ir_taxfree` decimal(13,2) NOT NULL DEFAULT '0.00',
  `ir_taxdef` decimal(13,2) NOT NULL DEFAULT '0.00',
  `ir_capreturn` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_o_taxfree` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_o_taxdef` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_o_capreturn` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_taxfree` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_taxdef` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_capreturn` decimal(13,2) NOT NULL DEFAULT '0.00',
  `adj_taxfree` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `adj_taxdef` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `adj_capreturn` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `iadj_taxfree` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `iadj_taxdef` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `iadj_capreturn` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `adj_gross` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cb_short` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cb_long` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cb_index` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cb_loss` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cba_short` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cba_long` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cba_index` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cba_loss` decimal(12,2) NOT NULL DEFAULT '0.00',
  `cg_costbase` decimal(14,2) NOT NULL DEFAULT '0.00',
  `cg_indexcost` decimal(13,2) NOT NULL DEFAULT '0.00',
  `cg_costdate` date DEFAULT NULL,
  `cb_indexfactor` decimal(8,5) NOT NULL DEFAULT '0.00000',
  `cg_minKdate` date DEFAULT NULL,
  `cg_soldtrn` bigint(11) NOT NULL DEFAULT '0',
  `cg_solddate` date DEFAULT NULL,
  `cg_assetvalue` decimal(17,9) DEFAULT NULL,
  `cg_proceeds` decimal(13,2) DEFAULT '0.00',
  `cg_method` char(1) DEFAULT '',
  `cg_short` decimal(13,2) DEFAULT NULL,
  `cg_long` decimal(13,2) DEFAULT '0.00',
  `cg_index` decimal(13,2) DEFAULT '0.00',
  `cg_loss` decimal(13,2) DEFAULT '0.00',
  `cgtevent` char(1) DEFAULT 'Y',
  `cg_id` int(11) DEFAULT '0',
  `zzzdate` varchar(8) DEFAULT NULL,
  `zzztime` varchar(11) DEFAULT NULL,
  `zzzusrcode` varchar(8) DEFAULT NULL,
  `orderdate` date DEFAULT NULL,
  `Status` int(1) NOT NULL DEFAULT '0'
)"
"sthx11543-bin.000007"  "669960990" "Query" "11543" "669961063" "COMMIT"
"sthx11543-bin.000007"  "669961063" "Query" "11543" "669961135" "BEGIN"
"sthx11543-bin.000007"  "669961135" "Table_map" "11543" "669961438" "table_id: 6701749 (cgt.20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000)"
"sthx11543-bin.000007"  "669961438" "Write_rows"    "11543" "669962001" "table_id: 6701749 flags: STMT_END_F"
"sthx11543-bin.000007"  "669962001" "Xid"   "11543" "669962032" "COMMIT /* xid=262792029 */"
"sthx11543-bin.000007"  "669962032" "Query" "11543" "669962381" "use `tspa`; ALTER TABLE cgt.20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000
ADD KEY `TRNNUM` (`trnnum`, `seq`),
ADD KEY `SOLDTRN` (`cg_soldtrn`),
ADD KEY `COSTDATE` (`cg_costdate`),
ADD KEY `EFFDATE` (`effdate`),
ADD KEY `TRNTYPE` (`trntype`),
ADD KEY `IMNUM` (`imnum`)"
"sthx11543-bin.000007"  "669962381" "Query" "11543" "669962462" "BEGIN"
"sthx11543-bin.000007"  "669962462" "Table_map" "11543" "669962765" "table_id: 6701750 (cgt.20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000)"
osheroff commented 8 years ago

@jpeyper I think you're going for the gold medal of weird bugs.

osheroff commented 8 years ago

I see the problem, though, I think; somehow you've got "use DB; DDL" in your logs. I'm not... quite sure how to reproduce that. Any chance you could look for a code snippet that might be doing that?

osheroff commented 8 years ago

no, nevermind. I'm just being fooled by the output of show binlog events

jpeyper commented 8 years ago

Actually, I was wrong, there were errors on my test in the first comment

2015-12-23T11:47:56,990 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:36418897] after applying "CREATE TABLE `asp`.`test` (   `id` int(11) NOT NULL AUTO_INCREMENT,   PRIMARY KEY (`id`) )"
2015-12-23T11:56:04,047 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:36419012] after applying "DROP TABLE `test` /* generated by server */"
2015-12-23T13:36:51,398 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:104886013] after applying "CREATE TABLE `cgt`.`test` (   `id` int(11) NOT NULL AUTO_INCREMENT,   PRIMARY KEY (`id`) )"
2015-12-23T13:44:37,311 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:104886134] after applying "DROP TABLE `cgt`.`test` /* generated by server */"
2015-12-23T13:52:33,055 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:104886352] after applying "CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (   `id` int(11) NOT NULL AUTO_INCREMENT,   PRIMARY KEY (`id`) )"
    at com.zendesk.maxwell.MaxwellReplicator.processQueryEvent(MaxwellReplicator.java:296)
com.zendesk.maxwell.schema.ddl.SchemaSyncError: Can't drop non-existant table: cgt.20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000
    at com.zendesk.maxwell.MaxwellReplicator.getRow(MaxwellReplicator.java:268)
    at com.zendesk.maxwell.schema.ddl.TableDrop.apply(TableDrop.java:33)
    at com.zendesk.maxwell.RunLoopProcess.runLoop(RunLoopProcess.java:31)
    at com.zendesk.maxwell.Maxwell.main(Maxwell.java:97)
    at com.zendesk.maxwell.MaxwellReplicator.work(MaxwellReplicator.java:89)
    at com.zendesk.maxwell.Maxwell.run(Maxwell.java:91)
2015-12-23T14:00:36,380 INFO  com.zendesk.maxwell.schema.SchemaPosition: Storing final position: BinlogPosition[awmsphx90022-bin.000010:104886352]
jpeyper commented 8 years ago

@osheroff I love winning prizes, gold medals are best.

osheroff commented 8 years ago

interesting. case sensitivity issues. 20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000 vs 20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000

osheroff commented 8 years ago

are you running in a case-insensitive environment? Mysql has some utterly fakakta'ed rules around this, see http://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html if you want to curse softly under breath.

osheroff commented 8 years ago

ie what's the value of lower_case_table_names? I need to train maxwell to deal with this shit, somehow. Maybe by giving it a configuration parameter so that it matches mysql's behavior. or something.

jpeyper commented 8 years ago

The commands I ran as a test (the create and drop) have the same casing.

I can run queries with mixed case for database and tables names without any issues.

"Variable_name" "Value"
"lower_case_table_names"    "1"

on a side note, is there any reason maxwell needs to care about schemas you arent including?

my config file looks like this

...
include_dbs=asp,bbs,bip,bss,global,globip,jrb,map,tspa
include_tables=contact,investor
...

but a lot of schema changes happen in other tables that cause Maxwell to stop for many minutes while it reprocesses the schemas and tables I don't care about. (we have a huge database with lots of schemas and tables).

jpeyper commented 8 years ago

Perhaps it might help if we can simplify this a bit to just my example.

These queries (with this case)

CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (`id` INT (11)) ;
DROP TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` ;

generated these log events

"Log_name"  "Pos"   "Event_type"    "Server_id" "End_log_pos"   "Info"
"awmsphx90022-bin.000010"   "104886529" "Query" "90012" "104886701" "use `asp`; CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (
  `id` int(11)
)"
"awmsphx90022-bin.000010"   "104886701" "Query" "90012" "104886878" "use `asp`; DROP TABLE `cgt`.`20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000` /* generated by server */"

and Maxwell did this (ignore the order of the stack trace, Kibana is getting things wrong)

2015-12-23T13:52:33,055 INFO  com.zendesk.maxwell.MaxwellReplicator: storing schema @BinlogPosition[awmsphx90022-bin.000010:104886352] after applying "CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (   `id` int(11) NOT NULL AUTO_INCREMENT,   PRIMARY KEY (`id`) )"
    at com.zendesk.maxwell.MaxwellReplicator.processQueryEvent(MaxwellReplicator.java:296)
com.zendesk.maxwell.schema.ddl.SchemaSyncError: Can't drop non-existant table: cgt.20151130_tsp_10291__cgpos_0_df100961b3b9476fb02d495d50c37000
    at com.zendesk.maxwell.MaxwellReplicator.getRow(MaxwellReplicator.java:268)
    at com.zendesk.maxwell.schema.ddl.TableDrop.apply(TableDrop.java:33)
    at com.zendesk.maxwell.RunLoopProcess.runLoop(RunLoopProcess.java:31)
    at com.zendesk.maxwell.Maxwell.main(Maxwell.java:97)
    at com.zendesk.maxwell.MaxwellReplicator.work(MaxwellReplicator.java:89)
    at com.zendesk.maxwell.Maxwell.run(Maxwell.java:91)
2015-12-23T14:00:36,380 INFO  com.zendesk.maxwell.schema.SchemaPosition: Storing final position: BinlogPosition[awmsphx90022-bin.000010:104886352]

I'm just waiting for Maxwell to restart so I can redo the test (takes about 45 mins to build the Maxwell schema on a clean start).

jpeyper commented 8 years ago

There are 46K rows in maxwell.tables and none of them are name = "20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000" before the create table runs, after the exception there is this row

"id"    "schema_id" "database_id"   "name"  "encoding"  "pk"
"59944" "3" "575"   "20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000"  "latin1"    ""
jpeyper commented 8 years ago

I can confirm this is easily reproducable on my setup here using these queries

CREATE TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` (`id` INT (11)) ;
DROP TABLE `cgt`.`20151130_TSP_10291__CGPos_0_DF100961B3B9476FB02D495D50C37000` ;
jpeyper commented 8 years ago

If I run these queries with a long gap in the middle (long enough for the table to show up in maxwell.positions, I can confirm that it still fails.

osheroff commented 8 years ago

yah, so lower_case_table_names = 1 means that mysql is not case sensitive, and it appears that mysql will dowcase the sql you send in (if you look at the sql maxwell is crashing on, the table name is lower case, which is what happens). So maxwell either needs to react to the server being case insensitive, or be configured the same way.

re:

on a side note, is there any reason maxwell needs to care about schemas you arent including?

yeah, it's good to always keep a copy of the schema in tact -- if one suddenly start including tables that were previously excluded, it's no good to start crashing because you didn't bother to scribble down the schema previously.

are you running on windows, btw?

jpeyper commented 8 years ago

After a bit more testing I came to the same conclusion but had to go home so I didn't get a chance to reply.

The database and Maxwell are running on red hat.

I'm assuming the fix (either way) isn't too hard to implement?

While I agree it would be better to keep your schema as up to date as possible, in our database, any schema changes are followed by 3-4 minutes of nothing, before the moving forward in the log again. We have a _work database that only exists to have tables created, data inserted and joined to, just to have it deleted, in a perfect world, we would make all these temporary tables, but they aren't and they will probably never be.

I don't want to derail this issue any further, so I might open an new topic to discuss this, but as long as maxwell works, the performance is secondary.

On Wed, 23 Dec 2015 6:13 pm Ben Osheroff notifications@github.com wrote:

yah, so lower_case_table_names = 1 means that mysql is not case sensitive, and it appears that mysql will dowcase the sql you send in (if you look at the sql maxwell is crashing on, the table name is lower case, which is what happens). So maxwell either needs to react to the server being case insensitive, or be configured the same way.

re:

on a side note, is there any reason maxwell needs to care about schemas you arent including?

yeah, it's good to always keep a copy of the schema in tact -- if one suddenly start including tables that were previously excluded, it's no good to start crashing because you didn't bother to scribble down the schema previously.

are you running on windows, btw?

— Reply to this email directly or view it on GitHub https://github.com/zendesk/maxwell/issues/223#issuecomment-166818732.

osheroff commented 8 years ago

I'm assuming the fix (either way) isn't too hard to implement?

yes/no -- there's 3 modes mysql can be in when it comes to case sensitivity, and to get it correct one has to (a) probe the mode that the server is running in and (b) ape it correctly. Not technically sophisticated but you have to get it right (and deal with old installations before it was right, etc)

osheroff commented 8 years ago

any schema changes are followed by 3-4 minutes of nothing, before the moving forward in the log again.

That seems like more of the real problem; if Maxwell could be a bit more efficient at schema storage I don't suppose you'd notice. I assume your overall column count is in the hundreds of thousands?

jpeyper commented 8 years ago

for my understanding, the three modes are

correct?

osheroff commented 8 years ago

about right, more precisely lower_case_table_names = 1 means "downcase everything before you store it", and lower_case_table_names = 2 means "downcase everything before you compare it" -- so 2 works sorta on OSX, where it preserves the case that you gave it but the filesystem is case insensitive.

http://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html

osheroff commented 8 years ago

Fixed in v0.17.0