pingcap / tidb-lightning

This repository has been moved to https://github.com/pingcap/br
Apache License 2.0
142 stars 66 forks source link

restore: check row value count to avoid unexpected encode result #528

Closed glorv closed 3 years ago

glorv commented 3 years ago

What problem does this PR solve?

~Check row field count before encodes, if row value count is bigger than table field count, directly return an error.~

What is changed and how it works?

Check List

Tests

Side effects

Related changes

Release Note

glorv commented 3 years ago

/run-all-tests

glorv commented 3 years ago

/run-all-tests

lance6716 commented 3 years ago

LGTM

glorv commented 3 years ago

what bug does this PR fix?

We met a case last week that the source file columns are more than the table schema, and this caused tidb backend panic at https://github.com/pingcap/tidb-lightning/blob/c2b6dde7200228f743bceec1724c3a0ad646992b/lightning/backend/tidb.go#L251.

And for local/importer backend, whether it can result in correct result depends on the implementation of tidb table encoder, thus it is undefined behavior.

glorv commented 3 years ago

To close this issue: #518

Not really. #531 should fix one cause of the tikv timeout in checksum/analyze phase. There is no corresponding issue for this one. We found it in an oncall issue though the root cause of that issue is user misconfiged the table schema.

kennytm commented 3 years ago

We met a case last week that the source file columns are more than the table schema, and this caused tidb backend panic at https://github.com/pingcap/tidb-lightning/blob/c2b6dde7200228f743bceec1724c3a0ad646992b/lightning/backend/tidb.go#L251.

maybe easier to check len(enc.columnIdx) >= len(row) in this file and also enc.columnIdx[i] >= 0.

or maybe do it inside initializeColumns and throw the unknownCols error.

And for local/importer backend, whether it can result in correct result depends on the implementation of tidb table encoder

pretty sure they will just be ignored.


we also need to support "ignore_extra_columns" in Fast Bulk Import i.e. it is legal to have a CSV in the form

name,info,ignored,address
Foo,Foo2,X,1234567
Bar,Bar2,Y,987654321

with the column "ignored" skipped.

glorv commented 3 years ago

ok, we could delay resolve this issue to the implementation of "ignore_extra_columns" since this is an ergent thing.

glorv commented 3 years ago

With a CSV without header, or it's an invalid csv with the fields in some line more than in header, maybe we still should check it in the tidb encoder

kennytm commented 3 years ago

i mean we should still check it, but we should do it initializeColumns or the TiDB Encoder, not in the encodeLoop.

glorv commented 3 years ago

i mean we should still check it, but we should do it initializeColumns or the TiDB Encoder, not in the encodeLoop.

got. I'll move it into the tidb encoder since tableKvEncoder can properly handle it.

glorv commented 3 years ago

@kennytm I have moved the check into tidb encoder, and fix an extra bug related to _tidb_rowid, PTAL again

glorv commented 3 years ago

@lance6716 PTAL again, since it has changed a log.

kennytm commented 3 years ago

/run-all-tests

TiDB crashed?!

[2020-12-25T03:56:45.752Z] [mysql] 2020/12/25 11:56:45 packets.go:36: unexpected EOF

[2020-12-25T03:56:45.752Z] [mysql] 2020/12/25 11:56:45 connection.go:135: write tcp 127.0.0.1:27016->127.0.0.1:4000: write: broken pipe

[2020-12-25T03:56:48.274Z] [mysql] 2020/12/25 11:56:48 packets.go:122: closing bad idle connection: EOF

[2020-12-25T03:56:48.275Z] [mysql] 2020/12/25 11:56:48 connection.go:135: write tcp 127.0.0.1:27018->127.0.0.1:4000: write: broken pipe

[2020-12-25T03:56:48.275Z] [mysql] 2020/12/25 11:56:48 packets.go:122: closing bad idle connection: EOF

[2020-12-25T03:56:48.275Z] [mysql] 2020/12/25 11:56:48 connection.go:135: write tcp 127.0.0.1:27036->127.0.0.1:4000: write: broken pipe

[2020-12-25T03:56:48.275Z] Error: read checkpoint failed: begin transaction failed: dial tcp 127.0.0.1:4000: connect: connection refused
glorv commented 3 years ago

Still need to resolve the integration test tidb_rowid first.

lance6716 commented 3 years ago

(LGTM, bot has already count my lgtm)

glorv commented 3 years ago

@kennytm PTAL again

kennytm commented 3 years ago

/lgtm