pingcap / tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV
Apache License 2.0

Retry for range exceed error #2774

Closed shiyuhang0 closed 6 months ago

shiyuhang0 commented 6 months ago

What problem does this PR solve?

TiSpark may send an incorrect key range to TiKV when using FetchHandleRDD. We have two hypotheses about the cause:

  1. TiSpark has a bug when splitting ranges for an index scan, and the bug only shows up with certain data.
  2. TiSpark supports clustered indexes but client-java does not, so the two may not coordinate correctly.

What is changed and how it works?

Since the root cause is hard to find, this PR logs the error and retries once when it occurs. The retry uses client-java's splitRangeByRegion method to re-split the range so that it no longer exceeds the region bound. This method appears to split the range correctly.
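The log-and-retry-once approach described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `KeyRange`, `RangeExceedException`, `splitByRegion`, and `scanWithOneRetry` are hypothetical names, String keys stand in for TiKV's byte-string keys, and the clamping logic is only an assumption about what a region-aware split does, not client-java's actual `splitRangeByRegion` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RangeRetrySketch {
    /** Half-open key range [start, end). String keys stand in for TiKV byte keys. */
    static final class KeyRange {
        final String start;
        final String end;
        KeyRange(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ", " + end + ")"; }
    }

    /** Hypothetical stand-in for the range-exceeds-bound error reported by TiKV. */
    static final class RangeExceedException extends RuntimeException {
        RangeExceedException(String message) { super(message); }
    }

    /** Clamps the requested range to every region it overlaps, one sub-range per region. */
    static List<KeyRange> splitByRegion(KeyRange range, List<KeyRange> regions) {
        List<KeyRange> result = new ArrayList<>();
        for (KeyRange region : regions) {
            String start = range.start.compareTo(region.start) > 0 ? range.start : region.start;
            String end = range.end.compareTo(region.end) < 0 ? range.end : region.end;
            if (start.compareTo(end) < 0) {
                result.add(new KeyRange(start, end));
            }
        }
        return result;
    }

    /** Tries the original range; on a range-exceed error, logs, re-splits, and retries once. */
    static <T> List<T> scanWithOneRetry(KeyRange requested, List<KeyRange> regions,
                                        Function<KeyRange, List<T>> scan) {
        try {
            return scan.apply(requested);
        } catch (RangeExceedException e) {
            System.err.println("Range " + requested + " rejected (" + e.getMessage()
                    + "); re-splitting by region and retrying once");
            List<T> rows = new ArrayList<>();
            for (KeyRange sub : splitByRegion(requested, regions)) {
                rows.addAll(scan.apply(sub)); // a second failure propagates to the caller
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        List<KeyRange> regions = Arrays.asList(new KeyRange("a", "m"), new KeyRange("m", "z"));
        // Fake scan: rejects any range that is not contained in a single region.
        Function<KeyRange, List<String>> scan = r -> {
            for (KeyRange region : regions) {
                if (r.start.compareTo(region.start) >= 0 && r.end.compareTo(region.end) <= 0) {
                    return Arrays.asList(r.toString());
                }
            }
            throw new RangeExceedException("range exceeds region bound");
        };
        System.out.println(scanWithOneRetry(new KeyRange("k", "p"), regions, scan));
    }
}
```

Retrying only once keeps a genuinely bad range from looping forever: if the re-split ranges are also rejected, the error surfaces to the caller unchanged.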

Spark Plan

== Physical Plan ==
*(1) ColumnarToRow
+- TiSpark RegionTaskExec{downgradeThreshold=1000000000,downgradeFilter=[]
   +- RowToColumnar
      +- TiKV FetchHandleRDD{[table: items] IndexLookUp, Columns: item_primary_key@BYTES, item_id@VARCHAR(45), item_set_id@VARCHAR(45), product_id@VARCHAR(45), product_set_id@VARCHAR(45), point_of_sale_country@VARCHAR(2), merchant_id@LONG, merchant_item_id@VARCHAR(127), merchant_item_set_id@VARCHAR(127), domains@JSON, product_sources@JSON, image_signatures@JSON, normalized_short_link_clusters@JSON, canonical_links@JSON, feed_item_ids@JSON, feed_profile_ids@JSON, reconciled_data@JSON, source_data@JSON, cdc_change_indicator@JSON, cdc_new_values@JSON, cdc_old_values@JSON, created_time@LONG, arrival_time@LONG, updated_time@LONG, timestamp_data@JSON: { {IndexRangeScan(Index:item_id(item_id)): { RangeFilter: [], Range: [([t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\000], [t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\372])] }}; {TableRowIDScan} }, startTs: 448636486137151521}

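To make the `Range` in the plan easier to read, the key prefix can be decoded by hand. The sketch below assumes the plan prints key bytes with octal escapes (so `\200` is 0x80) and relies on TiDB's index key layout of `t{tableID}_i{indexID}`, where each ID is an 8-byte big-endian integer with the sign bit flipped. The class and method names are hypothetical, and the trailing bytes after the index ID are left uninterpreted.

```java
class TiKeyDecodeSketch {
    /** Decodes 8 bytes of a memcomparable int64: big-endian with the sign bit flipped. */
    static long decodeComparableLong(byte[] key, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (key[offset + i] & 0xFF);
        }
        return value ^ Long.MIN_VALUE; // flip the sign bit back
    }

    /** Returns {tableId, indexId} for a key shaped like t{tableId}_i{indexId}... */
    static long[] decodeIndexKeyPrefix(byte[] key) {
        if (key.length < 19 || key[0] != 't' || key[9] != '_' || key[10] != 'i') {
            throw new IllegalArgumentException("not an index key");
        }
        return new long[] { decodeComparableLong(key, 1), decodeComparableLong(key, 11) };
    }

    /** Start key of the range in the plan, transcribed from its octal escapes. */
    static byte[] planStartKey() {
        return new byte[] {
            't', (byte) 0x80, 0, 0, 0, 0, 0, 0x13, (byte) 0x96, // \200 ... \023 \226
            '_', 'i', (byte) 0x80, 0, 0, 0, 0, 0, 0, 0x03,      // \200 ... \003
            0x00                                                // trailing \000
        };
    }

    public static void main(String[] args) {
        long[] ids = decodeIndexKeyPrefix(planStartKey());
        System.out.println("tableId=" + ids[0] + ", indexId=" + ids[1]);
    }
}
```

Under these assumptions both ends of the range share the same table and index prefix and differ only in the final byte (`\000` versus `\372`).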
ti-chi-bot[bot] commented 6 months ago

@v01dstar: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to [this](https://github.com/pingcap/tispark/pull/2774#pullrequestreview-1965001663):

>lgtm

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ti-chi-bot[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: v01dstar

Once this PR has been reviewed and has the lgtm label, please ask for approval from shiyuhang0, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/pingcap/tispark/blob/release-3.2/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

shiyuhang0 commented 6 months ago

/run-all-tests tidb=release-6.1 tikv=release-6.1 pd=release-6.1

shiyuhang0 commented 6 months ago

/run-all-tests tidb=release-6.1 tikv=release-6.1 pd=release-6.1

shiyuhang0 commented 6 months ago

/cherry-pick master

ti-chi-bot commented 6 months ago

@shiyuhang0: new pull request created to branch master: #2777.

In response to [this](https://github.com/pingcap/tispark/pull/2774#issuecomment-2024355093):

>/cherry-pick master

Instructions for interacting with me using PR comments are available [here](https://prow.tidb.net/command-help). If you have questions or suggestions related to my behavior, please file an issue against the [ti-community-infra/tichi](https://github.com/ti-community-infra/tichi/issues/new?title=Prow%20issue:) repository.