pingcap / tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV
Apache License 2.0

Retry for range exceed error (#2774) #2777

Closed: ti-chi-bot closed this PR 3 months ago

ti-chi-bot commented 3 months ago

This is an automated cherry-pick of #2774

What problem does this PR solve?

TiSpark may send a wrong key range to TiKV when using FetchHandleRDD. We have two hypotheses about the cause:

  1. TiSpark has a bug when splitting ranges for an index scan, and the bug is triggered only by certain data.
  2. TiSpark supports clustered indexes but client-java does not, and the two do not coordinate correctly.
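To make hypothesis 1 concrete: before a request is sent, a key range has to be split so that each piece stays within one region's [start, end) bounds, and a piece that spills past its region is, presumably, what triggers the "range exceed" error in this PR's title. The sketch below is a hypothetical pure-Python illustration of that clipping, not TiSpark's or client-java's code. `split_range_by_regions` and `exceeds_region` are illustrative names; region boundaries are plain byte strings, an empty end key is assumed to mean "unbounded", and request ranges are assumed to have a finite end key.

```python
from typing import List, Tuple

# A region owns keys in [start, end); an empty end key means "unbounded".
Region = Tuple[bytes, bytes]
# A request range [start, end); assumed to have a finite end key.
KeyRange = Tuple[bytes, bytes]


def split_range_by_regions(rng: KeyRange, regions: List[Region]) -> List[Tuple[Region, KeyRange]]:
    """Clip [start, end) to every region it overlaps (hypothetical helper).

    `regions` must be sorted by start key and contiguous.
    """
    start, end = rng
    pieces = []
    for r_start, r_end in regions:
        if r_end != b"" and r_end <= start:
            continue  # region lies entirely before the range
        if r_start >= end:
            break  # this and all later regions lie after the range
        lo = max(start, r_start)
        hi = end if r_end == b"" else min(end, r_end)
        pieces.append(((r_start, r_end), (lo, hi)))
    return pieces


def exceeds_region(rng: KeyRange, region: Region) -> bool:
    """True if a range sent to `region` reaches outside that region's bounds."""
    lo, hi = rng
    r_start, r_end = region
    return lo < r_start or (r_end != b"" and hi > r_end)


if __name__ == "__main__":
    regions = [(b"", b"c"), (b"c", b"f"), (b"f", b"")]
    for region, piece in split_range_by_regions((b"b", b"g"), regions):
        print(region, piece, exceeds_region(piece, region))
```

A correct split never produces a piece for which `exceeds_region` is true; a bug of the kind described in hypothesis 1 would show up as a piece whose bounds reach into a neighbouring region.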

What is changed and how it works?

Because the root cause is hard to find, we log the error and retry once when it occurs. We use client-java's splitRangeByRegion method to avoid the range-exceeds-bound issue; this method appears to split the range correctly.
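The log-and-retry-once strategy can be sketched as a small wrapper. Everything here is hypothetical: `RangeExceedError`, `fetch`, and `resplit` stand in for TiSpark's real error type, its request path, and a re-split step such as client-java's splitRangeByRegion, whose actual signature is not shown in this PR. A second failure is deliberately not caught, so the retry happens at most once.

```python
import logging
from typing import Callable, List, Tuple, TypeVar

logger = logging.getLogger("fetch_handle_retry_sketch")

KeyRange = Tuple[bytes, bytes]
T = TypeVar("T")


class RangeExceedError(Exception):
    """Hypothetical stand-in for the error raised when a range exceeds region bounds."""


def fetch_with_single_retry(
    ranges: List[KeyRange],
    fetch: Callable[[List[KeyRange]], T],
    resplit: Callable[[List[KeyRange]], List[KeyRange]],
) -> T:
    """Run `fetch`; on RangeExceedError, log it, re-split the ranges, and retry once."""
    try:
        return fetch(ranges)
    except RangeExceedError as err:
        # The root cause is unknown, so record the failure for later diagnosis.
        logger.warning("range exceeds region bound, retrying once: %s", err)
        # A second RangeExceedError is not caught and propagates to the caller.
        return fetch(resplit(ranges))
```

Bounding the retry to one attempt keeps a persistent bug visible as an error instead of turning it into an endless retry loop.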

Spark Plan

```
== Physical Plan ==
*(1) ColumnarToRow
+- TiSpark RegionTaskExec{downgradeThreshold=1000000000,downgradeFilter=[]
   +- RowToColumnar
      +- TiKV FetchHandleRDD{[table: items] IndexLookUp, Columns: item_primary_key@BYTES, item_id@VARCHAR(45), item_set_id@VARCHAR(45), product_id@VARCHAR(45), product_set_id@VARCHAR(45), point_of_sale_country@VARCHAR(2), merchant_id@LONG, merchant_item_id@VARCHAR(127), merchant_item_set_id@VARCHAR(127), domains@JSON, product_sources@JSON, image_signatures@JSON, normalized_short_link_clusters@JSON, canonical_links@JSON, feed_item_ids@JSON, feed_profile_ids@JSON, reconciled_data@JSON, source_data@JSON, cdc_change_indicator@JSON, cdc_new_values@JSON, cdc_old_values@JSON, created_time@LONG, arrival_time@LONG, updated_time@LONG, timestamp_data@JSON: { {IndexRangeScan(Index:item_id(item_id)): { RangeFilter: [], Range: [([t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\000], [t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\372])] }}; {TableRowIDScan} }, startTs: 448636486137151521}
```
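The Range in the plan can be decoded by hand. Assuming TiDB's standard index key layout (`t` + 8-byte table ID + `_i` + 8-byte index ID + encoded index values, with both IDs stored as big-endian int64 with the sign bit flipped), the sketch below recovers the table and index IDs. `decode_cmp_int64` and `decode_index_key` are illustrative helpers, not part of TiSpark or client-java. Python's octal escapes let the keys be pasted from the plan unchanged.

```python
import struct
from typing import Tuple

SIGN_MASK = 1 << 63


def decode_cmp_int64(buf: bytes) -> int:
    """Decode a memcomparable int64: big-endian, sign bit flipped."""
    (u,) = struct.unpack(">Q", buf)
    v = u ^ SIGN_MASK
    return v - (1 << 64) if v >= SIGN_MASK else v


def decode_index_key(key: bytes) -> Tuple[int, int, bytes]:
    """Split an index key into (table_id, index_id, encoded index values)."""
    if key[:1] != b"t" or key[9:11] != b"_i":
        raise ValueError("not a TiDB index key")
    table_id = decode_cmp_int64(key[1:9])
    index_id = decode_cmp_int64(key[11:19])
    return table_id, index_id, key[19:]


# Range bounds copied verbatim from the plan (Python accepts the octal escapes).
start = b"t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\000"
end = b"t\200\000\000\000\000\000\023\226_i\200\000\000\000\000\000\000\003\372"

if __name__ == "__main__":
    print(decode_index_key(start))
    print(decode_index_key(end))
```

Both keys decode to table ID 5014 and index ID 3; they differ only in the trailing byte, 0x00 versus 0xFA (250). In TiDB's codec those values are the nil flag and the max flag, so the scanned range appears to cover the entire item_id index.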
ti-chi-bot[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign xuanyu66 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/pingcap/tispark/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.