pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

Lightning precheck shows incorrect estimated sorted data size #54216

Open db-will opened 4 months ago

db-will commented 4 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Use Lightning to import a large enough data set (more than 5 TB) with physical import mode and precheck enabled.

The issue seems related to the int64-to-float64 conversion done when the message is formatted: https://github.com/pingcap/tidb/blob/master/lightning/pkg/importer/precheck_impl.go#L587-L588

theResult.Message = fmt.Sprintf("local disk resources are rich, estimate sorted data size %s, local available is %s",
    units.BytesSize(float64(estimatedDataSizeWithIndex)), units.BytesSize(float64(localAvailable)))

https://github.com/pingcap/tidb/blob/master/lightning/pkg/importer/precheck_impl.go#L179-L180

theResult.Message += fmt.Sprintf("TiKV requires more storage space. Estimated required size: %s. Actual size: %s.",
    units.BytesSize(float64(tikvSourceSize)), units.BytesSize(float64(tikvAvail)))

Considering that the imported data size is small, the issue could also be caused by an incorrect tikvSourceSize calculation.

2. What did you expect to see? (Required)

The precheck log should show the correct estimated data size.

3. What did you see instead (Required)

local disk resources are rich, estimate sorted data size -3.074e+18B, local available is xxx(~700GB)
TiKV requires more storage space. Estimated required size: 8EiB. Actual size: xxx(~5TB).  

4. What is your TiDB version? (Required)

v7.5.1

lance6716 commented 4 months ago

Hi, to clarify, do you mean there are two problems?

  1. "estimate sorted data size -3.074e+18B": the data size is formatted as a floating-point number and is negative.
  2. "Estimated required size: 8EiB": this value is too large.

db-will commented 4 months ago

Yes. The original data import is less than 10 GB, with a simple indexing pattern.

fubinzh commented 4 months ago

/severity moderate