Closed shiyuhang0 closed 1 year ago
[REVIEW NOTIFICATION]
This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer
in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer
in the comment to help you merge this pull request.
The full list of commands accepted by this bot can be found here.
@birdstorm PTAL It is a simple PR.
I think we can just reload db cache when the schema version has changed. GetTable or ListTable method will find the table cache is empty, then just load all tables in its' own db rather than all dbs. These will reduce the pressure on users with a large number of tables.
I find you use relaod(true) which will reload all table scehmas in the following PR. I am afraid I miss something which needs it. https://github.com/pingcap/tispark/pull/1623 https://github.com/pingcap/tispark/pull/2106
could you please take a look at this PR:
@birdstorm It is a simple PR.
I think we can just reload db cache when the schema version has changed. GetTable or ListTable method will find the table cache is empty, then just load all tables in its' own db rather than all dbs. These will reduce the pressure on users with a large number of tables.
I find you use relaod(true) which will reload all table scehmas in the following PR. I am afraid I miss something which needs it. https://github.com/pingcap/tispark/pull/1623 https://github.com/pingcap/tispark/pull/2106
could you please take a look at this PR:
- Is there anything I miss? Can I just reload without reloading the table cache?
- Do you think we need a configuration to determine whether load tables cache?
@shiyuhang0 have you tested the case where there is rapid change in the schema version? e.g., Many tables are created and deleted at the same time, while the queries randomly occur on those tables? We added the function to reload catalog on tables because when we tested previously, it is possible for some of the queries to return “table not exist“ error even if the tables exist.
In theory, we do not need to reload tables every time because we should be able to resolve the ”table not exist” error in our error handler. It is just a reminder that the mentioned case should be tested and I will approve this PR considering that.
@birdstorm: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.
@birdstorm It is a simple PR. I think we can just reload db cache when the schema version has changed. GetTable or ListTable method will find the table cache is empty, then just load all tables in its' own db rather than all dbs. These will reduce the pressure on users with a large number of tables. I find you use relaod(true) which will reload all table scehmas in the following PR. I am afraid I miss something which needs it.
1623
2106
could you please take a look at this PR:
- Is there anything I miss? Can I just reload without reloading the table cache?
- Do you think we need a configuration to determine whether load tables cache?
@shiyuhang0 have you tested the case where there is rapid change in the schema version? e.g., Many tables are created and deleted at the same time, while the queries randomly occur on those tables? We added the function to reload catalog on tables because when we tested previously, it is possible for some of the queries to return “table not exist“ error even if the tables exist.
So, these PRs is aiming to fix the table cache miss (mostly because of the race I think). But, I can not find where is the race in the code :( I think it is necessary to have a test. Is the “table not exist“ easy to reproduce? I need to know if the race will not happen anymore or it is just not be reproduced
@birdstorm It is a simple PR. I think we can just reload db cache when the schema version has changed. GetTable or ListTable method will find the table cache is empty, then just load all tables in its' own db rather than all dbs. These will reduce the pressure on users with a large number of tables. I find you use relaod(true) which will reload all table scehmas in the following PR. I am afraid I miss something which needs it.
1623
2106
could you please take a look at this PR:
- Is there anything I miss? Can I just reload without reloading the table cache?
- Do you think we need a configuration to determine whether load tables cache?
@shiyuhang0 have you tested the case where there is rapid change in the schema version? e.g., Many tables are created and deleted at the same time, while the queries randomly occur on those tables? We added the function to reload catalog on tables because when we tested previously, it is possible for some of the queries to return “table not exist“ error even if the tables exist.
So, these PRs is aiming to fix the table cache miss. But, I can not find where is the race in the code :( I think it is necessary to have a test. Is the “table not exist“ easy to reproduce? I need to know if the race will not happen anymore or it is just not be reproduced
It should be easy to reproduce if the race still exists. After we have fixed the region cache issue, it is possible that the problem has been resolved since then. In either case, I believe the case should be re-tested in this PR to ensure that.
/run-all-tests
/run-all-tests
/run-all-tests
/run-all-tests
/run-all-tests
/merge
This pull request has been accepted and is ready to merge.
In response to a cherrypick label: new pull request created to branch release-3.1
: #2676.
In response to a cherrypick label: new pull request created to branch release-3.2
: #2677.
In response to a cherrypick label: new pull request created to branch release-3.0
: #2678.
What problem does this PR solve?
TiSpark will reload all table schemas when init catalogCache. It will also reload all table schemas once we call listTable or getTable with the invalid cache.
One user reports that TiSpark hangs with more than 10000+ table schemas.
What is changed and how it works?
Add a configuration
spark.tispark.load_tables
, default is true. when set it to false, we will not load all tables when we reload the cache. table cache will only be loaded when the table is used.Check List
Tests