vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability
https://nebula-graph.io
Apache License 2.0

Suggestions for optimizing the statistics feature #2615

Open randomJoe211 opened 3 years ago

randomJoe211 commented 3 years ago

When the data size is huge, a job such as SUBMIT JOB STATS may consume a lot of system resources, and we don't always need to run SUBMIT JOB STATS every time we need statistics.

Running SHOW STATS returns the statistics produced by the latest stats job, so users have two choices:

  1. Always run SUBMIT JOB STATS before SHOW STATS to make sure they have the most recent statistics.
  2. Run SHOW JOBS to check whether the last SUBMIT JOB STATS job finished recently enough for their purposes. For example, if SHOW JOBS shows that the most recent SUBMIT JOB STATS job finished this morning, and they have not imported or inserted any new data since yesterday, they can just run SHOW STATS to get the statistics without running SUBMIT JOB STATS again.
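The check in option 2 can be sketched roughly as follows. This is only an illustration of the decision, not anything NebulaGraph provides: `needs_new_stats_job` and `statements_to_run` are hypothetical helper names, and the two timestamps are assumed to be obtained by the user (the finish time read from SHOW JOBS, the last write time from their own records).

```python
from datetime import datetime
from typing import List, Optional


def needs_new_stats_job(last_stats_finished: Optional[datetime],
                        last_data_change: Optional[datetime]) -> bool:
    """Return True if the existing statistics may be stale.

    Both arguments are assumptions supplied by the caller:
    - last_stats_finished: finish time of the latest stats job, or None if none has run.
    - last_data_change: time of the last import/insert, or None if nothing was written.
    """
    if last_stats_finished is None:
        # No stats job has ever finished, so there is nothing fresh to show.
        return True
    if last_data_change is None:
        # No data was written, so the existing statistics still apply.
        return False
    # Stale only if data changed after the stats job finished.
    return last_data_change > last_stats_finished


def statements_to_run(last_stats_finished: Optional[datetime],
                      last_data_change: Optional[datetime]) -> List[str]:
    """List the statements a user would issue, in order.

    Note: after SUBMIT JOB STATS the user still has to wait for the job
    to finish before SHOW STATS reflects the new numbers.
    """
    if needs_new_stats_job(last_stats_finished, last_data_change):
        return ["SUBMIT JOB STATS", "SHOW STATS"]
    return ["SHOW STATS"]
```

With a stats job that finished this morning and no writes since yesterday, `statements_to_run` returns only `["SHOW STATS"]`, which matches the example in option 2.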

So, can we save the user some time by adding a line to the SHOW STATS results that shows when these statistics were generated?

critical27 commented 3 years ago

Showing the time of the stats is reasonable, but the behavior won't be changed.