src-d / gitbase

SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase
Apache License 2.0
2.06k stars 124 forks source link

Schema introspection (SHOW FULL COLUMNS ...) became very slow #961

Open gregsh opened 4 years ago

gregsh commented 4 years ago

In gitbase 0.20 schema introspection is fast and full.

MySQL Connector/J JDBC metadata call that gets all columns for all tables at once metaData.getColumns("gitbase", "", "%", "%") is converted to calls like the following for each table SHOW FULL COLUMNS FROM `commit_trees` FROM `gitbase` LIKE '%'"

In 0.23 and 0.24-rc the above queries are very slow (several minutes) and even fail for some tables completely in 0.23 (0.24 seems to fix that).

The above prevents from using gitbase in DB tools like JetBrains DataGrip.

agarciamontoro commented 4 years ago

Thank you for the report, @gregsh! We'll take a look as soon as we can.

erizocosmico commented 4 years ago

Can you provide a step-by-step example we can reproduce? What operating system are you using?

I thought I reproduced it so I started investigating, but everything seemed to work in tests and when I manually wrote the query. Then I realised the trailing " in your query, which was what made it wait forever expecting the end of the query. So, after all, I cannot reproduce this problem.

gregsh commented 4 years ago

I'm introspecting gitbase schema using soon to be released version of DataGrip and Mysql Connector/J 8.0.15.

Here's the part of log, the ending quote does not belong to the query.

DEBU[31609] executing query query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" INFO[31609] audit trail action=authorization address="127.0.0.1:58216" connection_id=15 permission=read pid=79 query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" success=true system=audit user=guest INFO[31609] audit trail action=query address="127.0.0.1:58216" connection_id=15 duration="972.847µs" pid=79 query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" success=true system=audit user=guest

erizocosmico commented 4 years ago

Still cannot reproduce this. I just connected using MySQL connector/J 8.0.15 with the following code:

    val connection = DriverManager.getConnection(
      "jdbc:mysql://%s?serverTimezone=UTC".format("localhost:3306"),
      "root",
      ""
    )

    val meta = connection.getMetaData

    val rs = meta.getColumns("gitbase", "", "%", "%")
    while (rs.next()) {
      println(rs.getString(3), rs.getString(4), rs.getString(5))
    }

And it prints

(blobs,repository_id,-1)
(blobs,blob_hash,12)
(blobs,blob_size,4)
(blobs,blob_content,-4)
(commit_blobs,repository_id,-1)
(commit_blobs,commit_hash,12)
(commit_blobs,blob_hash,12)
(commit_files,repository_id,-1)
(commit_files,commit_hash,12)
(commit_files,file_path,-1)
(commit_files,blob_hash,12)
(commit_files,tree_hash,12)
(commit_trees,repository_id,-1)
(commit_trees,commit_hash,12)
(commit_trees,tree_hash,12)
(commits,repository_id,-1)
(commits,commit_hash,12)
(commits,commit_author_name,-1)
(commits,commit_author_email,12)
(commits,commit_author_when,93)
(commits,committer_name,-1)
(commits,committer_email,12)
(commits,committer_when,93)
(commits,commit_message,-1)
(commits,tree_hash,12)
(commits,commit_parents,1111)
(files,repository_id,-1)
(files,file_path,-1)
(files,blob_hash,12)
(files,tree_hash,12)
(files,tree_entry_mode,12)
(files,blob_content,-4)
(files,blob_size,4)
(ref_commits,repository_id,-1)
(ref_commits,commit_hash,12)
(ref_commits,ref_name,-1)
(ref_commits,history_index,4)
(refs,repository_id,-1)
(refs,ref_name,-1)
(refs,commit_hash,12)
(remotes,repository_id,-1)
(remotes,remote_name,-1)
(remotes,remote_push_url,-1)
(remotes,remote_fetch_url,-1)
(remotes,remote_push_refspec,-1)
(remotes,remote_fetch_refspec,-1)
(repositories,repository_id,-1)
(tree_entries,repository_id,-1)
(tree_entries,tree_entry_name,-1)
(tree_entries,blob_hash,12)
(tree_entries,tree_hash,12)
(tree_entries,tree_entry_mode,12)

In 413ms.

Tried both with master version and v0.24.0-rc1 using a docker container.

gregsh commented 4 years ago

I have found that it depends on the size of a repository. On a small one (70MB .git) it is fast, but on a large one (9GB .git) it is extremely slow.

erizocosmico commented 4 years ago

I just tried with the linux kernel and pytorch and it still takes milliseconds. Introspection never touches the repositories, so it should not depend on them.

Do you have indexes created in gitbase? Perhaps it's the indexes checksum when starting the server.