redrabbit / git.limo

A Git source code management tool powered by Elixir with easy installation & high extensibility.
https://git.limo
MIT License

Git history on trees and blobs is slow. #36

redrabbit closed this issue 5 years ago

redrabbit commented 6 years ago

When showing the history of a given tree or blob (see #31), things can get pretty slow. Commit 81a75c0 helped marginally but the computation of such an action remains very intensive.

Testing with the current HEAD of elixir-lang/elixir (~15,000 commits):

Fetching the history for a given revision is done relatively fast. In contrast, having to filter for a given pathspec takes forever.

Both methods could be greatly improved if pagination did not require computing the entire revwalk (we currently need to Enum.count/1 the entire revwalk to show the last available page).

Instead, we should provide pagination similar to the one specified by Relay Cursor Connections. We lose the ability to show the number of pages; in return, we can start the revwalk from a given commit and Stream.take/2 a small number of commits.
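A cursor-based page fetch along those lines could be sketched as follows (a minimal sketch, not the actual git.limo implementation; `oid_fmt/1` is borrowed from the example below, everything else is illustrative):

```elixir
# Page through a revwalk stream without counting it first.
# `revwalk` is an Enumerable of commits; `cursor` is the formatted OID
# of the last commit on the previous page (nil for the first page).
def commits_after(revwalk, cursor, limit \\ 20) do
  revwalk
  |> Stream.drop_while(fn commit -> cursor && oid_fmt(commit.oid) != cursor end)
  |> Stream.drop(if cursor, do: 1, else: 0)  # skip the cursor commit itself
  |> Enum.take(limit)
end
```

Because the stream is lazy, only `limit` commits past the cursor are ever materialized, instead of the full history.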

redrabbit commented 6 years ago

Commit a20d335 brings a significant boost. With a big repository we can go from >6s down to 150ms, though this also depends a lot on which blob/tree is queried.

Explanation

The history of a frequently changed file renders much faster than that of a rarely changed one.

For the Git commit history of a tree or blob, the cost-intensive part is that we have to filter each commit to ensure that the given tree/blob actually changed. This means:

1) Loading the tree of the commit and of its parent.
2) Diffing both trees.
3) Matching the diff against the given pathspec.
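The per-commit check above might be sketched like this (hedged: `commit_tree/1`, `commit_parents/1`, `diff_tree/2` and `pathspec_match?/2` are stand-ins for whatever Git bindings the project actually uses, not real git.limo functions):

```elixir
# Decide whether `pathspec` was touched by a commit, by diffing the
# commit's tree against its first parent's tree.
def touches_pathspec?(commit, pathspec) do
  tree = commit_tree(commit)                    # 1) load the commit's tree

  parent_tree =
    case commit_parents(commit) do
      [parent | _] -> commit_tree(parent)
      [] -> nil                                 # root commit: diff against an empty tree
    end

  parent_tree
  |> diff_tree(tree)                            # 2) diff both trees
  |> Enum.any?(&pathspec_match?(&1, pathspec))  # 3) match the pathspec
end
```

Running this predicate for every commit in the revwalk is what makes tree/blob history so much more expensive than plain revision history.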

Processing all ~15,000 commits takes relatively long. The new paginate_cursor/5 function helps by skipping as many commits as possible...

Implementation

Here's a basic usage example for paginating commit history:

```elixir
page = paginate_cursor(@conn, @commits, &(oid_fmt(&1) == &2), &oid_fmt(&1.oid))
```

The performance improvement comes from the fact that:

1) We don't go through the entire list of items. Instead, we skip items (Enum.drop_while/2) until one matches the cursor (third param), so the stream's :enum is trimmed up to the cursor.
2) We Stream.take/2 until we aggregate limit+1 items (or reach the end of the stream).
3) We create a new cursor (fourth param) pointing at the previous/next page item.
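Put together, the three steps above can be sketched as follows (assumptions: this is not the actual paginate_cursor/5 body, which also takes the connection for its query params; it only illustrates the drop_while/take mechanics described here):

```elixir
# `stream`    - Enumerable of items (e.g. commits)
# `cursor`    - opaque cursor from the previous page, or nil
# `matcher`   - fn item, cursor -> boolean (third param above)
# `to_cursor` - fn item -> cursor string (fourth param above)
def paginate_cursor(stream, cursor, matcher, to_cursor, limit \\ 20) do
  stream
  # 1) Skip items up to and including the cursor, instead of enumerating everything.
  |> then(fn s ->
    if cursor do
      s |> Stream.drop_while(&(!matcher.(&1, cursor))) |> Stream.drop(1)
    else
      s
    end
  end)
  # 2) Take limit+1 items so we know whether a next page exists.
  |> Enum.take(limit + 1)
  |> then(fn items ->
    page = Enum.take(items, limit)
    # 3) Build the next cursor from the last item of the page.
    next = if length(items) > limit, do: to_cursor.(List.last(page)), else: nil
    {page, next}
  end)
end
```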

There are also a few caveats: