Closed redrabbit closed 5 years ago
Commit a20d335 adds a significant boost. With a big repository we can go from >6s down to 150ms. But this also depends a lot on which blob/tree is used.
The history of a file being changed frequently will render much faster than a file changed more rarely.
For a Git commit history of a tree or blob the cost intensive part is that we have to filter each commit to ensure that the given tree/blob has changed. This means
1) Loading the tree of the commit and its parent. 2) Diffing both trees. 3) Matching the given pathspec.
Having to process the entire 15.000 commits takes relatively long. The new paginate_cursor/5
function helps by skipping as much commits as possible...
Here's a basic usage example for paginating commit history:
page = paginate_cursor(@conn, @commits, &(oid_fmt(&1) == &2), &oid_fmt(&1.oid))
The performance improvement is due to the fact that
1) We don't go through the entire list of items. Instead we skip (Enum.drop_while/2
) items until we match (third param) the cursor. Having the stream's :enum
trimmed until cursor.
2) We Stream.take/2
until we aggregate limit
+1 items (or reach the end of the stream).
3) We create a new cursor (fourth param) pointing at the previous/next page item.
They are also a few caveats:
:enum
field of Stream{}
to drop items until fn_filter
returns true
.
GitRekt.Git.enumerate/1
to resolve the first layer (NIF) of the stream. We use this to make sure that libgit2 iterators behave more Erlangish.limit
items quiet fast. But if we have a file that is rarely modified, we might have to process a very big chunk of the full commit list in order to create that page of limit
items.
When showing the history of a given tree or blob (see #31), things can get pretty slow. Commit 81a75c0 helped marginally but the computation of such an action remains very intensive.
Testing with the current
HEAD
of elixir-lang/elixir (~15.000 commits):master
take ~140ms.master
with a pathspec takes ~6.000ms.Fetching the history for a given revision is done relatively fast. In contrast, having to filter for a given pathspec takes forever.
Both methods could be greatly improved if pagination does not require to compute the entire revwalk (we currently need to
Enum.count/1
the entire revwalk to show the last available page).Instead we should provide a pagination similar to one specified by Relay Cursor Connections. We loose the possibility to show the number of pages, in return we can start the revwalk from a given commit and
Stream.take/2
a small number of commits.