gotec / git2net

An Open Source Python package for the extraction of fine-grained and time-stamped co-editing networks from git repositories.
https://git2net.readthedocs.io
GNU Affero General Public License v3.0

Max recursion depth exceeded #6

Closed: larham closed this issue 4 years ago

larham commented 4 years ago

Describe the bug

Searching for aliases
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/git2net/extraction.py", line 1256, in _get_path_to_leaf_node
    return _get_path_to_leaf_node(dag, list(dag.successors[node])[0], _path=[node] + _path)
  File "/usr/local/lib/python3.7/site-packages/git2net/extraction.py", line 1256, in _get_path_to_leaf_node
    return _get_path_to_leaf_node(dag, list(dag.successors[node])[0], _path=[node] + _path)
  File "/usr/local/lib/python3.7/site-packages/git2net/extraction.py", line 1256, in _get_path_to_leaf_node
    return _get_path_to_leaf_node(dag, list(dag.successors[node])[0], _path=[node] + _path)
  [Previous line repeated 996 more times]
  File "/usr/local/lib/python3.7/site-packages/git2net/extraction.py", line 1253, in _get_path_to_leaf_node
    successors = dag.successors[node].difference(set(_path))
RecursionError: maximum recursion depth exceeded while calling a Python object

To Reproduce

Steps to reproduce the behavior:

  1. Use a repo with 805 files and 1260 commits (total across all branches, counted with git rev-list --all --count).
  2. Create the database with max_modifications = 5.
  3. Use the sample code to attempt to visualize a graph.
  4. See the error.

Expected behavior

No error.

Note: this appears to be a recursion issue. The method _get_path_to_leaf_node() is recursive, so Python's maximum recursion depth (1000 frames by default, consistent with "Previous line repeated 996 more times" in the stack trace) seems to act as a hard limit on how long a path to a leaf node can be. Until the recursion is removed from this part of the extraction, this will likely limit the tool to small repositories with few edits and authors.
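To illustrate the problem and the usual remedy, here is a minimal sketch that contrasts a recursive walk to a leaf node with an equivalent loop. It is not git2net's code or its actual fix: the DAG is assumed to be a plain dict mapping each node to a set of successors, and both function names are hypothetical. Ties between several successors are broken with min() only to keep the output deterministic.

```python
import sys


def path_to_leaf_recursive(successors, node, _path=None):
    """Recursive walk: one stack frame per node, so long chains overflow."""
    _path = [] if _path is None else _path
    remaining = successors.get(node, set()) - set(_path) - {node}
    if not remaining:
        return _path + [node]
    return path_to_leaf_recursive(successors, min(remaining), _path + [node])


def path_to_leaf_iterative(successors, node):
    """Same walk with a loop: the path grows in memory, the call stack does not."""
    path, visited = [], set()
    while True:
        path.append(node)
        visited.add(node)
        # Skip nodes already on the path so cycles cannot loop forever.
        remaining = successors.get(node, set()) - visited
        if not remaining:
            return path
        node = min(remaining)


chain = {i: {i + 1} for i in range(5000)}  # 0 -> 1 -> ... -> 5000
try:
    path_to_leaf_recursive(chain, 0)
except RecursionError:
    print("recursive walk exceeded the limit of", sys.getrecursionlimit())

path = path_to_leaf_iterative(chain, 0)
print(len(path), path[-1])  # 5001 5000
```

The loop keeps the same "follow an unvisited successor" logic; only the bookkeeping moves from the call stack into a list and a set, so path length is bounded by memory rather than by the recursion limit.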


gotec commented 4 years ago

Hi Larry, thank you for reporting this. I assume you were using the function get_line_editing_paths to create the graph.

If so, I was able to replicate the issue. I will rewrite the function that detects file renamings so that it is no longer recursive. In the meantime, I can suggest two things:

  1. Treat renamed files as different files by using get_line_editing_paths with merge_renaming=False.
  2. Rename the matching files yourself in the SQLite database and then use suggestion 1.
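Suggestion 2 could look roughly like the following with Python's built-in sqlite3 module. This is a minimal sketch on a throwaway in-memory database: the edits table and its filename column are hypothetical stand-ins, not git2net's documented schema, so check the actual table and column names in your database (and work on a copy) before adapting it.

```python
import sqlite3


# Hypothetical schema: an "edits" table with a "filename" column.
# Verify the real table/column names in your git2net database first.
def merge_renamed_file(conn, old_name, new_name):
    """Relabel every edit of old_name as new_name so both count as one file."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE edits SET filename = ? WHERE filename = ?",
            (new_name, old_name),
        )
    return cur.rowcount  # number of rows that were relabelled


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (commit_hash TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO edits VALUES (?, ?)",
    [("c1", "utils.py"), ("c2", "helpers.py"), ("c3", "helpers.py")],
)
updated = merge_renamed_file(conn, "utils.py", "helpers.py")
print(updated)  # 1
```

After relabelling, get_line_editing_paths with merge_renaming=False treats the old and new names as one file, because they now share a single name in the database.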

I'll get back to you as soon as I have found a solution. Best, Christoph

gotec commented 4 years ago

I have just committed a fix so that file renamings are no longer detected recursively. I will release a new version of git2net together with some other fixes on PyPI tomorrow.

On a related note: line editing networks are the newest feature of git2net. Because they represent the states of all lines of code throughout the entire history of a project, they tend to be very large. This will likely make visualisation with pathpy challenging, as pathpy creates a dynamic and interactive representation. It is therefore probably a good idea to export the resulting networks to a different network library for visualisation. Functions to do this can be found here (e.g. network_to_networkx or via adjacency_matrix).
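As a library-independent illustration of the adjacency-matrix route, the sketch below turns a list of (source, target) edges into sparse coordinate (COO) triplets. It does not use pathpy; the edge-list input and the function name are assumptions for illustration. Keeping the matrix sparse matters for networks this large, and if SciPy is installed the triplets can be passed to scipy.sparse.coo_matrix((data, (rows, cols))).

```python
from collections import Counter


def to_sparse_adjacency(edges):
    """Return node labels plus COO triplets (rows, cols, data) for a directed edge list.

    Repeated edges are summed into a single weighted entry.
    """
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    counts = Counter((index[src], index[dst]) for src, dst in edges)
    rows, cols, data = [], [], []
    for (i, j), weight in sorted(counts.items()):
        rows.append(i)
        cols.append(j)
        data.append(weight)
    return nodes, rows, cols, data


nodes, rows, cols, data = to_sparse_adjacency([("a", "b"), ("a", "b"), ("b", "c")])
print(nodes, rows, cols, data)  # ['a', 'b', 'c'] [0, 1] [1, 2] [2, 1]
```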

larham commented 4 years ago

You are kind to change code and help direct me to other visualizations, thank you!

gotec commented 4 years ago

Fixed in git2net 1.1.5 which I just released on PyPI.

Please note that with version 1.1.5 I have changed the default behaviour of get_line_editing_paths to merge_renaming=False. If you want renamed files to be merged, you now need to set it to True explicitly. Also note that detecting renamings is difficult and the current implementation is not perfect. For example, in the (as far as I can tell) rare case that two different files have the same name and location at different points in time, my current algorithm would wrongly identify the two files as the same file.
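The limitation above comes from identifying files by their path. The following minimal sketch of path-based rename merging shows how that conflation happens. It is not git2net's algorithm: the input is assumed to be (old_path, new_path) rename pairs in commit order, and the function name is hypothetical. It also works in a single loop, without recursion.

```python
def merge_renamings(renames):
    """Map each path to the first name in its rename chain.

    Identity is keyed on the path alone, which is where the heuristic breaks.
    """
    canonical = {}
    for old_path, new_path in renames:
        # Inherit the original name of old_path if it was itself a rename.
        canonical[new_path] = canonical.get(old_path, old_path)
    return canonical


renames = [("utils.py", "helpers.py"), ("helpers.py", "lib/helpers.py")]
canonical = merge_renamings(renames)
print(canonical["lib/helpers.py"])  # utils.py

# Pitfall: a new, unrelated file later created at "helpers.py" is looked up
# by path alone, so it is merged into the history of utils.py as well.
print(canonical.get("helpers.py", "helpers.py"))  # utils.py
```

Telling the two files apart would require the identity to include time, for example the range of commits during which a path belonged to a given file.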

Let me know if this helps and thanks again for reporting this!