Open Creadeyh opened 1 year ago
Same error when calling pydriller.Commit.modified_files
Hi! The commit you are referring to is in a submodule. To analyze those you need to clone the submodules as well, otherwise Git complains that the commit doesn't exist.
As a test, try to run:
git show 72a32a67dee3a67dff76f565551907a2fc7e88e6
in your terminal. You'll see Git returns an error. After you init the submodules that should go away.
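For anyone who wants to script that check instead of running it by hand, here is a minimal sketch. It assumes git is on the PATH; commit_exists is a hypothetical helper name, not part of PyDriller. It relies on git cat-file -e, which exits with status 0 only when the object exists.

```python
import subprocess


def commit_exists(repo_dir: str, sha: str) -> bool:
    """Return True if `sha` resolves to a commit object in the repo at `repo_dir`."""
    result = subprocess.run(
        # The ^{commit} suffix makes Git verify that the object is a commit.
        ["git", "cat-file", "-e", f"{sha}^{{commit}}"],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling it once with the repository root and once with the submodule folder as repo_dir should show in which of the two Git can actually resolve the commit.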
I understand that. The issue is that they removed the submodules, so the .gitmodules is empty and init does nothing.
I tried to work around it by retrieving the history of .gitmodules with Git.get_commits_modified_file(), then checking out a commit where .gitmodules was still populated and running submodule init and update from there.
However, I still can't access that commit with git show from the repository root; it only works if I navigate inside the submodule folder.
And when I call CodeChurn or a DMM metric, it still fails because Pydriller stays in the root folder.
@ishepard Here is the test script I put together if you want to try it out yourself. I'm running Python 3.8 and Pydriller 2.4.1
import subprocess
import tempfile
import os
from typing import List
from pydriller import Repository, Git

tmp_dir = tempfile.mkdtemp()
repo_dir = os.path.join(tmp_dir, "avatarify-python")

process = subprocess.run(["git", "clone", "https://github.com/alievk/avatarify-python"],
                         stdout=subprocess.PIPE,
                         cwd=tmp_dir)
process = subprocess.run(["git", "checkout", "master"],
                         stdout=subprocess.PIPE,
                         cwd=repo_dir)

git: Git = Git(repo_dir)
gitmodules_hist: List[str] = git.get_commits_modified_file(os.path.join(repo_dir, ".gitmodules"), include_deleted_files=True)

for hash in gitmodules_hist:
    git.checkout(hash)
    if os.path.exists(os.path.join(repo_dir, ".gitmodules")):
        print("SUBMODULE UPDATE")
        process = subprocess.run(["git", "submodule", "init"],
                                 stdout=subprocess.PIPE,
                                 cwd=repo_dir)
        process = subprocess.run(["git", "submodule", "update"],
                                 stdout=subprocess.PIPE,
                                 cwd=repo_dir)

git_commits = Repository(repo_dir, only_no_merge=True).traverse_commits()
commits = []
for git_commit in git_commits:
    if git_commit.hash == "80226c1717402f7372a9f82b098619b3836b8bc0":
        print("FOUND BEFORE SUBMODULE 1")
        # Fails here because 80226c references 72a32a
        print(git_commit.dmm_unit_size)
    elif git_commit.hash == "72a32a67dee3a67dff76f565551907a2fc7e88e6":
        print("FOUND SUBMODULE 1")
    elif git_commit.hash == "a5aabda05cc0d0da1e21f21a138e2e5dec01afa0":
        print("FOUND BEFORE SUBMODULE 2")
        # Fails here because a5aabd references 6c1fbf
        print(git_commit.dmm_unit_size)
    elif git_commit.hash == "6c1fbf39690130e2303bcecd3c6126c71cfacf85":
        print("FOUND SUBMODULE 2")
Describe the bug
I'm analyzing the GitHub repo avatarify, and commits that reference submodule commits, such as this one, cause an exception to be raised:
ValueError: SHA b'72a32a67dee3a67dff76f565551907a2fc7e88e6' could not be resolved, git returned: b'72a32a67dee3a67dff76f565551907a2fc7e88e6missing'
The hash in the error is that of the submodule commit.
To Reproduce
I've noticed this issue in 2 situations while working with avatarify:
When I use
commits = pydriller.Repository(...).traverse_commits()
and retrieve any of dmm_unit_size, dmm_unit_complexity, or dmm_unit_interfacing. This is straightforward to patch on my side, as I can wrap these metrics in a try/except and replace them with None when they fail on a commit. However, the second case would require a change out of my reach.
When I call the constructor of
pydriller.metrics.process.code_churn.CodeChurn
Unless I avoid the problematic commits by setting CodeChurn's from_commit/to_commit to navigate around them, I simply cannot compute the repo's churn.
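For reference, the try/except patch mentioned for the first case could look roughly like the sketch below. It assumes the failure always surfaces as the ValueError shown in the traceback above; safe_metric and dmm_or_none are hypothetical helper names, not part of PyDriller.

```python
from typing import Any, Dict, Optional

# The three DMM properties mentioned above.
DMM_METRICS = ("dmm_unit_size", "dmm_unit_complexity", "dmm_unit_interfacing")


def safe_metric(commit: Any, name: str) -> Optional[float]:
    """Return commit.<name>, or None if computing it raises ValueError."""
    try:
        return getattr(commit, name)
    except ValueError:
        # Assumption: this is the "SHA ... could not be resolved" error
        # raised for a submodule commit that is missing from the root repo.
        return None


def dmm_or_none(commit: Any) -> Dict[str, Optional[float]]:
    """Collect all three DMM metrics for one commit, with None for failures."""
    return {name: safe_metric(commit, name) for name in DMM_METRICS}
```

In the traversal loop this would be called as dmm_or_none(git_commit) in place of reading the properties directly. It does nothing for the CodeChurn case, since that exception is raised inside the metric itself.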
OS Version: Windows