softwaresaved / rse-repo-analysis

Study of research software in repositories. Contact: @karacolada
BSD 3-Clause "New" or "Revised" License

Consider README history memory #26

Open karacolada opened 1 year ago

karacolada commented 1 year ago

Some repositories raise a memory error and cannot be parsed. I think this is because PyDriller clones them locally, and the clone runs out of memory.

Example:

[WARNING] Executing query_readme_history with arguments (github_user_cleaned_url    cylammarco/ASPIRED-example
readme_path                                 README.md
Name: 97, dtype: object, 'github_user_cleaned_url', <github.MainClass.Github object at 0x7f88e31295d0>) failed:
Traceback (most recent call last):
  File "/home/eidf048/eidf048/kmoraw/rse-repo-analysis/github/utils.py", line 15, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/eidf048/eidf048/kmoraw/rse-repo-analysis/github/crawl_contents.py", line 29, in query_readme_history
    for commit in repo_readme.traverse_commits():
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/pydriller/repository.py", line 213, in traverse_commits
    with self._prep_repo(path_repo=path_repo) as git:
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/pydriller/repository.py", line 177, in _prep_repo
    local_path_repo = self._clone_remote_repo(self._clone_folder(), path_repo)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/pydriller/repository.py", line 158, in _clone_remote_repo
    Repo.clone_from(url=repo, to_path=repo_folder)
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/git/repo/base.py", line 1308, in clone_from
    return cls._clone(
           ^^^^^^^^^^^
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/git/repo/base.py", line 1219, in _clone
    finalize_process(proc, stderr=stderr)
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/git/util.py", line 419, in finalize_process
    proc.wait(**kwargs)
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/git/cmd.py", line 604, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git clone -v -- https://github.com/cylammarco/ASPIRED-example /tmp/tmp3o7vzcbo/ASPIRED-example
stderr: 'Cloning into '/tmp/tmp3o7vzcbo/ASPIRED-example'...
POST git-upload-pack (227 bytes)
fatal: packfile /tmp/tmp3o7vzcbo/ASPIRED-example/.git/objects/pack/pack-daffd17f5b2a5c44a37a64bb20100e9ef7b761cc.pack cannot be mapped: Cannot allocate memory
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

'

Exception in thread Thread-3895 (pump_stream):
Traceback (most recent call last):
  File "/home/eidf048/eidf048/kmoraw/tools/miniconda3/envs/sw_mentions/lib/python3.11/site-packages/git/cmd.py", line 141, in pump_stream
    handler(line)
MemoryError

karacolada commented 3 months ago

Looking at the analysis results, this didn't happen very often. It should be addressed in future iterations, but doesn't need fixing at the moment.
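
For a future iteration, one possible direction is to do the clone ourselves with a smaller memory footprint and to recognise the out-of-memory case so the repository can be skipped and recorded. The following is only a rough sketch, not something that exists in this repo: the function names, the `window_mib` parameter and the chosen values are hypothetical and untuned. `--no-checkout`, `core.packedGitWindowSize` and `core.packedGitLimit` are real git options, but whether they are enough to avoid the failure above has not been tested.

```python
import subprocess

# Message git printed in the log above when it could not map the packfile.
OOM_MARKER = "Cannot allocate memory"


def build_clone_command(url: str, target: str, window_mib: int = 32) -> list[str]:
    """Build a `git clone` command that skips the checkout and caps pack mmap sizes.

    The window and limit values are illustrative assumptions, not tuned settings.
    """
    return [
        "git",
        "-c", f"core.packedGitWindowSize={window_mib}m",
        "-c", f"core.packedGitLimit={window_mib * 4}m",
        "clone",
        "--no-checkout",  # fetch history only; do not populate the working tree
        "--", url, target,
    ]


def is_out_of_memory(stderr: str) -> bool:
    """Return True if git's stderr looks like the memory failure from the log."""
    return OOM_MARKER in stderr


def clone_without_checkout(url: str, target: str) -> bool:
    """Clone `url` into `target`; return False if git ran out of memory.

    Any other git failure is raised so that it is not silently ignored.
    """
    result = subprocess.run(
        build_clone_command(url, target), capture_output=True, text=True
    )
    if result.returncode == 0:
        return True
    if is_out_of_memory(result.stderr):
        return False  # caller can log the repository and move on
    raise RuntimeError(f"git clone failed: {result.stderr.strip()}")
```

If the clone succeeds, the local path could then be handed to PyDriller in place of the remote URL, so that PyDriller does not clone again. Repositories for which `clone_without_checkout` returns False could be written to a list for later inspection.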