Closed gotec closed 1 year ago
I had a quick look, it seems that it is happening only when the process is interrupted while being in a call to the Git lib, which spawns a subprocess to run the git command (as opposed to using pygit2 which relies directly on the libgit2 C lib, if I understood well). What is written to stderr is inherited from subprocess.Popen (Exception ignored in: <function Git.__del__ at 0x...>
).
Track 1 for solution: It is possible through the args of git.cmd.Git.execute() method to pass args to subprocess.Popen (see https://gitpython.readthedocs.io/en/stable/reference.html#git.cmd.Git.execute ), and in particular how to redirect stderr (potentially to devnull), but you do not want to silence any error either. Maybe piping it into an 'intermediate stderr' (repiped into the main) silenced if the stream starts with 'Exception ignored'?
Track 2 for solution: There is a kill_after_timeout arg in git.cmd.Git.execute as well, but they indicate it might harm the state of the repo, and you would have to track the timeout value for each call, using e.g. time.time() compared to the start of the global process and the initial timeout value (as opposed to the alarm where it s just one sleep), and let a bit of room for the timeout processed to go through before the alarm triggers the global timeout behavior.
TBN: You may want to print the KeyboardInterrupt and not a time out message if it is an actual KeyboardInterrupt, which happens in parallel mode but in serial mode it would be treated as the end of the alarm (I did not check). This would be another issue but it is related.
NB: For easy reproduction of the issue (with no_of_processes so that one can test easily also the serial case):
import git2net
import os
import shutil
github_url = 'gotec/git2net'
git_repo_dir = 'git2net_test'
sqlite_db_file = 'git2net.db'
git2net.mine_github(github_url, git_repo_dir, sqlite_db_file,no_of_processes=2,timeout=0.001)
Hi William
Thanks a lot for reminding me of this!
Redirecting the stderr via an argument of git.cmd.Git.execute()
to subprocess.Popen()
is unfortunately not possible, as it is automatically set by git.cmd.Git.execute()
and cannot be overwritten.
After some exploration, I found a way to achieve the same using contextlib.redirect_stderr()
. While doing so, I also replaced the previous timeout implementation using the ThreadingTimeout
context manager from stopit
. As you also suggested, I then parsed the redirected stderr manually, raising it as an exception if it is not an ignored exception.
The current solution has many benefits over the old code, including:
KeyboardInterrupt
to be differentiated from a TimeoutException
.I have committed these changes with 9c88c8c211a98641a88248a6fb6abd7f92ac04ff, so feel free to try them out! From my understanding, they should work on all platforms (i.e., retaining Windows compatibility (which was the reason why I switched from signals to threads in 8a5d405bc8263b6f482f9e6c38534023de8956e5), however I still have to validate this.
Cheers Christoph
Cool solution, thanks! I tested with the example above, I confirm that I get the expected behavior (for all timeouts and for the KeyboardInterrupt), under Linux.
Describe the bug KeyboardInterrupt error is output while the timout option is used.
Expected behavior A message
Timeout processing commit: <hash>
is output but no KeyboardInterrupt is printed.