mazen160 / GithubCloner

A script that clones Github repositories of users and organizations.
MIT License
400 stars 122 forks

Inconsistent results for same user/org #10

Closed danielhoherd closed 5 years ago

danielhoherd commented 6 years ago

I have seen several instances of inconsistent behavior when cloning the same user or org. I observed this while iterating on the "API rate limit exceeded" and incomplete-repository issues.

Here is the cycle I am using to test iterations:

cd
rm -rf "${HOME}/gitclone-test-"* "${HOME}/.mrconfig"
"${HOME}/code/GithubCloner/githubcloner.py" --user mazen160 -o "${HOME}/gitclone-test-$(date +%s)/"
find "${HOME}/gitclone-test"* -maxdepth 1 -mindepth 1 -type d | xargs -n1 mr register
mr status

I chose mazen160 for this example because it is a public data set within the author's control, but I have observed it with both organizations and other users.

After running the above cycle 7 times, I observed these results:

5  https://github.com/mazen160/Firefox-Security-Toolkit.git
5  https://github.com/mazen160/SecLists.git
5  https://github.com/mazen160/bfac.git
5  https://github.com/mazen160/ct-monitor.git
5  https://github.com/mazen160/struts-pwn_CVE-2017-9805.git
6  https://github.com/mazen160/Ubuntu-Desktop-Malware-Vector-Demo.git
6  https://github.com/mazen160/dirsearch.git
6  https://github.com/mazen160/dnsrecon.git
6  https://github.com/mazen160/public.git
6  https://github.com/mazen160/server-status_PWN.git
7  https://github.com/mazen160/GithubCloner.git
7  https://github.com/mazen160/ptf.git
7  https://github.com/mazen160/struts-pwn.git
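
One way to produce a tally like the one above is to record which clone URLs are present after each run and count how many runs each URL appears in. The sketch below is an assumption about the method, not the script used for this report; `tally_runs`, `format_tally`, and the sample URLs are hypothetical.

```python
from collections import Counter


def tally_runs(runs):
    """Count how many runs each repository URL appeared in.

    `runs` is a list of iterables, one per test iteration, each holding
    the clone URLs that were present on disk after that iteration.
    """
    counts = Counter()
    for urls in runs:
        # Use a set so a URL is counted at most once per run.
        counts.update(set(urls))
    return counts


def format_tally(counts):
    """Render the tally sorted by count, then by URL (plain string order)."""
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return "\n".join(f"{n}  {url}" for url, n in ordered)


if __name__ == "__main__":
    # Hypothetical data: three runs, one of which missed a repository.
    runs = [
        ["https://example.com/u/a.git", "https://example.com/u/b.git"],
        ["https://example.com/u/b.git"],
        ["https://example.com/u/a.git", "https://example.com/u/b.git"],
    ]
    print(format_tally(tally_runs(runs)))
```

With consistent cloning, every URL would show the same count as the number of runs; any lower count marks a repository that was missing in at least one run.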

mazen160 commented 6 years ago

I think this is happening because of the GitPython module. Do you have any thoughts on why it could be happening?

danielhoherd commented 6 years ago

I am not sure why yet, but I am looking into it. I have a fork/branch here with some extra logging where I am investigating: https://github.com/danielhoherd/GithubCloner/tree/logging

Run it with export LOGLEVEL=DEBUG ; githubcloner.py
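
For readers unfamiliar with this pattern, a script can honor a LOGLEVEL environment variable with a few lines of standard-library logging setup. This is a minimal sketch of the general technique, not the code from the logging branch; `configure_logging` and the WARNING default are assumptions.

```python
import logging
import os


def configure_logging(environ=os.environ):
    """Set the root logger level from the LOGLEVEL environment variable."""
    name = environ.get("LOGLEVEL", "WARNING").upper()
    # Fall back to WARNING when the value is not a known level name.
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.WARNING
    # Include the thread name, since the thread discusses threaded cloning.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(threadName)s %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger(__name__).debug("visible only when LOGLEVEL=DEBUG")
```

Logging the thread name makes it easier to see which worker emitted a given message when several clones run at once.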

danielhoherd commented 6 years ago

It looks like the problems are happening in git. I don't know if the root cause is in git or in the threading. For some unknown reason I cannot get GIT_PYTHON_TRACE to work, but I suspect it's because of threading.

mazen160 commented 6 years ago

I believe the same. Maybe we could try checking it with a separate process on each thread?
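
The idea of a separate process per thread could be sketched as follows: each worker thread shells out to the `git` command-line client instead of calling GitPython. This is an illustration only, not GithubCloner's code; `repo_dir_name`, `clone_one`, `clone_all`, and the injectable `run` parameter are hypothetical names, and the sketch assumes `git` is on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def repo_dir_name(url):
    """Derive a directory name from a clone URL (strip a trailing .git)."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_one(url, dest_root, run=subprocess.run):
    """Clone one repository in its own git process; return (url, ok, stderr)."""
    target = Path(dest_root) / repo_dir_name(url)
    result = run(
        ["git", "clone", "--quiet", url, str(target)],
        capture_output=True,
        text=True,
    )
    return url, result.returncode == 0, result.stderr


def clone_all(urls, dest_root, max_workers=4, run=subprocess.run):
    """Clone every URL, one git process per worker thread.

    Returns a list of (url, stderr) pairs for the clones that failed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda u: clone_one(u, dest_root, run), urls))
    return [(url, err) for url, ok, err in results if not ok]
```

Setting `max_workers=1` serializes the clones, which is a simple way to rule threading in or out as a factor. The returned failure list also makes silent misses visible instead of leaving them to be discovered later.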

mazen160 commented 6 years ago

I am testing the issue again and will let you know the results.

mazen160 commented 6 years ago

Hi @danielhoherd

As discussed on https://github.com/mazen160/GithubCloner/issues/9, the issue is occurring because of git. It is not GithubCloner-related. Probably the best fix is to slow cloning down to 1 thread in GithubCloner and to report the issue to the git developers.

danielhoherd commented 5 years ago

Closing issue because the above answer is sufficient.