jxaa152677 / git-repo

Automatically exported from code.google.com/p/git-repo
Apache License 2.0

repo sync uses non-thread-safe subprocess from threads #215

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Affected Version: 1.12.23
Environment: Ubuntu 12.04/Python 2.7.3

What steps will reproduce the problem?
We run repo sync automatically every hour from crontab to keep mirrors on ~20
worldwide servers up to date (each job runs at a different number of minutes
past the hour, so we don't hit the central gerrit server from all slaves
simultaneously). Every few days, some of the 'repo sync' runs hang: sometimes
just one or two, sometimes most of them. When this happens I see several
defunct processes on the slaves, for example:

rdkadmin 24781 24553  0 Nov24 ?        00:00:04             /usr/bin/python /var/arris/mirrors/RDK/.repo/repo/main.py --repo-dir=/var/arris/mirrors/RDK/.repo --wrapper-version=1.22 --wrapper-path=/var/arris/tools/ArrisDVS/global/bin/repo -- sync --jobs=1
rdkadmin 24783 24781  0 Nov24 ?        00:00:00               [ssh] <defunct>
rdkadmin 24792 24781  0 Nov24 ?        00:00:00               ssh -M -N -p 29418 -o ControlPath /tmp/ssh-Dm7xrC/master-%r@%h:%p lmardk1.arrisi.com
rdkadmin 24930 24781  0 Nov24 ?        00:00:00               ssh -M -N -p 29418 -o ControlPath /tmp/ssh-Dm7xrC/master-%r@%h:%p vtr.arrisi.com
rdkadmin 26568 24781  0 Nov24 ?        00:00:00               [ssh] <defunct>
rdkadmin 16930 24781  0 Nov24 ?        00:00:00               [git] <defunct>

...and another example:

rdkadmin 29368 29334  0 Nov24 ?        00:00:04             /usr/bin/python /export/arris/mirrors/RDK/.repo/repo/main.py --repo-dir=/export/arris/mirrors/RDK/.repo --wrapper-version=1.22 --wrapper-path=/export/arris/tools/ArrisDVS/global/bin/repo -- sync --jobs=1
rdkadmin 29373 29368  0 Nov24 ?        00:00:00               ssh -M -N -p 29418 -o ControlPath /tmp/ssh-stuWgc/master-%r@%h:%p gerrit.arrisi.com
rdkadmin 29382 29368  0 Nov24 ?        00:00:00               [ssh] <defunct>
rdkadmin 29556 29368  0 Nov24 ?        00:00:00               ssh -M -N -p 29418 -o ControlPath /tmp/ssh-stuWgc/master-%r@%h:%p vtr.arrisi.com
rdkadmin 30944 29368  0 Nov24 ?        00:00:00               [git] <defunct>

Note that both of these examples were collected on November 25; the processes
have been sitting there since last night.
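The listings above were taken with ps. As a minimal, Linux-only sketch (assuming /proc is mounted; `find_zombie_children` is a hypothetical helper, not part of repo), the same `<defunct>` children of a hung `repo` process can be picked out by reading `/proc/<pid>/stat`, whose first fields are `pid (comm) state ppid`:

```python
import os


def find_zombie_children(parent_pid):
    """Return [(pid, comm)] for defunct (state 'Z') children of parent_pid.

    Hypothetical, Linux-only helper: scans /proc/<pid>/stat for every process.
    """
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        # comm is wrapped in parentheses and may itself contain spaces or
        # ')', so split on the LAST ')' in the line.
        lparen = stat.index("(")
        rparen = stat.rindex(")")
        comm = stat[lparen + 1:rparen]
        fields = stat[rparen + 2:].split()  # state, ppid, ...
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and ppid == parent_pid:
            zombies.append((int(entry), comm))
    return zombies
```

For example, calling `find_zombie_children(24781)` while that `repo` process is still alive would be expected to report the `[ssh]` and `[git]` entries from the first listing.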

What is the expected output? What do you see instead?
I expect the 'repo sync' to complete without hanging :-)

Please provide any additional information below.
The subprocess module in Python 2.7 is not thread-safe
(http://stackoverflow.com/questions/21194380/is-subprocess-popen-not-thread-safe),
but repo calls subprocess methods from threads. There is a Python module named
subprocess32, a backport of subprocess from Python 3.2 to 2.x. I am about to
set up some of my systems to use subprocess32 in place of subprocess, in the
hope (?) that over the Thanksgiving weekend at least some of the systems will
get into the hung state.
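A minimal sketch of one common mitigation, assuming the hang comes from concurrent `Popen()` calls (not confirmed in this report): route process creation through a single lock and pass `close_fds=True`, which Python 2.7 does not default to on POSIX, so a new child cannot inherit pipe ends that another thread has just opened for its own child. `locked_popen` and `run` are hypothetical helpers, not repo's API.

```python
import subprocess
import threading

# Hypothetical helper (not part of repo): one lock shared by every caller of
# locked_popen(), so at most one of them is creating a child at any moment.
_popen_lock = threading.Lock()


def locked_popen(args, **kwargs):
    """Start a child process while holding _popen_lock."""
    # Python 2.7 defaults to close_fds=False on POSIX; forcing True stops the
    # child from inheriting pipe ends opened for other children.
    kwargs.setdefault("close_fds", True)
    with _popen_lock:
        return subprocess.Popen(args, **kwargs)


def run(args):
    """Run a command and return (returncode, captured stdout as bytes)."""
    proc = locked_popen(args, stdout=subprocess.PIPE)
    # communicate() runs outside the lock, so other threads can start
    # their own children while this one waits for output.
    out, _ = proc.communicate()
    return proc.returncode, out
```

The lock only covers process creation, so parallel jobs keep most of their concurrency. The design targets one specific failure mode: if a long-lived child such as an `ssh -M -N` master inherited the write end of another child's stdout pipe, the reader of that pipe would not see EOF until the master exited. That is one possible, unverified explanation for listings like the ones above.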

One odd thing is that I never see this hang when the repo sync is run
interactively; it only happens when run from a crontab. It is also odd that
nobody else has seen this problem, but maybe nobody else runs 'repo sync' from
a crontab.
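To try to reproduce the cron-only behaviour from an interactive shell, one option is to launch the command under conditions closer to cron's. This is a minimal Python 3 sketch under stated assumptions: it supposes a Vixie-style cron that starts jobs with a short PATH, `SHELL=/bin/sh`, no controlling terminal, stdin that is not a TTY, and output captured through a pipe; check your own cron for the exact details. `run_like_cron` is a hypothetical helper.

```python
import os
import subprocess


def run_like_cron(args, cwd=None):
    """Run args roughly the way cron would and return (returncode, output).

    Assumed cron-like conditions: minimal environment, stdin not a TTY,
    stdout/stderr captured through one pipe, no controlling terminal.
    """
    env = {
        "HOME": os.path.expanduser("~"),
        "LOGNAME": os.environ.get("LOGNAME", ""),
        "PATH": "/usr/bin:/bin",  # cron's short default PATH (assumption)
        "SHELL": "/bin/sh",
    }
    with open(os.devnull, "rb") as devnull:
        proc = subprocess.Popen(
            args,
            cwd=cwd,
            env=env,  # replaces, rather than extends, the caller's environment
            stdin=devnull,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # setsid(): no controlling terminal
        )
        out, _ = proc.communicate()
    return proc.returncode, out
```

With the short PATH, the command needs an absolute path. Using the paths from the listings, that would be `run_like_cron(["/var/arris/tools/ArrisDVS/global/bin/repo", "sync", "--jobs=1"], cwd="/var/arris/mirrors/RDK")`.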

Original issue reported on code.google.com by lezzgi...@gmail.com on 25 Nov 2015 at 3:38

GoogleCodeExporter commented 8 years ago
Using subprocess32 didn't help, but I'm not 100% convinced that the version of
git-repo that ran from the crontab actually used it: there are different
versions of git-repo in different locations, and it can be difficult to make
sure that the expected version is running. The fact remains that subprocess is
not thread-safe, and its methods should not be called from multiple threads.
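To confirm which implementation a cron-launched run actually loaded, a minimal sketch is to use the conditional import suggested in the subprocess32 README (with an `ImportError` fallback added here) and log the module's name and file at startup. `describe_subprocess_module` is a hypothetical helper, and where to call it inside repo is left as an assumption.

```python
import os
import sys

# Prefer the subprocess32 backport on POSIX under Python 2; fall back to the
# standard module if the backport is not installed, or on Python 3.
if os.name == "posix" and sys.version_info[0] < 3:
    try:
        import subprocess32 as subprocess
    except ImportError:
        import subprocess
else:
    import subprocess


def describe_subprocess_module():
    """Return a one-line description of which subprocess module was loaded."""
    return "%s from %s (Python %d.%d)" % (
        subprocess.__name__,
        subprocess.__file__,
        sys.version_info[0],
        sys.version_info[1],
    )
```

Writing the returned string to stderr at the start of each cron run would record, in that job's log, whether `subprocess32` or the standard `subprocess` was in use for that particular run.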

Original comment by lezzgi...@gmail.com on 17 Dec 2015 at 3:45