Thanks for the report, Mathieu. Does this happen every time? Could you please
post your fabricate build file? (Or even better, a small, stand-alone build
file that reproduces this problem?)
Original comment by benh...@gmail.com
on 1 Dec 2011 at 2:33
As noted by the OP, the problem is that the vfork() which creates process 15151
is interrupted by an open() running as 15151, but since the vfork was
interrupted, the entry in the process dict has not been created yet.
At the moment fabricate only creates processes on completion of fork/clone or
exec, but POSIX allows some syscalls between the fork and the exec, and these
run as the new pid.
The solution would be to look at the pid before anything else and create the
process dict entry irrespective of the call type. A more correct but more
complicated option is to defer processing of the interrupting calls until the
interrupted one has finished.
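A minimal sketch of the first option (create the entry from the pid, whatever the call is). The `Process` class, the regexes and `handle_line` are hypothetical stand-ins for illustration, not fabricate's actual code:

```python
import re

class Process:
    """Hypothetical stand-in for fabricate's per-process state."""
    def __init__(self, cwd='.'):
        self.cwd = cwd
        self.deps = set()

# Illustrative patterns only; fabricate's own regexes are different.
_LINE_RE = re.compile(r'^(?P<pid>\d+)\s+(?P<body>.*)$')
_OPEN_RE = re.compile(r'^open\("(?P<name>[^"]*)"')

def handle_line(processes, line):
    """Record an open() against the line's pid, creating the entry if needed."""
    m = _LINE_RE.match(line)
    if m is None:
        return
    pid = int(m.group('pid'))
    # Look at the pid first: the child may issue syscalls before the
    # parent's fork/vfork/clone line has completed in the trace.
    if pid not in processes:
        processes[pid] = Process()
    o = _OPEN_RE.match(m.group('body'))
    if o:
        processes[pid].deps.add(o.group('name'))
```

Note that an entry created this way starts with a default cwd rather than the parent's, so relative paths opened by the child may be resolved incorrectly.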
Original comment by ele...@gmail.com
on 1 Dec 2011 at 11:10
I guess that creating the dict entry irrespective of the call type will do the
trick…
I'll give it a try.
Original comment by mathieu....@gmail.com
on 2 Dec 2011 at 6:44
Did doing that solve the problem, Mathieu? I haven't had a chance to play with
this yet (I don't run Linux myself). Can we repro this with a small C program
that uses fork and open, which fabricate would run?
Original comment by benh...@gmail.com
on 18 Dec 2011 at 7:12
It seems to solve the problem (I'll try to set up a test case as soon as I have
some spare time), but I stumbled into another one, where I have these two
consecutive lines in the strace output:
28033 <... fork resumed> ) = 3
28033 <... open resumed> ) = 0
This leads to a KeyError at the line "line = unfinished[pid] + body".
It is very strange, as I have only one "unfinished" line for process 28033,
which is an "open" one (no "fork" one)!
In another attempt, where I ignore "fork resumed" lines, I get a strace with
only one "open unfinished" line for process 32491 but *two* subsequent
"open resumed" lines for this same process!
I was trying to improve the handling of unfinished lines, but now I do not
understand what such strace output means…
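For illustration, a minimal sketch of joining "unfinished" and "resumed" lines while tolerating a "resumed" line that has no pending "unfinished" partner, instead of raising a KeyError. The regexes and the `stitch` function are assumptions made for this example and are not fabricate's actual code:

```python
import re

# Illustrative patterns for strace's split-call markers.
_UNFINISHED_RE = re.compile(r'^(?P<pid>\d+)\s+(?P<body>.*) <unfinished \.\.\.>$')
_RESUMED_RE = re.compile(r'^(?P<pid>\d+)\s+<\.\.\. \w+ resumed> (?P<body>.*)$')

def stitch(lines):
    """Return (complete_lines, orphan_resumed_lines)."""
    unfinished = {}   # pid -> first half of an interrupted call
    complete = []
    orphans = []      # "resumed" lines with nothing to resume
    for line in lines:
        m = _UNFINISHED_RE.match(line)
        if m:
            unfinished[m.group('pid')] = m.group('pid') + ' ' + m.group('body')
            continue
        m = _RESUMED_RE.match(line)
        if m:
            head = unfinished.pop(m.group('pid'), None)
            if head is None:
                orphans.append(line)   # keep for diagnosis, do not crash
            else:
                complete.append(head + m.group('body'))
            continue
        complete.append(line)
    return complete, orphans
```

This only avoids the crash; it does not explain why strace emits the extra "resumed" line in the first place.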
Original comment by mathieu....@gmail.com
on 29 Dec 2011 at 3:26
Hi Mathieu,
Could you post the whole relevant section of the strace output?
Thanks
Lex
Original comment by ele...@gmail.com
on 29 Dec 2011 at 10:39
Of course.
Here are two samples of two consecutive runs :
24250 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f3da39a09d0) = 24253
24250 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f3da39a09d0) = 24254
24253 execve("/lib/cpp", ["/lib/cpp", "-traditional", "-P", "-C", "-undef"],
[/* 69 vars */]) = 0
24253 open("/etc/ld.so.cache", O_RDONLY) = 3
24253 open("/lib/libc.so.6", O_RDONLY) = 3
24254 open("/tmp/File73712902416211553.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
24252
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp", ["bpp", "/home/clabaut/c478/c478_traducte"...], [/*
69 vars */] <unfinished ...>
24254 execve("/bin/sed", ["sed", "-e", "s,\\\\#,#,g", "-e",
"s,\\\\\\\\,\\\\,g"], [/* 69 vars */]) = 0
24252 <... execve resumed> ) = 0
24252 open("/etc/ld.so.cache", O_RDONLY) = 3
24254 open("/etc/ld.so.cache", O_RDONLY) = 3
) = 31768576
24254 open("/lib/libacl.so.1", O_RDONLY <unfinished ...>
24253 open("/usr/lib/locale/locale-archive", O_RDONLY <unfinished ...>
24254 <... open resumed> ) = 3
24253 <... open resumed> ) = 3
24252 open("/lib32/libc.so.6", O_RDONLY <unfinished ...>
24253 <... open resumed> ) = 0
And
23684 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f795a5e69d0) = 23688
23687 execve("/lib/cpp", ["/lib/cpp", "-traditional", "-P", "-C", "-undef"],
[/* 69 vars */]) = 0
23687 open("/etc/ld.so.cache", O_RDONLY) = 3
23687 open("/lib/libc.so.6", O_RDONLY) = 3
23686
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp", ["bpp", "/home/clabaut/c478/c478_traducte"...], [/*
69 vars */] <unfinished ...>
23688 open("/tmp/File7942459553864738579.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666)
= 3
23688 execve("/bin/sed", ["sed", "-e", "s,\\\\#,#,g", "-e",
"s,\\\\\\\\,\\\\,g"], [/* 69 vars */] <unfinished ...>
23686 <... execve resumed> ) = 0
23688 <... execve resumed> ) = 0
23687 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
) = 0
23688 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
23686 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
23688 <... fork resumed> ) = 3
23686 <... open resumed> ) = 3
23688 open("/lib/libacl.so.1", O_RDONLY <unfinished ...>
23687 open("/usr/share/locale/locale.alias", O_RDONLY <unfinished ...>
23688 <... open resumed> ) = 3
23687 <... open resumed> ) = 3
23686 open("/lib32/libc.so.6", O_RDONLY <unfinished ...>
23688 <... open resumed> ) = 0
23688 open("/lib/libc.so.6", O_RDONLY) = 3
23687 open("/usr/share/locale/fr_FR.UTF-8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr_FR/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 ENOENT
(No such file or directory)
23687 open("/usr/share/locale/fr.UTF-8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
23688 open("/lib/libattr.so.1", O_RDONLY <unfinished ...>
23687 open("/usr/share/locale/fr/LC_MESSAGES/gcc.mo", O_RDONLY) = 3
23688 <... fork resumed> ) = 3
23688 <... open resumed> ) = 0
23686
open("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/val
idation/tests/TC3/in/m_typ.mch", O_RDONLY <unfinished ...>
23688 <... execve resumed> ) = 0
In the first one, there is only one "open…unfinished" for two
"open…resumed"…
Original comment by mathieu....@gmail.com
on 30 Dec 2011 at 7:43
Hi Mathieu,
Somehow a reply in which I asked what operating system, OS version and strace
version you were using did not get attached to the issue.
Original comment by ele...@gmail.com
on 16 Jan 2012 at 10:13
Hi Lex,
My operating system is :
Linux 3.1.9-2-ARCH #1 SMP PREEMPT Sat Jan 14 09:11:37 CET 2012 x86_64 Intel(R)
Core(TM) i5-2520M CPU @ 2.50GHz GenuineIntel GNU/Linux
strace version is : 4.6
Best regards,
-Mathieu
Original comment by mathieu....@gmail.com
on 20 Jan 2012 at 11:52
Thanks Mathieu,
It appears to be a problem with strace. If you are happy for me to submit the
two strace fragments to that project, please provide them as an attachment so we
know that the comment and mail system hasn't wrapped or otherwise modified
them (there are some strange artifacts that suggest they have been modified in
transit).
Cheers
Lex
Original comment by ele...@gmail.com
on 20 Jan 2012 at 11:34
Mathieu,
As a postscript, if you can roll your strace back to version 4.5 and confirm
the problem does not occur (I'm on 4.5.20 and haven't seen any problems), that
would be important information for the strace folks.
Cheers
Lex
Original comment by ele...@gmail.com
on 21 Jan 2012 at 12:20
I am starting work on fixing this issue. The suggested fix of just creating the
process dict entry obviously works for Mathieu, but it does lose some of the
functionality of a proper clone_match in the StraceRunner, namely transferring
the cwd to the new process dict entry.
I also noticed that the execve_match basically does the suggested fix of just
creating the process dict entry if it does not exist. This is also missing the
transfer of the cwd to the new process dict entry. I am surprised this has not
caused any issues. My assumption is that the original reason for this was to
pick up the initial execve that kicks off the very first process traced, not to
fix the interrupted fork.
I think the correct fix is to delay any matching until the clone has completed.
I think this should be simple enough to do. Just create the StraceProcess
object for the pid if it does not exist, but mark it as "not fully cloned". Any
lines that match before the clone completes can be cached in the process entry.
When the clone completes, the lines can be re-processed and the correct deps
extracted. For execve there will have to be a special case to create a process
dict entry if there are no processes being tracked. Does anyone see any
problems with this approach?
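A rough sketch of this deferral idea, under stated assumptions: the `StraceProcess` below is a simplified stand-in for the real class, `Tracker`, `on_open` and `on_clone` are hypothetical names, and the root pid is passed in directly rather than being picked up from the first execve:

```python
import posixpath

class StraceProcess:
    """Simplified stand-in; the real StraceProcess differs."""
    def __init__(self, cwd='.', cloned=True):
        self.cwd = cwd
        self.deps = set()
        self.cloned = cloned   # False until the parent's clone line is seen
        self.pending = []      # file names cached while not fully cloned

class Tracker:
    def __init__(self, root_pid, cwd='.'):
        # Assumption: the first traced process is known up front.
        self.processes = {root_pid: StraceProcess(cwd)}

    def on_open(self, pid, name):
        proc = self.processes.get(pid)
        if proc is None:
            # Child seen before its clone completed: create a placeholder.
            proc = self.processes[pid] = StraceProcess(cloned=False)
        if not proc.cloned:
            proc.pending.append(name)   # defer until the cwd is known
            return
        proc.deps.add(posixpath.join(proc.cwd, name))

    def on_clone(self, parent_pid, child_pid):
        parent = self.processes[parent_pid]
        child = self.processes.setdefault(child_pid, StraceProcess(cloned=False))
        child.cwd = parent.cwd          # transfer the cwd, as clone_match would
        child.cloned = True
        pending, child.pending = child.pending, []
        for name in pending:            # re-process the cached lines
            self.on_open(child_pid, name)
```

For example, an open of `a.c` by pid 101 that arrives before `on_clone(100, 101)` is held in `pending` and only resolved against the parent's cwd once the clone completes.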
Original comment by simon.al...@gmail.com
on 3 May 2013 at 3:11
Seeing this issue during django-manage makemessages (basically gettext).
Original comment by wer...@beroux.com
on 7 May 2014 at 11:19
Are you still working on this from time to time?
Original comment by wer...@beroux.com
on 13 Aug 2014 at 9:49
Original issue reported on code.google.com by
mathieu....@gmail.com
on 29 Nov 2011 at 2:42