jfgailleur / fabricate

Automatically exported from code.google.com/p/fabricate
0 stars 0 forks source link

Traceback when seeing new process syscall (vfork processed ? ) #29

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,

I get the following traceback on a quite simple fabric file.
Strangely enough, on the corresponding strace file, I get an "open" line before 
the "execve" one…

Traceback (most recent call last):
  File "./fabric.py", line 22, in <module>
    main()
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 1402, in main
    this_status = eval(action, globals_dict)
  File "<string>", line 1, in <module>
  File "./fabric.py", line 8, in build
    gen_source(1, s)
  File "./fabric.py", line 17, in gen_source
    run(VALIDSH, str(chain), '--gen', scenario)
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 1233, in run
    return default_builder.run(*args, **kwargs)
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 998, in run
    deps, outputs = self.runner(*arglist, **kwargs)
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 689, in __call__
    return self._runner(*args, **kwargs)
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 644, in __call__
    status, deps, outputs = self._do_strace(args, kwargs, outfile, outname)
  File "/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/validation/tests/fabricate.py", line 585, in _do_strace
    cwd = processes[pid].cwd
KeyError: '15151'

Here is the relevant portion of the strace file (with first lines where 15151 
pid appears)

15138 vfork( <unfinished ...>
15151 open("/proc/self/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
15151 
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/bin/bpp.sh", ["bpp.sh", "/home/clabaut/c478/c478_traducte"..., 
"/tmp/File2606627264771462039.tmp"], [/* 67 vars */]) = -1 ENOENT (No such file 
or directory)
15151 
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp.sh", ["bpp.sh", 
"/home/clabaut/c478/c478_traducte"..., "/tmp/File2606627264771462039.tmp"], [/* 
67 vars */]) = -1 ENOEXEC (Exec format error)
15151 execve("/bin/sh", ["/bin/sh", "/home/clabaut/c478/c478_traducte"..., 
"/home/clabaut/c478/c478_traducte"..., "/tmp/File2606627264771462039.tmp"], [/* 
67 vars */] <unfinished ...>
15138 <... vfork resumed> )             = 15151
15151 <... execve resumed> )            = 0
15151 open("/etc/ld.so.cache", O_RDONLY) = 3
15151 open("/lib/libreadline.so.6", O_RDONLY) = 3
15151 open("/lib/libncursesw.so.5", O_RDONLY) = 3
15151 open("/lib/libdl.so.2", O_RDONLY) = 3
15151 open("/lib/libc.so.6", O_RDONLY)  = 3
15138 clone(child_stack=0x7f10de33aff0, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLO
NE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, 
parent_tidptr=0x7f10de33b9d0, tls=0x7f10de33b700, child_tidptr=0x7f10de33b9d0) 
= 15152
15151 open("/dev/tty", O_RDWR|O_NONBLOCK) = 3
15151 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
15151 open("/proc/meminfo", O_RDONLY)   = 3
15151 open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such 
file or directory)
15151 open("/usr/lib/gconv/gconv-modules", O_RDONLY) = 3
15151 
open("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/val
idation/runtime/bpp/bpp.sh", O_RDONLY) = 3
15151 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f8d2392c9d0) = 15153
15153 execve("/bin/uname", ["uname", "-s"], [/* 66 vars */]) = 0
15153 open("/etc/ld.so.cache", O_RDONLY) = 3
15153 open("/lib/libc.so.6", O_RDONLY)  = 3
15153 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
15153 exit_group(0)                     = ?
15151 --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15153, si_status=0, 
si_utime=0, si_stime=0} (Child exited) ---
15151 open("/tmp/ttt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
15151 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f8d2392c9d0) = 15155
15151 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f8d2392c9d0) = 15156
15151 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f8d2392c9d0) = 15157
15156 execve("/lib/cpp", ["/lib/cpp", "-traditional", "-P", "-C", "-undef"], 
[/* 66 vars */]) = 0
15156 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
15155 
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp", ["bpp", "/home/clabaut/c478/c478_traducte"...], [/* 
66 vars */] <unfinished ...>
15156 <... open resumed> )              = 3
15156 open("/lib/libc.so.6", O_RDONLY)  = 3
15155 <... execve resumed> )            = 0
15155 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
15157 open("/tmp/File2606627264771462039.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) 
= 3
) = 0
15157 execve("/bin/sed", ["sed", "-e", "s,\\\\#,#,g", "-e", 
"s,\\\\\\\\,\\\\,g"], [/* 66 vars */] <unfinished ...>
15155 open("/lib32/libc.so.6", O_RDONLY) = 3
15156 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
15157 <... execve resumed> )            = 0
15156 open("/usr/share/locale/locale.alias", O_RDONLY) = 3
15157 open("/etc/ld.so.cache", O_RDONLY) = 3
) = 0
15157 open("/lib/libacl.so.1", O_RDONLY) = 3
15156 open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
15156 open("/usr/share/locale/fr_FR/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 ENOENT 
(No such file or directory)
15156 open("/usr/share/locale/fr.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
15156 open("/usr/share/locale/fr/LC_MESSAGES/gcc.mo", O_RDONLY) = 3
15157 open("/lib/libc.so.6", O_RDONLY)  = 3
15155 
open("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/val
idation/tests/RT7_2/in/m1_i.imp", O_RDONLY <unfinished ...>
15157 open("/lib/libattr.so.1", O_RDONLY) = 3
15155 exit_group(0)                     = ?
15157 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
15157 open("/usr/lib/charset.alias", O_RDONLY) = -1 ENOENT (No such file or 
directory)
15157 open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such 
file or directory)
15157 open("/usr/lib/gconv/gconv-modules", O_RDONLY) = 3
15156 vfork( <unfinished ...>
15161 execve("/usr/lib/gcc/x86_64-unknown-linux-gnu/4.6.2/cc1", 
["/usr/lib/gcc/x86_64-unknown-linu"..., "-E", "-traditional-cpp", "-quiet", 
"-C", "-P", "-", "-mtune=generic", "-march=x86-64", "-undef"], [/* 69 vars */] 
<unfinished ...>
15156 <... vfork resumed> )             = 15161
15161 <... execve resumed> )            = 0
15161 open("/etc/ld.so.cache", O_RDONLY) = 3
15161 open("/usr/lib/libcloog-isl.so.2", O_RDONLY) = 3
15161 open("/usr/lib/libisl.so.7", O_RDONLY) = 3
15161 open("/usr/lib/libppl_c.so.4", O_RDONLY) = 3
15161 open("/usr/lib/libppl.so.9", O_RDONLY) = 3
15161 open("/usr/lib/libpwl.so.5", O_RDONLY) = 3
15161 open("/usr/lib/libgmpxx.so.4", O_RDONLY) = 3
15161 open("/usr/lib/libmpc.so.2", O_RDONLY) = 3
15161 open("/usr/lib/libmpfr.so.4", O_RDONLY) = 3
15161 open("/usr/lib/libgmp.so.10", O_RDONLY) = 3
15161 open("/lib/libdl.so.2", O_RDONLY) = 3
15161 open("/usr/lib/libz.so.1", O_RDONLY) = 3
15161 open("/lib/libc.so.6", O_RDONLY)  = 3
15161 open("/usr/lib/libstdc++.so.6", O_RDONLY) = 3
15161 open("/lib/libm.so.6", O_RDONLY)  = 3
15161 open("/usr/lib/libgcc_s.so.1", O_RDONLY) = 3
15161 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
15161 open("/usr/share/locale/locale.alias", O_RDONLY) = 3
15161 open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
15161 open("/usr/share/locale/fr_FR/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 ENOENT 
(No such file or directory)
15161 open("/usr/share/locale/fr.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
15161 open("/usr/share/locale/fr/LC_MESSAGES/gcc.mo", O_RDONLY) = 3
15161 open("/proc/meminfo", O_RDONLY)   = 3
15161 open("/proc/meminfo", O_RDONLY)   = 3
15161 open("/proc/meminfo", O_RDONLY)   = 3
15161 open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such 
file or directory)
15161 open("/usr/lib/gconv/gconv-modules", O_RDONLY) = 3
15161 open("/usr/lib/gconv/ISO8859-1.so", O_RDONLY) = 3
15161 exit_group(0)                     = ?
15156 --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15161, si_status=0, 
si_utime=0, si_stime=0} (Child exited) ---
15156 exit_group(0)                     = ?
15157 exit_group(0)                     = ?
15151 --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15155, si_status=0, 
si_utime=0, si_stime=0} (Child exited) ---
15151 exit_group(0)                     = ?
15152 --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15151, si_status=0, 
si_utime=0, si_stime=0} (Child exited) ---

Original issue reported on code.google.com by mathieu....@gmail.com on 29 Nov 2011 at 2:42

GoogleCodeExporter commented 9 years ago
Thanks for the report, Mathieu. Does this happen every time? Could you please 
post your fabricate build file? (Or even better, a small, stand-alone build 
file that reproduces this problem?)

Original comment by benh...@gmail.com on 1 Dec 2011 at 2:33

GoogleCodeExporter commented 9 years ago
As noted by the OP the problem is that the vfork() which creates process 15151 
is interrupted by an open() running as 15151 but since the vfork was 
interrupted the entry in the process dict has not been created.

ATM fabricate only creates processes on completion of fork/clone or exec but 
POSIX allows some syscalls between the fork and the exec which run as the new 
pid.

The solution would be to look at the pid before anything else and create the 
process dict entry irrespective of the call type, or more correctly but 
complicated is to defer processing of interrupting calls until the interrupted 
one is finished.

Original comment by ele...@gmail.com on 1 Dec 2011 at 11:10

GoogleCodeExporter commented 9 years ago
I guess that creating the dict entry irrespective of the call type will do the 
trick…
I'll give a try

Original comment by mathieu....@gmail.com on 2 Dec 2011 at 6:44

GoogleCodeExporter commented 9 years ago
Did doing that solve the problem, Mathieu? I haven't had a chance to play with 
this yet (I don't run Linux myself). Can we repro this with a small C program 
that uses fork and open, which fabricate would run?

Original comment by benh...@gmail.com on 18 Dec 2011 at 7:12

GoogleCodeExporter commented 9 years ago
It seems to solve the problem (I'll try to set up a test case as soon I have 
some spare time), but I stumbled into another one, where I have the two 
consecutive line in the strace output :

28033 <... fork resumed> )              = 3
28033 <... open resumed> )              = 0

Leading to a key error at line "line = unfinished[pid] + body"

It is very strange as I have only one "unfinished" line for process 28033, 
which is an "open" one (no "fork" one) !

In another attempt, where I ignore "fork resumed" lines,  I get a strace with 
only one "open unfinished" line for process 32491 but got *two* subsequent 
"open resumed" for this same process !
I was trying to improve unfinished treatment, but now I do not understand what 
is the meaning of such strace ouptut… 

Original comment by mathieu....@gmail.com on 29 Dec 2011 at 3:26

GoogleCodeExporter commented 9 years ago
Hi Mathieu,

Could you post with the whole relevant section of the strace.

Thanks
Lex

Original comment by ele...@gmail.com on 29 Dec 2011 at 10:39

GoogleCodeExporter commented 9 years ago
Of course.
Here are two samples of two consecutive runs :

24250 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f3da39a09d0) = 24253
24250 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f3da39a09d0) = 24254
24253 execve("/lib/cpp", ["/lib/cpp", "-traditional", "-P", "-C", "-undef"], 
[/* 69 vars */]) = 0
24253 open("/etc/ld.so.cache", O_RDONLY) = 3
24253 open("/lib/libc.so.6", O_RDONLY)  = 3
24254 open("/tmp/File73712902416211553.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
24252 
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp", ["bpp", "/home/clabaut/c478/c478_traducte"...], [/* 
69 vars */] <unfinished ...>
24254 execve("/bin/sed", ["sed", "-e", "s,\\\\#,#,g", "-e", 
"s,\\\\\\\\,\\\\,g"], [/* 69 vars */]) = 0
24252 <... execve resumed> )            = 0
24252 open("/etc/ld.so.cache", O_RDONLY) = 3
24254 open("/etc/ld.so.cache", O_RDONLY) = 3
) = 31768576
24254 open("/lib/libacl.so.1", O_RDONLY <unfinished ...>
24253 open("/usr/lib/locale/locale-archive", O_RDONLY <unfinished ...>
24254 <... open resumed> )              = 3
24253 <... open resumed> )              = 3
24252 open("/lib32/libc.so.6", O_RDONLY <unfinished ...>
24253 <... open resumed> )              = 0

And
23684 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f795a5e69d0) = 23688
23687 execve("/lib/cpp", ["/lib/cpp", "-traditional", "-P", "-C", "-undef"], 
[/* 69 vars */]) = 0
23687 open("/etc/ld.so.cache", O_RDONLY) = 3
23687 open("/lib/libc.so.6", O_RDONLY)  = 3
23686 
execve("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/v
alidation/runtime/bpp/bpp", ["bpp", "/home/clabaut/c478/c478_traducte"...], [/* 
69 vars */] <unfinished ...>
23688 open("/tmp/File7942459553864738579.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) 
= 3
23688 execve("/bin/sed", ["sed", "-e", "s,\\\\#,#,g", "-e", 
"s,\\\\\\\\,\\\\,g"], [/* 69 vars */] <unfinished ...>
23686 <... execve resumed> )            = 0
23688 <... execve resumed> )            = 0
23687 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
) = 0
23688 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
23686 open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
23688 <... fork resumed> )              = 3
23686 <... open resumed> )              = 3
23688 open("/lib/libacl.so.1", O_RDONLY <unfinished ...>
23687 open("/usr/share/locale/locale.alias", O_RDONLY <unfinished ...>
23688 <... open resumed> )              = 3
23687 <... open resumed> )              = 3
23686 open("/lib32/libc.so.6", O_RDONLY <unfinished ...>
23688 <... open resumed> )              = 0
23688 open("/lib/libc.so.6", O_RDONLY)  = 3
23687 open("/usr/share/locale/fr_FR.UTF-8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr_FR/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 ENOENT 
(No such file or directory)
23687 open("/usr/share/locale/fr.UTF-8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
23687 open("/usr/share/locale/fr.utf8/LC_MESSAGES/gcc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
23688 open("/lib/libattr.so.1", O_RDONLY <unfinished ...>
23687 open("/usr/share/locale/fr/LC_MESSAGES/gcc.mo", O_RDONLY) = 3
23688 <... fork resumed> )              = 3
23688 <... open resumed> )              = 0
23686 
open("/home/clabaut/c478/c478_traducteur_java_hgsvn/traducteurBversCJavaTest/val
idation/tests/TC3/in/m_typ.mch", O_RDONLY <unfinished ...>
23688 <... execve resumed> )            = 0

In the first one, there is only one "open…unfinished" for two 
"open…resumed"…

Original comment by mathieu....@gmail.com on 30 Dec 2011 at 7:43

GoogleCodeExporter commented 9 years ago
Hi Mathieu,

Somehow a reply where I requested what operating systen, version and strace 
version you were using did not get attached to the issue.

Original comment by ele...@gmail.com on 16 Jan 2012 at 10:13

GoogleCodeExporter commented 9 years ago
Hi Lex,
My operating system is : 
Linux 3.1.9-2-ARCH #1 SMP PREEMPT Sat Jan 14 09:11:37 CET 2012 x86_64 Intel(R) 
Core(TM) i5-2520M CPU @ 2.50GHz GenuineIntel GNU/Linux

strace version is : 4.6

Best regards,
-Mathieu

Original comment by mathieu....@gmail.com on 20 Jan 2012 at 11:52

GoogleCodeExporter commented 9 years ago
Thanks Mathieu,

It apears to be a problem with strace, if you are happy for me to submit the 
two strace fragments to that project please provide them as an attachment so we 
know that the comment and mail system hasn't wrapped or otherwise modified 
them. (there are some strange artifacts that suggest they have been modified in 
transit).

Cheers
Lex

Original comment by ele...@gmail.com on 20 Jan 2012 at 11:34

GoogleCodeExporter commented 9 years ago
Mathieu,

As a postscript, if you can back your strace version to 4.5 and confirm the 
problem does not occur (I'm on 4.5.20 and havn't seen any problems) that is 
important information for the strace folks.

Cheers
Lex

Original comment by ele...@gmail.com on 21 Jan 2012 at 12:20

GoogleCodeExporter commented 9 years ago
I am starting work on fixing this issue. The suggested fix of just creating the 
process dict entry obviously works for mathieu, but it does lose some of the 
functionality of a proper clone_match in the StraceRunner. That being to 
transfer the cwd to the new process dict entry.

I also noticed that the execve_match basically does the suggested fix of just 
creating the process dict entry if it does not exist. This is also missing the 
transfer of the cwd to the new process dict entry. I am surprised this has not 
caused any issues. My assumption is that the original reason for this was to 
pick up the initial execve that kicks off the very first process traced, not to 
fix the interrupted fork.

I think the correct fix is to delay any matching until the clone has completed. 
I think this should be simple enough to do. Just create the StraceProcess 
object for the pid if it does not exist, but mark is as "not fully cloned". Any 
lines that match before the clone completes can be cached in the process entry. 
When the clone completes the lines can be re-processed and correct deps 
extracted. For execve there will have to be a special case to create a process 
dict entry if there are no processes being tracked. Does any one see any 
problems with this approach? 

Original comment by simon.al...@gmail.com on 3 May 2013 at 3:11

GoogleCodeExporter commented 9 years ago
Seeing this issue during django-manage makemessages (basically gettext).

Original comment by wer...@beroux.com on 7 May 2014 at 11:19

GoogleCodeExporter commented 9 years ago
Are you still working on this from time to time?

Original comment by wer...@beroux.com on 13 Aug 2014 at 9:49