incedo / fabricate

Automatically exported from code.google.com/p/fabricate
0 stars 0 forks source link

Script hangs if group==after #51

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Specifying group='foo', after='foo' will never return (unless process is 
killed)

What is the expected output? What do you see instead?
Should wait for other run(group='foo') but not self.
It is useful when for example updating a critical section like a database with 
steps that should be done after other DB operations (like migration, 
reindexing, testing).

What version of the product are you using? On what operating system?
fabricate v1.26 on Ubuntu x64 10.04

Example source code:

from fabricate import run, main

def build():
    run('echo', 'aa', group='db')
    run('echo', 'bb', group='db', after='db')

main(parallel_ok=True, jobs=3) 

Original issue reported on code.google.com by wer...@beroux.com on 12 Sep 2013 at 11:12

GoogleCodeExporter commented 9 years ago
What you are doing does not make sense, you are saying that 'echo bb' should 
not run until all commands in the group db have completed.  But the group 
cannot complete until the 'echo db' command is complete since it is part of the 
group, so the command never runs.

Put the 'echo bb' command in a different group, or just don't group it.

Grouping commands tell fabricate that those commands can run in parallel.  It 
is not for indicating logical groupings.

Original comment by ele...@gmail.com on 12 Sep 2013 at 11:30

GoogleCodeExporter commented 9 years ago
Yes I noticed that's the way it works. The usage I made was not a
dependency graph buy a critical section. Meaning I don't expect it to wait
a command group='bb' is run but only to wait if there is one running in
that group, until it finishes.

Have a name group= is adding to the confusion as it implies more than one
item can be part of it.

Another part adding to that confusion, is that wait() will not wait if
nothing is run(), but wait= will until that group (or first of that group?)
is run().

I'd suggest to check if group=wait.

To it more clear and support more I'd also suggest to make wait= work like
a critical section (don't run while something of the group is running) and
rename the existing wait= to dependOn=.

Last, the Ctrl+C is not really working in multithreaded mode. The issue is
that it loses time in many occasions, which is the opposite of the goal of
multithreading.

Comment #1 on issue 51 by ele...@gmail.com: After group issue
http://code.google.com/p/fabricate/issues/detail?id=51

What you are doing does not make sense, you are saying that 'echo bb'
should not run until all commands in the group db have completed.  But the
group cannot complete until the 'echo db' command is complete since it is
part of the group, so the command never runs.

Put the 'echo bb' command in a different group, or just don't group it.

Grouping commands tell fabricate that those commands can run in parallel.
It is not for indicating logical groupings.

Original comment by wer...@beroux.com on 13 Sep 2013 at 7:26

GoogleCodeExporter commented 9 years ago
I don't think this is a bug. I think it is a use issue. Just create 2 groups. 
e.g.

def build():
    run('echo', 'aa', group='dbInit')
    run('echo', 'bb', group='dbAddData', after='dbInit')

I am not sure what this wait you are talking about is. Do you mean after? I can 
add a check for group==after. I will be throwing an exception if that is true, 
because is is a miss-use of the fabricate API. 

The Ctrl+C problem in multithreaded mode is an issue. So I will create a new 
bug for that. 

I will leave this bug open, so I can add a check for after==group, and thrown 
an exception, rather than the script hanging for ever.

Original comment by simon.al...@gmail.com on 13 Sep 2013 at 12:26

GoogleCodeExporter commented 9 years ago
@Simon, I don't think its worth checking for group == after, deadlock can occur 
on any cycle in the ordering dependency graph, not just a one-step cycle.

Original comment by ele...@gmail.com on 13 Sep 2013 at 12:35

GoogleCodeExporter commented 9 years ago
Seems google code doesn't apply email replys to comments to the issue, so 
re-posted here.

Yes I noticed that's the way it works. The usage I made was not a
dependency graph buy a critical section. Meaning I don't expect it to wait
a command group='bb' is run but only to wait if there is one running in
that group, until it finishes.

The 'group' option specifies that this run command is part of a group of 
commands.  The 'after' option specifies an ordering dependency, that this 
command must not run until after *all* the commands in the named group have 
completed.  That is how those options are specified to work.

Have a name group= is adding to the confusion as it implies more than one
item can be part of it.

Indeed, any number of commands can be part of a group. A group is a name for 
that set of commands so they can be referred to.  Running after the group 
depends on them *all* completing, therefore no command can specify it needs to 
run after a group it is part of, nor can any other cycle exist in the ordering 
graph or there will be a deadlock as you experienced.

Another part adding to that confusion, is that wait() will not wait if
nothing is run(), but wait= will until that group (or first of that group?)
is run().

There is no wait() function or 'wait' option in the fabricate API?  Not sure 
what you mean.

I'd suggest to check if group=wait.

To it more clear and support more I'd also suggest to make wait= work like
a critical section (don't run while something of the group is running) and
rename the existing wait= to dependOn=.

Last, the Ctrl+C is not really working in multithreaded mode. The issue is
that it loses time in many occasions, which is the opposite of the goal of
multithreading.

Not sure what you mean here either.  The fabricate parallel processing is not 
multi-threaded, it is multi-process, it runs multiple commands in separate 
processes.  Aborting the main process with Ctrl+C may not cause child processes 
to abort unless they are part of the same process group.  Is that the issue?

Original comment by ele...@gmail.com on 13 Sep 2013 at 12:43

GoogleCodeExporter commented 9 years ago
Sorry 'wait' is 'after'.

Ok I understand how it works now. Still missing a bit the possibility of a  
'critical section'. Like I'd not expect an index database "upgrade" and 
"reindex" to work well simultaneously. However I wouldn't want to block all 
other commands in the system for those. Not the most important feature though. 
I could in most cases with minimal issues have it ordered.

It's however important to detect cases were group=after. Also detect other 
cases of infinite waits.
Let's note run(..., group='A', after='B') as A -> B, then some cases to detect:

Case 1: Self-reference
A->A

Case 2: Reference non-existing
B->A
(none with group=A)

Case 3: Circular references
C->B
B->A
A->C

Is it possible to detect Ctrl + C? Yes it's an issue. In that dead-lock case I 
had to kill all python processes. In general it often happens that I want to 
stop a build in the middle which is not possible else.

Original comment by wer...@beroux.com on 14 Sep 2013 at 3:51

GoogleCodeExporter commented 9 years ago
You get the effect of a critical section by making all the commands that you 
want to run before the critical command, members of a group, say 
"pre-critical", then make the critical command its own group="critical" and 
after="pre-critical" and then all commands that need to run after the critical 
one after="critical".

Detecting cycles is an expensive operation and would need to be run each time 
an after is defined.  So it will slow down everybody even if they don't need 
it.  The parallel operation in fabricate is to allow obvious parallelism to be 
exploited, eg a set of compiles prior to a link.  Its not intended to be used 
to create complex orderings to gain absolute maximum parallelism.

Yes, you have to be careful in the case of declaring an "after" on a group that 
does not exist yet, but it is useful in some circumstances to simplify the 
order that the commands are specified.  So simply banning it is not appropriate.

The problem with ctrl-c is that even if we caught the KeyboardInterrupt 
exception all we could do is to terminate() the pool of worker processes, we 
have no idea what children they have running the commands, so we can't 
explicitly kill them, we have to rely on the OS to do so, but it does not do so 
in various circumstances.

I think that terminate() is already done automatically during shutdown so I 
wouldn't expect it to make any difference, but you could try adding:

if _pool is not None: _pool.terminate()

before line 1582 and see if it helps.

Original comment by ele...@gmail.com on 15 Sep 2013 at 1:44