jaz303 / phake

A rake/make clone for PHP 5.3
365 stars 27 forks source link

Cyclic dependency checking #52

Closed denzel-morris closed 10 years ago

denzel-morris commented 10 years ago

Managing tasks in the environment of a distributed team can be tough, and sometimes cyclic dependencies between tasks happen by accident. This pull request implements Tarjan's algorithm, with tests, to find all cycles in a task graph. It also adds a command-line option that will output all cycles found along with the task names in a given cycle.

Hopefully this is as helpful for others as it is for us. Phake is a great addition to the PHP ecosystem, and we'd like to support it. Please, feel free to offer up any comments or concerns. Thanks!

jaz303 commented 10 years ago

Awesome! Really glad you've contributed this, thank you. I'll review properly tomorrow but can you confirm whether the cycle detection is always active, not only when the command line flag is passed? (I think it should always be active)

denzel-morris commented 10 years ago

No problem. That's a really good question, and one I had in fact. Currently, it does not actively search for cycles on every execution, it only happens when you pass in the CLI flag.

I chose not to make it automatic because I figured it'd slow down execution when there's a large number of tasks to check. However, I'm not opposed to making it automatically run. I'll update the pull request accordingly.

Since it will be always active, would you like for me to remove the command-line option? Seems it'd be rendered redundant.

On Jul 9, 2014, at 7:08 PM, Jason Frame notifications@github.com wrote:

Awesome! Really glad you've contributed this, thank you. I'll review properly tomorrow but can you confirm whether the cycle detection is always active, not only when the command line flag is passed? (I think it should always be active)

— Reply to this email directly or view it on GitHub.

jaz303 commented 10 years ago

Safe should be the default, so how about a command line option to disable the check for anyone who's experiencing performance issues?

denzel-morris commented 10 years ago

Makes sense, that's what I'll do. Expect an update tomorrow. Thanks for the feedback!

On Jul 9, 2014, at 8:03 PM, Jason Frame notifications@github.com wrote:

Safe should be the default, so how about a command line option to disable the check for anyone who's experiencing performance issues?

— Reply to this email directly or view it on GitHub.

denzel-morris commented 10 years ago

Hey Jason, updated the pull request: (1) cycle detection runs automatically on every invocation now, (2) a -u/--unsafe option has been added to disable cycle detection (and any other safe options in the future).

If cycles are detected, they are displayed, an exception is thrown, and the current invocation aborts.

clue commented 10 years ago

Thanks for this high quality PR @DenzelMorris, the code looks good to me :+1: And being a maintainer of a graph lib myself, I really enjoyed seeing Tarjan's algorithm here :)

That being said I wonder if it's not a big overkill in this project and if it actually makes sense to check and report cycles here. Personally, I'd rather see this project being able to handle cyclic dependencies. One possible option would be only invoking each task once and thus ignoring duplicate (cyclic) calls.

I very much appreciate your PR and hope this doesn't come over as criticizing. Just sharing my thoughts and honestly don't mind merging this in either way. Keep up the good work!

jaz303 commented 10 years ago

I can't think of a sensible reason for permitting circular dependencies; the very term "dependency" implies that one task must complete before the other, but when the graph contains cycles the task which runs first is undecidable.

denzel-morris commented 10 years ago

@clue: I enjoy the feedback and kind words, definitely doesn't come across as overly critical, so thank you. :)

I have to agree with @jaz303, at least in our use cases, there is no reason for us to permit circular dependencies due to the very nature of a directed graph of tasks to complete in order. However, I'm open to a discussion about other possible use cases that you may be thinking of.

clue commented 10 years ago

I have to agree with @jaz303 […]

I guess so do I :) I also do agree that checking for (in the sense of avoiding) cycles by default makes sense. At least as long as dependency in our context incurs order it means that no valid execution order can exist where A < B and B < A.

I'm not really asking to change this, but technically we could relax that constraint. In that case we would not need to avoid cycles at all.

With regards to the current situation I think it's safe to just avoid cycles (until we decide otherwise).

However, I think avoiding is more of a construction decision instead of a runtime one. Having the current implementation in mind, it's perfectly valid to change to dependency graph during a run, i.e. adding or removing new nodes and dependencies after checking the graph. In that case I'd suggest moving this check somewhere near the Node::add_dependency method.

I also enjoy the constructive feedback, keep it coming! :)

jaz303 commented 10 years ago

Sorry totally dropped the ball on this one. Just going to merge it as-is, any further refinements can happen in master.