Closed veldsla closed 10 years ago
This is awesome, and something that people have requested for quite a while. I think the change to complete
is fine, as long as it's backward compatible. I'll take a look -- would love to get this feature merged.
A couple of things: execution isn't actually parallel in JS, since we only have the one thread of execution. A better word would be 'concurrent.' (This is relevant for the complete
/this.complete
, since only one task will ever be executing at one time.) It might be possible to flag the currently executing task, so that the global complete still works the same way. Instead of assuming it refers to the end of the execution chain, it would refer to the currently executing task.
I'm aware JS is single threaded. As mentioned we use a lot of jake.exec calls which are asynchronous so many of the could be triggered simultaneously. Because it is unknown when the exec will finish the global complete can never finish the correct task. We encountered this during development when we used a series of file tasks and jake bailed out because the expected file wasn't (yet) written when complete was called. Or am I missing something obvious here?
Regarding the naming concurrent is probably a better choice. The concurrent tasks have to be asynchronous and preferably wait a lot in the asynchronous operations in order to gain from this feature. Like shelling out with jake.exec.
I think the runparallel option should be parallelLimit, since that's what this actually is.
Does this properly handle dependencies that depend on each other?
task A
task B depends on A
task C depends on [A, B] parallelLimit 2
Jake should know that B depends on A and not run it twice.
I think it currently is possible to trick Jake into running a task twice, maybe if A runs longer than B it would be started again when we arrive at the B prereqs. I would like to do additional testcases, but I first wanted to get some feedback and see if there would any interest to merge such a feature.
Oh, so the runner could enter another concurrent task before finishing the previous one. I should have thought of that. Given that, yes, the complete
API would need to pass the task instead of the data value.
@ben-ng's point about prerequisites is a valid one. Seems like this would require a change from a simple "done" property like we have now to a "completeness" or "state" or similar that references an enumeration of UNSTARTED/RUNNING/DONE.
Even then, you would not be able to start B until A was done, which would mean there would be certain tasks which could not be run concurrently, even if listed as concurrent prereqs. Basically this changes a chain into a tree-structure where the nodes are visited bottom-up? I do think this would work even if there is a node which lives in multiple places in the tree. It might make circular prereqs easier to trip over, but it's a problem we have with the chain as well.
@veldsla: i think we can solve the dependency dependency problem by checking each dependency in the list and removing any subdependencies.
In my example above, C's dependency list would be reduced to B because A is a dependency of B.
This would be simpler than messing with states.
This would be an awesome feature, btw (:
@mde I also thought about the started/finished state variable. I'm not sure @ben-ng's graph approach would work for us. We use Jake in a different way. Basically we have about three to four module tasks which internally generate several file_tasks, use those tasks as prereqs for a new task and invoke it. The module then waits for complete and moves on to the next module. So initially the task tree would be easy A->B->C
, but A would generate A'->[a1,a2,a3,a4,....]
and run it before completing and moving on to B.
This probably makes little sense :grinning:
Ah, so you'd need to reduce all the concurrent prereqs to the end of whatever chain of sub-prereqs each might have -- effectively creating that tree, where each sibling level is a group of potentially concurrently running tasks.
Given: task('a', {async: true}); task('b' ['a'], {async: true}); task('d' ['b', 'e', 'f', 'g'], {async: true, prereqsConcurrent: true}); And: task('c', ['a', 'b', 'd'], {async: true, prereqsConcurrent: true});
Task c's final prereq list would be only 'd.' You could run d's prereqs in any order, but 'a' would have to run as a prereq of 'b.'
@veldsla, are you using the Task API to add prereqs on the fly? Or are you just calling task
and declaring new tasks?
@mde That would be number two. I create several file('out', ['in'], {async:true} function() { jake.exec(...); })
and then do a task('new', ['out', 'out', ...], {async:true, parallel: 4})
. I then access that task via the global jake variable, invoke it and wait for completion. This works fine, but if there's a better way I'm just getting started!
Nope, that's totally legit. So you're really dealing with only one level of prereqs. That's fine, but whatever we implement needs to take deeper hierarchies into account.
Goodmorning,
I created @ben-ng's testcase (https://gist.github.com/veldsla/634c84a223db02f1b55a#file-jakeduppre) and indeed A was run twice. I quickly introduced a started flag in the task which circumvents this problem 3d4b8034008b4ea5a00f14c5a1a9864ed521ecf7
Nice work!
Hm, rather than just bandaiding it by adding another Boolean flag, this really should be a three-state variable.
Sure no problem. Shall I have a go at cleaning it up? Maybe add a few tests? Would you merge it or do you have plans to implement this differently? Also I only use filetasks, tasks and rules. Do you expect this change to affect any of the other convenience tasks?
Definitely needs some pretty extensive tests. I don't have plans to implement it another way, so we can certainly merge this branch once there's reasonable tests on it. All the convenience tasks (also Rules) are actually just built on top of Task (doing something very similar to what you're doing by dynamically creating them), and they won't be using the concurrent functionality.
Slightly delayed, but I cleaned it up and added a few tests. I renamed runparallel to concurrentLimit, but I'm open to suggestions. The three state variable was implemented using strings because I was unsure where to declare a constant.
The tests cover two simple tasks running two dependencies concurrently and depending on these two tasks sequentially and concurrently. It uses the setTimeout to finish out of order and the order is asserted in the tests. Also I tried throwing an error and added @ben-ng double dependency.
Any remarks? Suggestions for other tests?
I think parallelLimit is more idiomatic -- async is one of the most depended upon modules for doing this, and that's what people are going to try first.
Very nice work solving the duplicate dependency problem!
I was hoping to avoid the word 'parallel' since it's technically incorrect, but if the functionality is precisely the same as what people are used to in Async, then I agree it wouldn't be bad to bow to expedience.
I added a couple of comments. Thanks so much for your work on this!
I think it isn't possible to create the status object on TaskBase and still refer to in in File(Base), so I defined it on Task instead. Also added a small section to the docs.
Perfect. Thanks so much for this. This will ship in Jake v8, which will be the next release, happening soon.
We use jake to run and chain long running shell jobs and require many parallel processes to optimally use the server. We use functions that generate many jake file tasks to use as prerequisites and it would be great if those tasks run in parallel
This is an attempt to run the dependencies of a task in parallel. It uses caolan/async to run and limit the amount of simultaneous tasks. We needed to mess a bit with the invocationChain to accommodate this. It really requires an api change regarding the global complete function although it is currently backwards compatible as long as you don't use runparallel.
Would you be interested in a feature like this? Do you have any suggestions for improvement? We created an example jakefile here https://gist.github.com/veldsla/7d08c36ef30a504c91e0#file-jakepartest