lukeed / taskr

A fast, concurrency-focused task automation tool.
MIT License
2.53k stars 74 forks source link

Manage `files` & `globs` differently #229

Closed lukeed closed 7 years ago

lukeed commented 7 years ago

The values are stored & retrieved perfectly with single tasks and serial chains.

But with large parallel chain, the known values aren't 100% consistent. This is because ALL tasks & plugins are looking at & modifying the same internal variables: fly._.files, fly._.globs.

Ideas

Attach files and globs values to each fly.tasks entry.

lukeed commented 7 years ago

This is high priority & the last remaining implementation before an official 2.0 release. And since it may entail reworking how this._.files and this._.globs are stored/accessed, 2.0 can't be released until this is resolved.

The only way I've been able to do this so far is by cloning the current Fly context for each task. This enables each task to control its own files & globs values. But, this is horrible. As mentioned, the entire instance is cloned per task, which results in needlessly higher memory usage. Although the end-user does not experience a noticeable increase in task chain execution, it's not a direction I want to go.

devmondo commented 7 years ago

@lukeed i am not expert on this, but if user have tasks that run in parallel it is not practical to have those tasks modify the same files as the order is not guaranteed, so i think it is a user problem and not Fly's, in other words if i to have parallel tasks i must use different globs, otherwise it has to be serial.

lukeed commented 7 years ago

@devmondo Thanks for the input! But it seems I need to explain the issue better (esp after re-reading):

Every task is writing into the same variables... aka, overwriting the previous task's source values. So, let's assume we're running taskA and taskB in parallel. As soon as taskB initiates, it overwrites taskA's values for this._.files and this._.globs, which means that the plugin methods within taskA are now (or may be) working with taskB's source. 😟

Another example:

exports.scripts = function * () {
  yield this.source('js/**/*.js').babel().target('dist/js');
}

exports.styles = function * () {
  yield this.source('css/**/*.sass').sass().target('dist/css');
}

exports.default = function * () {
  yield this.parallel(['scripts', 'styles'])
}

When running default, there's no guarantee that scripts will be trying to run Babel on css/**/*.sass. Similarly, the styles task may actually be executing sass() on the js/**/*.js glob.

This does have a chance of working, though, but only if scripts finishes its entire execution before the .source() call within styles runs. But since Babel is a heavy API, there's practically nil chance of that occurring. Also, your chances of a successful parallel execution decreases exponentially (literally) as you include more tasks.

Does that make sense?

@jbucaran Feel free to chime in! I checked through 0.x and 1.x, and all of "parallel mode" was a semi-functional bug. Its sourcing was working (seemingly) by accident because parallel tasks weren't properly spawned and terminated. 🤔


@devmondo FWIW, I disagree. Anything should be able to run in parallel, regardless of source. As long as you don't have a task that depends on another's completion or alteration, then parallel away!

Imagine having a babel task and a lint task. One lints and one builds. You can chain them or their plugins together:

yield this.source('src/*.js').xo().babel().target('dist');

But you shouldn't be forced to do that, especially because it's not always going to be possible due to plugin/API methods and/or task needs.

In this example, running babel and lint is & should be perfectly viable to run in parallel.

jorgebucaran commented 7 years ago

Feel free to chime in! I checked through 0.x and 1.x, and all of "parallel mode" was a semi-functional bug. Its sourcing was working (seemingly) by accident because parallel tasks weren't properly spawned and terminated.

Haha what the...? I do remember having a hard time trying to figure parallel out back in the day.

I ~feel~ know this could be fixed if written from scratch.

devmondo commented 7 years ago

@lukeed wow very detailed explaination !

maybe i did not phrase my words correctly, what i meant is that if i have babel and minify requirements on the same file glob , logic says i should transpile first then minify and this can be done as you mentioned yield this.source('src/*.js').babel().minify().target('dist'); unless i still misuse fly :)

but when i read what you wrote here

When running default, there's no guarantee that scripts will be trying to run Babel on css/*/.sass. Similarly, the styles task may actually be executing sass() on the js/*/.js glob.

now i see what you mean, and it seems to be serious shortcoming and i think this is what you are trying to say.

i really dont grasp the internals sorcery of Fly and maybe i am wrong but from your description it seems that this._.files and this._.globs are kind of singleton or shared, if so then i think problem wont be solved as long as they re like this. how about separating things a little, like how about sharing what is not problematic and creating or attaching instances of problematic parts

stupid and maybe not related question but would it be good idea to run each task in separate v8 vm ?

sorry if my blabbering does not make sense :)

lukeed commented 7 years ago

@jbucaran:

My bad, was late. The spawn/terminate issue didn't affect parallelism --- I remembered incorrectly. However, we cloned the entire instance and passed it along to each task. I want to avoid this now as it's not kind of memory usage.

this could be fixed if written from scratch.

We have that opportunity right now. Open to ideas!

@devmondo:

what i meant is that if i have babel and minify requirements on the same file glob , logic says i should transpile first then minify

Of course! See:

As long as you don't have a task that depends on another's completion or alteration, then parallel away!

stupid and maybe not related question but would it be good idea to run each task in separate v8 vm ?

Not stupid & is related. That could be done, but it's more of an expensive operation than just cloning the current Fly instance (see above for link).

sorry if my blabbering does not make sense :)

It makes sense 😀


Ideas

  1. Clone the Fly instance & bind it to each task.

    PRO

    • Used in 0.x and 1.x
    • Works in current beta

    CON

    • n + 1 memory usage; original Fly + 1 per each task
  2. Use external state/value manager; eg, process.env.fly:

    init() {
     // ...
     process.env.fly = {tasks: {}}
    
     // bind all tasks to current Fly
     for (const k in tasks) {
       this.tasks[k] = tasks[k].bind(this);
     }
    }
    
    start(name, opts) {
     // ...
     process.env.fly.tasks[name] = {files: [], globs: [], prevs: []}
    }
    
    source(globs, opts) {
     // ???
     // somehow retrieve current task's name from context
    
     process.env.fly.tasks[name].files = [...]
     process.env.fly.tasks[name].globs = [...]
    }

    PRO

    • No memory issues
    • Universal Fly application (not specific behavior for parallel mode only)

    CON

    • cross-platform okay?
    • partial rewrite, including plugins' accessor to this._.files
devmondo commented 7 years ago

@lukeed thanks for nice words

option 2 seems the way to go , obviously more elegant! and it is somehow related to my humble suggestion :)

in regards to cross environment problem, is there any specific one that i can test for you on windows ? usually cross-env solve any cross env issues.

lukeed commented 7 years ago

@devmondo Thanks. I checked up on it and it looks like process.env goes way back, as I suspected / hoped.

Akkuma commented 7 years ago

So I read through some of the issue and one "farfetched" idea is to stop using this. It seems similar in nature to option 2. The idea would be for each task to be passed in the appropriate information and potentially return output to modify the input for the next plugin. Ultimately, this is just moving it to a functional style, so you just pass input and get output through the chain of plugins.

lukeed commented 7 years ago

@Akkuma Thanks, nice seeing you around again!

I'm on the same page. The problem is coming up with a set of changes that doesn't completely rewrite the API.

devmondo commented 7 years ago

@lukeed it is v2, i am willing to spend 1 hour to change all my fly tasks if we can get awesome powers :)

lukeed commented 7 years ago

@devmondo Haha, sorry? Not quite sure what you're saying.

devmondo commented 7 years ago

@lukeed i mean, this is a major version upgrade from v1, so we are expected to deal with some breaking changes in the API if it makes your life easier to fix this issue :)

rauchg commented 7 years ago

Some random suggestions:

First, maybe pass a fly instance to each task:

exports.scripts = function * (fly) {
  yield fly.source('js/**/*.js').babel().target('dist/js');
}

Not sure I like magic this.

Also, please consider the requirement of being able to easily programmatically execute fly tasks.

import fly from 'fly'
import tasks from './my-tasks'
// …
await fly(tasks).run('scripts')

I like the idea of externalizing state, but instead passing it by reference rather than process.env ?

lukeed commented 7 years ago

Thank you @rauchg for stopping by :)

As per the programmatic API: it's already doable in this beta version. It looks a bit different from your clean(er) syntax, but it's close enough. It also powers all our tests.

import Fly from 'fly'
import tasks from './my-tasks'
// ...
const fly = new Fly({tasks})
yield fly.start('scripts');

And passing in the context as a parameter still presents the same dilemma. All tasks will be sharing the same instance (now referenced as fly instead of this) unless I clone everything once per task. Given that Fly is lightweight, it's not the worst thing ever; but I'd rather the better solution, which is an external state.

I'm just trying to find the best way to handle push & keep track of those state changes.

lukeed commented 7 years ago

Added a failing test to help illustrate & track the issue: c88de388c3755b5726b28dd5f08791bee9f1fb74

hzlmn commented 7 years ago

Hi @lukeed, just a quick thoughts...

The one solution (but bad), that won't break current api is to implement priority queue for accessing internal this._.files from tasks that executes in parallel.

Another one that comes to mind, is to move files state managment from fly to task context, basically we should have separate structure for it. I can not find a situation where i want to share file context between separate tasks, it's kinda hack for me.

Also plus from me for @rauchg proposition to use explicit fly instance inside config (It won't fix current problem, just a better? design)

What do you think @jbucaran ?

lukeed commented 7 years ago

Hey @hzlmn, thanks :)

I've been thinking that there needs to be a Task class. This falls inline with your second point.

The Task will need manage its own source & hold a reference to the parent Fly context so that it can access the prototype methods & share them with other Tasks.

However, this still runs into the same problem I'm having now: Fly.prototype methods will not lose the Fly context. This happens as a side effect of allowing our methods to chain. So, even with a Task class, calling this.source and this.target will still always be modifying the main Fly context.

I worry that the only solution to this problem is to assign & reboot / rebind all Fly.prototype methods to Task.prototype. This will have to happen for every Task on every execution. This undoubtedly will be memory-heavy, even if the initialized Tasks are stored within Fly.

jorgebucaran commented 7 years ago

@hzlmn Hmm rauchg's suggestion of passing a fly instance into the generator is how a future API may/should look like IMO. Nowadays, I refrain from using this, but back in the day, I thought hacking on this was cool and was also motivated to do it like so from koajs this magic.

lukeed commented 7 years ago

@jbucaran @hzlmn I agree, but it doesn't resolve the issue. It's still writing into the same context every time.