sighjs / sigh

multi-process expressive build system for the web and node.js, built using baconjs observables

pipeline plugin does not wait for all pipelines to complete #32

Closed Strate closed 8 years ago

Strate commented 8 years ago
var path = require('path')

module.exports = function(pipelines) {
    pipelines['build:styl'] = [
        glob("src/**/*.styl"),
        stylus({
            use: [require.resolve("nib")],
            import: [
                path.resolve('src/lib/components/theme/base.styl'),
            ]
        }),
        postcssPooled({
            pluginsFactory: require.resolve("./build/postcssPluginsFactory")
        }),
        cssmodules(),
    ]

    pipelines['build:ts'] = [
        glob("src/**/{*.ts,*.tsx}"),
        typescript(),
        babel6({babelrc: path.resolve("./.babelrc")}),
        replace({
            search: {
                pattern: "\\.styl\"",
                flags: "g"
            },
            replace: ".css\""
        }),
    ]

    pipelines['build:static'] = [
        glob("src/**/{*.yml,*.gif,*.jpeg,*.jpg,*.png,*.svg,*.woff,*.woff2,*.ttf,*.eot}"),
    ]

    pipelines['compile:dev'] = [
        pipeline({ activate: true }, 'build:styl', 'build:ts', 'build:static'),
        write("target")
    ]
}

When I run sigh, the target folder gets populated by the compile:dev pipeline after each requested pipeline finishes, not after all of them have finished:

% sigh -v -j 8                                                                        
 * running pipelines [ build:styl, build:ts, build:static, compile:dev ] with 8 jobs
 * waiting for subprocesses to start
 * subprocesses started in 4.507 seconds
 * pipeline build:static complete: 0.057 seconds
 * pipeline compile:dev complete: 0.169 seconds
 * pipeline build:styl complete: 7.201 seconds
 * pipeline compile:dev complete: 7.459 seconds
 * pipeline build:ts complete: 16.937 seconds
 * pipeline compile:dev complete: 17.002 seconds
 * pipeline(s) complete: 21.824 seconds

I have tried to rewrite the compile:dev pipeline like this:

pipelines['compile:dev'] = [
    merge(
        pipeline({ activate: true }, 'build:styl'),
        pipeline({ activate: true }, 'build:ts'),
        pipeline({ activate: true }, 'build:static')
    ),
    write("target")
]

and there is no difference. How can I force the pipeline plugin to wait for all nested pipelines to complete? Or am I doing something wrong?

insidewhy commented 8 years ago

It isn't supposed to wait... the pipeline is reactive and sends updates as they happen. Maybe you need the debounce plugin if you want to collect events. Isn't it better if your events get written as they happen instead of the build system forcing things to wait?
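
For example, dropping debounce into your compile:dev pipeline before the write (the 500ms value is only illustrative):

pipelines['compile:dev'] = [
    pipeline({ activate: true }, 'build:styl', 'build:ts', 'build:static'),
    debounce(500), // combines payloads until the stream settles for 500ms, then forwards them as one
    write("target")
]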

Strate commented 8 years ago

It isn't supposed to wait

Okay, how do I do that?

I need to wait for all build stages to complete and then pass all the events (which are compiled files) to webpack (I am working on a plugin now). Webpack needs all the compiled assets in order to bundle them.

Strate commented 8 years ago

And if it is not supposed to wait, how does the mocha pipeline from the example work?

pipelines['tests-run'] = [
    pipeline('build-source', 'build-tests'),
    debounce(500),
    mocha({ files: 'lib/**/*.spec.js' })
  ]

So it does not wait for the build-source and build-tests stages to complete. What happens if the tests are already built but the sources are not? Does mocha run on a half-built system?

insidewhy commented 8 years ago

Check out the documentation for debounce

insidewhy commented 8 years ago

As for waiting for completion, consider how the -w mode would work. Sigh is reactive, I do not think webpack's API is.

Strate commented 8 years ago

The documentation of debounce says:

Combines events in the pipeline until the event stream settles for longer than the given period.

Imagine a situation connected to the tests-run pipeline: what if there is a huge file whose compilation takes more than 1 second? That means there would be no events for some time, and debounce would treat that delay as a signal to invoke mocha. But this is wrong, isn't it? mocha should run after, and only after, the whole compilation completes, not after a delay since the last event.

If this is not possible, it seems that sigh cannot be used for complex projects :-(

Strate commented 8 years ago

And this is VERY unclear from the documentation, really. I think you could describe it explicitly.

insidewhy commented 8 years ago

It is a functional reactive build system, it can be used in very complex scenarios and supports incremental rebuilds at a level impossible in almost any other system. But you have to change your way of thinking. You aren't considering the reactive context at all if you think in terms of streams "ending". When you use -w streams never end. I think it's wise to avoid making incendiary remarks about a domain you are not yet familiar with.

Strate commented 8 years ago

Okay, let's forget about the remark about complex projects and keep the discussion cool.

How can I ensure that all my files are compiled and the stream contains all the compiled files? Something like this?

[
  glob('src/**/*.*'), // convert all files to events
  typescript(), // compile all ts files
  stylus(), // compile all stylus files
  babel6(), // compile all js and jsx files
  webpack(), // here I expect a stream filled with events which are compiled files; the webpack plugin converts 100 assets into 1 js file
  write("target") // asset file, produced with webpack comes to target folder
]

Am I missing something?

insidewhy commented 8 years ago

It requires quite a lot of words to explain, I'll come back to this in a few days when I have more time.

Strate commented 8 years ago

NP, I'll continue to play around with it, thank you for your participation!

insidewhy commented 8 years ago

Definitely check out some articles about Functional Reactive Programming first, sigh is designed along these principles. Also articles about Incremental Rebuilds.

insidewhy commented 8 years ago

This one is pretty good: https://gist.github.com/staltz/868e7e9bc2a7b8c1f754

Strate commented 8 years ago

On the initial run, glob populates the stream with one event, which is an array of file events. And when this event passes through the consumers, every consumer accepts this one event with all the files. This works perfectly with one flat pipeline:

pipelines.buildts = [
  glob('src/**/*.ts'), // emits one stream event, which is an array of "ts" file events
  typescript(), // accepts that event and emits one stream event, which is an array of compiled "js" file events
  write('target') // accepts one stream event, which is an array of the file events emitted by the previous consumer, "typescript"
]

But with distributed pipelines it is not so clear:

pipelines.buildts = [
  glob('src/**/*.ts'), // emits one stream event, which is an array of "ts" file events
  typescript() // accepts that event and emits one stream event, which is an array of compiled "js" file events
]
pipelines.buildcss = [
  glob('src/**/*.css'), // emits one stream event, which is an array of "css" file events
  postcss() // accepts that event and emits one stream event, which is an array of compiled "css" file events
]

pipelines.buildall = [
  pipeline("buildts", "buildcss"), // this will emit *two* stream events, one from each of the previous pipelines
  write("target") // accepts *two* stream events, each one an array of compiled "ts" or "css" files
]

And while for the write consumer there is no problem handling two events, because it can handle them separately, other consumers (concat, mocha, webpack, browserify and others) need to get all the files from the previous pipelines together, not separately. One solution is the debounce consumer, which concatenates stream events into one stream event (performing an array.concat operation). But debounce emits its event after some delay between events, and this can be a problem during the initial phase: if there is a big delay between events (for example, one of the previous pipelines is complex and completes much later than the others), debounce will emit an event without the files from that pipeline. And that is going to be a problem for the consumers listed above.

I think that the pipeline(1, 2, ..., n) (or merge(1, 2, ..., n)) consumer should wait for n events during the initial stage and concatenate them.

I hope I have not missed something.

insidewhy commented 8 years ago

That change won't be made.

insidewhy commented 8 years ago

First try to understand how -w mode works, then we can talk about this.

Strate commented 8 years ago

-w mode emits one stream event per file change, which is an array with one file event, but does that matter? There is no problem with watch mode, not least because any event from -w mode comes after the initial build phase. And any consumer can store the files from the initial phase in memory (e.g. in memory-fs), update the files in that structure, and perform the compilation from memory on each event from watch mode.
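
Roughly what I mean, sketched with the memory-fs package (just an illustration: the event fields path/data/type and the runCompilerAgainst step are my assumptions/placeholders, not real sigh plugin APIs):

var MemoryFS = require('memory-fs')
var path = require('path')

var mfs = new MemoryFS()

// called once with the big initial payload, then once per watch-mode payload
function onPayload(events) {
  events.forEach(function(event) {
    var target = path.join('/', event.path) // memory-fs paths are absolute
    if (event.type === 'remove') {
      mfs.unlinkSync(target)
    } else {
      mfs.mkdirpSync(path.dirname(target))
      mfs.writeFileSync(target, event.data)
    }
  })
  // after every payload the in-memory tree is complete again, so the bundler
  // (webpack, browserify, ...) can run against it
  runCompilerAgainst(mfs) // placeholder for the actual compilation step
}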

insidewhy commented 8 years ago

Cool, that's almost correct. I guess that's why, instead of modifying pipeline/merge, I think a new plugin that combines events and forwards them all together, storing them in something like a memory fs, would be better. Your suggestion would break the -w case.

insidewhy commented 8 years ago

Although, given that it works off of streams, it should instead just group events (taking remove events into account), and then another plugin could be added to deal with APIs that need an fs. APIs that are fs-only are annoying ;)

Strate commented 8 years ago

Your suggestion would break the -w case.

But why? I suggest doing the wait/concatenation only for the initial phase. That case can be detected pretty simply: if we are in watch mode, just forward every event without awaiting anything.

Strate commented 8 years ago

And this is not about storing things in memory at all. It is about ensuring that the whole initial build phase has completed.

Strate commented 8 years ago

https://github.com/Strate/sigh-await-and-merge

insidewhy commented 8 years ago

Using the promise like that to concatenate the first set of events is nasty, you should be using Bacon for that. The plugin should also coalesce events to support watch mode updates. There is code to support this in sigh-core.
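
Something along these lines, for instance (just a sketch: the function name and how you obtain n, the number of child pipelines, are up to the plugin):

var Bacon = require('baconjs')

// concatenate the first n payloads into a single payload, then let later
// (watch mode) payloads pass straight through unchanged
function awaitInitial(stream, n) {
  var initial = stream
    .take(n) // only the first payload from each child pipeline
    .fold([], function(acc, events) { return acc.concat(events) })
    .toEventStream() // emits the combined payload once, after all n have arrived
  return initial.merge(stream.skip(n))
}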

Strate commented 8 years ago

There is a big fat TODO on the coalesceEvents method: https://github.com/sighjs/sigh-core/blob/master/src/stream.js#L98

insidewhy commented 8 years ago

toFilesystemState is the one. It keeps a catalogue of all events as they relate to the filesystem and passes them all along after receiving each update.
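
Conceptually it does something like this (a simplified sketch of the idea, not the actual sigh-core code):

// keep the latest event for every path and forward the whole set on every update
function toFilesystemStateSketch(stream) {
  var byPath = new Map()
  return stream.map(function(events) {
    events.forEach(function(event) {
      if (event.type === 'remove') byPath.delete(event.path)
      else byPath.set(event.path, event)
    })
    return Array.from(byPath.values()) // every payload now describes the full filesystem state
  })
}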

insidewhy commented 8 years ago

coalesceEvents would be for filtering out unnecessary duplicates, which you don't need in your context.

Strate commented 8 years ago

Okay, that could easily be added as the next consumer on the stream. What about your remark about Promises? Could you give me some pointers to the Bacon method that should be used? I am really not that familiar with it.

Strate commented 8 years ago

I have played around with it, together with watch mode, and it seems to work well. And, from my point of view, this behaviour is what I would expect to be the default behaviour of the merge/pipeline builtins.

Strate commented 8 years ago

Adding toFileSystemState was a bad idea, because it turns one file update event into N file events, which then always get passed through to the filesystem.

insidewhy commented 8 years ago

I am confused: you want to pass "all events" down the stream in the first payload, but then subsequent payloads only contain the updates? I can't quite figure out what would benefit from this hybrid/inconsistent approach to the pipeline.

Strate commented 8 years ago

I can't quite figure what would benefit from this hybrid/inconsistent approach to the pipeline.

Sad to read this; I have tried to describe it several times with different examples in this thread. I'm going to try again.

Imagine that you have a fairly big project. It contains a lot of files; you want to transpile them in parallel and then, only after all the files are transpiled, run webpack/browserify on them. Those tools definitely have to run on all the transpiled files, not on only a part of them, I hope that is really clear.

Let's try to implement the pipeline:

pipelines.build = [
  merge(
    [glob("src/**/*.ts"), typescript(), babel6()],
    [glob("src/**/*.styl"), stylus(), postcss()]
  ),
  write('target"),
  webpack("target/src/index.js")
]

At first glance, this should work like this:

  1. Build the typescript files and the stylus files in parallel,
  2. Once both of them are ready, put the result on the filesystem,
  3. Run webpack with the entry point "target/src/index.js"

In fact (and this is really not clear from the docs/examples), it works like this:

  1. Build the typescript files, put the result on the filesystem, run webpack on "target/src/index.js"
  2. In parallel with task 1: build the stylus files, put the result on the filesystem, run webpack on "target/src/index.js".

This is definitely not what I want. And it is definitely wrong: the first time, webpack runs on a half-populated target directory (without the stylus files) and it will definitely fail.

You may tell me that debounce is the answer, and I will tell you it is not, because debounce aggregates events over some delay in milliseconds, and this easily leads to a race condition:

  1. If the difference in build time between the two build stages (typescript and stylus) is greater than the configured delay, webpack will run on a half-populated directory again.
  2. If both stages complete faster than the configured delay, the pipeline will wait out the rest of the delay, which is not needed any more.

The awaitAndMerge plugin solves that problem in the right way: it does not operate on a delay, it operates on the count of child pipelines, waits for all of them and merges their results together, so webpack runs on a fully populated target directory, without any race conditions or extra delays:

pipelines.build = [
  awaitAndMerge(
    [glob("src/**/*.ts"), typescript(), babel6()],
    [glob("src/**/*.styl"), stylus(), postcss()]
  ),
  write('target"),
  webpack("target/src/index.js")
]

Of course, in watch mode it should not buffer or wait for anything, because a watch event can come from only one child pipeline. But this cannot produce any error, because watch events come after, and only after, the initial build. And the initial build always populates the target directory with the full set of files, so updates can come one by one.

And this is the behaviour I expected by default from the builtin merge/pipeline.

insidewhy commented 8 years ago

I get that: instead of using fs-based APIs it's better to use the API exported at the library layer. Depending on the target API you can either do this incrementally or not. Your plugin only works for fs-based APIs (maybe webpack doesn't export a programmatic API?). So to handle such APIs you're better off not using write at all... That's the plugin you want to replace, ideally with one that uses an in-memory fs.

insidewhy commented 8 years ago

So instead of awaitAndMerge(...), write(targetDir), webpack() ideally you'd want something like collectInMemoryFs(), webpack().

collectInMemoryFs would store the files/directories in, say, /dev/shm, and change the paths of the events to reference these in-memory files. Then the webpack() API can work far more efficiently.
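
As a very rough sketch of the idea (collectInMemoryFs is just a suggested name; treating it as a plain stream transform and the event path/data fields are assumptions here, not the real sigh plugin API):

var fs = require('fs')
var path = require('path')

// hypothetical: mirror every file event into a tmpfs directory and rewrite the
// event path so downstream fs-only tools read the memory-backed copies
function collectInMemoryFs(stream, root) {
  root = root || '/dev/shm/sigh-build'
  return stream.map(function(events) {
    return events.map(function(event) {
      var target = path.join(root, event.path)
      fs.mkdirSync(path.dirname(target), { recursive: true })
      fs.writeFileSync(target, event.data)
      event.path = target // downstream plugins now see the in-memory copy
      return event
    })
  })
}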

However of course it's better if you don't have to use an fs-based API at all. Then you could use something like your previous example... and then you'd definitely want to emit the total filesystem state, because it would make dealing with the target API much easier (unless the target API is capable of incremental builds, in which case you wouldn't need the await stage at all).

You seem to be getting pretty emotional about this. I think it's better to try to adjust your mindset to a new way of solving problems (in an FRP kind of way) than to get annoyed about why you can't push FRP into your idea of how things have traditionally worked.

Strate commented 8 years ago

How does collectInMemoryFs prevent running webpack (or browserify, or mocha) twice (or more) during the initial build phase?

insidewhy commented 8 years ago

collectInMemoryFs({ coalesceInitialEvents: 3 }), or via two plugins: coalesceInitialEvents(3), collectInMemoryFs().

Strate commented 8 years ago

awaitAndMerge does exactly this: it coalesces the initial events, but instead of accepting a count of those events, it accepts child pipelines and waits for pipelines.length events. This is the behaviour I would expect to be the default for the builtin merge.

Regarding your notes about not using an fs-based API: this is not about fs-based or not fs-based. It is about running a task after the whole initial build has finished. It could really be any kind of task, even one not connected with the fs at all.

Strate commented 8 years ago

btw, I have done a little refactoring of await-and-merge https://github.com/Strate/sigh-await-and-merge/blob/master/src/index.js; now it correctly awaits completion of the whole initial phase, the collected initial-phase events are merged with any non-initial events that may have been raised, and all of them are transformed via toFileSystemState

insidewhy commented 8 years ago

This behaviour is expected to be default for builtin merge.

As you've repeated many, many times. However it really isn't a good idea: it would break many awesome incrementally rebuilding pipelines that have been built with sigh. You say you want it only because you're considering working with non-incrementally-rebuilding APIs like webpack. To keep repeating that you think the way it is designed is a mistake and it should be your way only shows you haven't considered many, many possibilities.

Strate commented 8 years ago

it would break many awesome incrementally rebuilding pipelines that have been built with sigh

How exactly? This only concerns the initial build; rebuilding is not touched at all, is it?

insidewhy commented 8 years ago

As I've said many times, there is no concept of "initial build" in many pipelines. It's FRP man... FRP!!!

insidewhy commented 8 years ago

I will implement "await and merge" mode as an option to merge. Check out https://github.com/sighjs/sigh/issues/46. You can check out the code for how to implement the algorithm using FRP and without using the promise like that.