metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Modules blocking on `closeStream()` may not forward their data in scripts with wormholes #108

Open cboehme opened 11 years ago

cboehme commented 11 years ago

The behaviour you observed is a bug indeed, @ah641054:

Using wormholes, users can combine data from multiple flows:

Flow A -> @wormhole;
Flow B -> @wormhole;
@wormhole -> Flow C;

Flux first executes flow A and then flow B. Flow C is executed indirectly by receiving data from A and B. Once execution of A and B is finished, closeStream() is called on both flows. After having called closeStream() on flow A, all modules in A and C are closed. Calling closeStream() on flow B then only closes the remaining modules in B. Modules in C ignore the second call.

This works fine until a module in flow B attempts to send data before forwarding the closeStream() event (as count-triples does, for instance). This data will be sent to the already closed modules in flow C.
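To make the failure mode concrete, here is a minimal toy model in Python. It is not Metafacture code: the class names `Sink`, `PassThrough` and `Counter` and the methods `process` and `close_stream` are invented for illustration, with `Counter` standing in for a module such as count-triples that only emits its result when it is closed.

```python
class Sink:
    """Stands in for flow C: records data and ignores everything once closed."""

    def __init__(self):
        self.received = []
        self.dropped = []
        self.closed = False

    def process(self, item):
        if self.closed:
            self.dropped.append(item)  # data arriving after close is lost
        else:
            self.received.append(item)

    def close_stream(self):
        self.closed = True  # a second call has no further effect


class PassThrough:
    """Stands in for flow A: forwards data and the close event unchanged."""

    def __init__(self, receiver):
        self.receiver = receiver

    def process(self, item):
        self.receiver.process(item)

    def close_stream(self):
        self.receiver.close_stream()


class Counter:
    """Stands in for flow B: emits its result only when it is closed."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.count = 0

    def process(self, item):
        self.count += 1

    def close_stream(self):
        self.receiver.process(self.count)  # send data first ...
        self.receiver.close_stream()       # ... then forward the close event


flow_c = Sink()
flow_a = PassThrough(flow_c)
flow_b = Counter(flow_c)

# Both flows run to completion before any close event is sent.
for item in ["a1", "a2"]:
    flow_a.process(item)
for item in ["b1", "b2", "b3"]:
    flow_b.process(item)

flow_a.close_stream()  # closes A and, through the wormhole, C
flow_b.close_stream()  # B's count reaches an already closed C

print(flow_c.received)  # ['a1', 'a2']
print(flow_c.dropped)   # [3]
```

In this model the count produced by flow B ends up in `dropped`, which mirrors the data loss described above.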

Possible solutions for this problem are:

cboehme commented 11 years ago

Fixing this properly requires significant framework changes. For the time being, a quick fix should be implemented (in version 1.1).

mgeipel commented 11 years ago

Provided a quick fix in 49f24df by adding wait-for-inputs(N) to Flux.

Example:

data1 | open-file | as-lines | @Y;
data2 | open-file | as-lines | @Y;

@Y|
wait-for-inputs("2")|
write(out);

See also https://github.com/culturegraph/metafacture-core/tree/master/examples/beacon/create
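The idea behind wait-for-inputs(N) can be sketched with a toy model in Python. This is an assumption about the behaviour based on the name and the example above, not the code from commit 49f24df: the hypothetical `WaitForInputs` class forwards data immediately but holds back the close event until it has been called N times. `Sink`, `PassThrough` and `Counter` are likewise invented stand-ins for the downstream flow, a plain flow, and a module that emits on close.

```python
class Sink:
    """Downstream flow: records data and ignores everything once closed."""

    def __init__(self):
        self.received = []
        self.dropped = []
        self.closed = False

    def process(self, item):
        (self.dropped if self.closed else self.received).append(item)

    def close_stream(self):
        self.closed = True


class PassThrough:
    """A flow that forwards data and the close event unchanged."""

    def __init__(self, receiver):
        self.receiver = receiver

    def process(self, item):
        self.receiver.process(item)

    def close_stream(self):
        self.receiver.close_stream()


class Counter:
    """A flow that emits its result only when it is closed."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.count = 0

    def process(self, item):
        self.count += 1

    def close_stream(self):
        self.receiver.process(self.count)
        self.receiver.close_stream()


class WaitForInputs:
    """Hypothetical model: forward close only after `expected` close events."""

    def __init__(self, expected, receiver):
        self.expected = expected
        self.seen = 0
        self.receiver = receiver

    def process(self, item):
        self.receiver.process(item)  # data always passes straight through

    def close_stream(self):
        self.seen += 1
        if self.seen == self.expected:
            self.receiver.close_stream()  # last input has finished


out = Sink()
gate = WaitForInputs(2, out)  # two flows feed the wormhole
flow_a = PassThrough(gate)
flow_b = Counter(gate)

for item in ["a1", "a2"]:
    flow_a.process(item)
for item in ["b1", "b2", "b3"]:
    flow_b.process(item)

flow_a.close_stream()  # first close event: held back by the gate
flow_b.close_stream()  # count is delivered, then the second close passes

print(out.received)  # ['a1', 'a2', 3]
print(out.dropped)   # []
```

With the gate in place, the downstream flow stays open until both inputs have closed, so the value emitted by the counting flow during its own close is still delivered.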