mattcasters / kettle-beam

Kettle plugins for Apache Beam
Apache License 2.0

Neo4j Output (With collect map) not flushing rows #52

Open rhaces opened 4 years ago

rhaces commented 4 years ago

Using kettle-beam with the Kafka Beam Consumer and Neo4j, the last rows are not flushed when the Neo4j Cypher step is configured to "Collect parameter values map". If Kafka does not receive new rows, then when the kettle-beam job finishes consuming all available messages, the number of rows written to Neo4j is always less than the total number of messages in Kafka.

rhaces commented 4 years ago

This is with the latest version, 1.0.2.

mattcasters commented 4 years ago

At first glance this is an issue for the Neo4j Cypher step: https://github.com/knowbi/knowbi-pentaho-pdi-neo4j-output/issues/159. So I fixed it there in the 5.0.0 version.

HOWEVER, after further stress testing I think there can still be a race condition, so I implemented a waiting loop in the @FinishBundle method in StepTransform. For at most 30 seconds we'll try to write the remaining rows. If that fails, we'll throw an error so we don't silently lose any data.
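The waiting loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual StepTransform code: the `RowBuffer` interface, the `finishBundle` method name, and the 100 ms retry interval are assumptions; only the 30-second cap and the fail-loudly behavior come from the comment above.

```java
import java.util.concurrent.TimeUnit;

public class FinishBundleWaitSketch {

    // Hypothetical stand-in for the step's pending output rows;
    // tryFlush() returns true once every buffered row has been written.
    interface RowBuffer {
        boolean tryFlush();
    }

    static final long MAX_WAIT_MS = TimeUnit.SECONDS.toMillis(30);

    // Keep trying to write the remaining rows for at most 30 seconds,
    // then throw rather than silently losing data.
    static void finishBundle(RowBuffer buffer) throws Exception {
        long deadline = System.currentTimeMillis() + MAX_WAIT_MS;
        while (!buffer.tryFlush()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new RuntimeException(
                    "Could not flush remaining rows within 30s; failing to avoid data loss");
            }
            Thread.sleep(100); // back off briefly before retrying
        }
    }
}
```

The key design point is the error at the deadline: a bounded wait that gives up quietly would reintroduce the original symptom (rows missing from Neo4j), whereas failing the bundle makes the loss visible to the runner.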

mattcasters commented 4 years ago

Various changes were made to the Neo4j 5.0-beta4 plugin / Cypher step. This also needs to be re-tested with 1.0.3. The problem was confirmed and then solved.