kaiwaehner / kafka-streams-machine-learning-examples

This project contains examples which demonstrate how to deploy analytic models to mission-critical, scalable production environments leveraging Apache Kafka and its Streams API. Models are built with Python, H2O, TensorFlow, Keras, DeepLearning4J and other technologies.
Apache License 2.0
850 stars 305 forks

TopologyTestDriver based unit tests #11

Closed jukkakarvanen closed 5 years ago

jukkakarvanen commented 5 years ago

The current unit tests contain a copy of the actual implementation, so they do not test the actual code in the src folder.

Here is an example of how to use TopologyTestDriver to test the actual implementation: https://github.com/jukkakarvanen/kafka-streams-machine-learning-examples/pull/1/files

I did not submit this as a pull request because the implementation is built on top of an open pull request: https://github.com/kaiwaehner/kafka-streams-machine-learning-examples/pull/10

The same changes could be moved on top of the current branch without the module split.

I can add similar tests for the other classes that have an actual implementation. There are a couple of tests where the actual implementation class is missing.

jukkakarvanen commented 5 years ago

It seems that this way of testing was inherited from kafka-streams-examples, so I also created a pull request there to improve its tests: https://github.com/confluentinc/kafka-streams-examples/pull/219

kaiwaehner commented 5 years ago

This is a great refactoring. I just merged the pull request. (I moved TestEmbeddedKafkaCluster and TestKafkaStreams into the same folder to keep it simple, without separate packages.)

I will do the same refactoring for the other examples.

I think one more good step is the following recommendation from the book "Kafka Streams in Action":

Strive to keep business logic in standalone classes that are entirely independent of your Kafka Streams application. This makes them easy to unit test.

I will extract the "business logic" (in this case doing the model prediction for airline delays) into its own class. With this in place, it is even easier to add new business logic, models, and unit tests.
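A minimal sketch of what such an extraction could look like. All names here (DelayModel, AirlineDelayPredictor, the "delayed=" output format) are hypothetical and not taken from the repository; the real model call would wrap whatever scoring API the H2O or TensorFlow model exposes. The point is only that the class references no Kafka Streams types, so it can be tested with a stub model and no broker or topology.

```java
// Hypothetical abstraction over the loaded model (assumption, not the
// project's actual API). A real implementation would delegate to the
// model's own scoring call.
interface DelayModel {
    // Returns the predicted label for one parsed input record.
    String score(String[] features);
}

// Standalone business logic: no Kafka Streams types are referenced here.
final class AirlineDelayPredictor {
    private final DelayModel model;

    AirlineDelayPredictor(DelayModel model) {
        this.model = model;
    }

    // Takes one comma-separated flight record and returns a prediction
    // message. The "delayed=" prefix is an illustrative format only.
    String predict(String csvRecord) {
        if (csvRecord == null || csvRecord.trim().isEmpty()) {
            throw new IllegalArgumentException("empty flight record");
        }
        String[] features = csvRecord.split(",");
        return "delayed=" + model.score(features);
    }
}

class AirlineDelayPredictorDemo {
    public static void main(String[] args) {
        // Stub model for the demo: answers YES when the first field is "1987".
        DelayModel stub = features -> "1987".equals(features[0]) ? "YES" : "NO";
        AirlineDelayPredictor predictor = new AirlineDelayPredictor(stub);

        System.out.println(predictor.predict("1987,10,14,3"));
        System.out.println(predictor.predict("1999,10,14,3"));
    }
}
```

Inside the topology, a class like this could then be plugged in with something along the lines of stream.mapValues(predictor::predict), while the TopologyTestDriver based tests keep covering the wiring itself.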