ml6team / fondant

Production-ready data processing made easy and shareable
https://fondant.ai/en/stable/
Apache License 2.0

Facilitate TDD of components #570

Open PhilippeMoussalli opened 10 months ago

PhilippeMoussalli commented 10 months ago

TDD can make component development faster (no need to run within a pipeline) and more robust (unit tests). We could have a command that generates generic boilerplate for a test script based on the component spec:

mrchtr commented 10 months ago

I like this idea. I think it would be better not to have an extra command for generating test boilerplate code. Instead, I would include the generation of the test code in the component boilerplate generation. That way, implementing unit tests becomes somewhat mandatory.

Before we tackle this, it would probably be good to note down some guidelines for good test design. Components that apply non-ML transformations to the dataframes are quite easy to test. Components that use ML models, or that load or write data, need more abstraction in the test design, though.
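To illustrate the easy case, here is a minimal sketch of a unit test for a non-ML dataframe transformation. `FilterShortTexts`, its `min_length` argument, and the `text` column are hypothetical names made up for this example; the class is a plain stand-in with a `transform` method, not Fondant's actual component interface.

```python
import pandas as pd


class FilterShortTexts:
    """Hypothetical stand-in for a component; not Fondant's actual base class."""

    def __init__(self, min_length: int):
        self.min_length = min_length

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        # Keep only the rows whose text has at least `min_length` characters.
        mask = dataframe["text"].str.len() >= self.min_length
        return dataframe[mask].reset_index(drop=True)


def test_filter_short_texts():
    # Small in-memory input, so no pipeline run is needed.
    input_df = pd.DataFrame({"text": ["hi", "hello world", "hey"]})
    expected_df = pd.DataFrame({"text": ["hello world"]})

    component = FilterShortTexts(min_length=5)
    output_df = component.transform(input_df)

    pd.testing.assert_frame_equal(output_df, expected_df)
```

A test like this runs in milliseconds with pytest, since it only needs a small hand-written dataframe as input and another as the expected output.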

In my opinion, TDD can significantly speed up development when the test design is not too complicated and the tests run fast both locally and in the CI/CD pipeline. I think we should propose which parts of the component we are going to mock. In general, I would suggest mocking the output of machine learning models to ensure fast test execution. Mocking specific services adds some complexity that we probably can't cover in the boilerplate generation.
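As a rough sketch of what mocking the model output could look like, assuming the component receives its model as a constructor argument: `CaptionComponent`, the `predict` method, and the `image` / `caption` fields are all hypothetical, and plain dicts stand in for dataframe rows to keep the example free of dependencies.

```python
from unittest.mock import MagicMock


class CaptionComponent:
    """Hypothetical component that delegates to an injected model."""

    def __init__(self, model):
        self.model = model

    def transform(self, rows):
        # Add a "caption" field to every row using the model's prediction.
        return [
            {**row, "caption": self.model.predict(row["image"])}
            for row in rows
        ]


def test_caption_component_with_mocked_model():
    # The mock replaces the real (slow) model and returns a fixed output.
    model = MagicMock()
    model.predict.return_value = "a cat"

    component = CaptionComponent(model=model)
    output = component.transform([{"image": b"img-1"}, {"image": b"img-2"}])

    assert output == [
        {"image": b"img-1", "caption": "a cat"},
        {"image": b"img-2", "caption": "a cat"},
    ]
    # The model should be called once per row.
    assert model.predict.call_count == 2
```

Injecting the model keeps the test independent of model weights and inference time. The test then only checks the component's own logic around the prediction.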

We already had some initial discussions about general component test design a few weeks ago in #301.

RobbeSneyders commented 10 months ago

Since this is mostly focused on iterative development, we could also generate a notebook with example data. We could use some notebook magics so the user can actually develop their component code in the notebook and it gets automatically written to file.