PGijsbers opened 1 year ago
Along the same lines, we could reduce developer overhead by externalizing the framework integrations: https://github.com/openml/automlbenchmark/discussions/571
Ideally, this would encourage framework authors to take more ownership of their integrations and keep them up to date themselves.
I fully agree with everything you wrote and with the plan you laid out here! This will make AMLB easier to understand, use, and contribute to.
Huge +1 to type hints, especially in code that is executed in a subprocess and thus is hard to debug (such as in the exec.py scripts).
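To illustrate the point with a minimal, hypothetical sketch (this is not AMLB's actual `exec.py` API): with an annotated signature, `mypy` can flag a bad call such as `run_fold("iris", fold="0")` statically, instead of the error only surfacing inside a subprocess where it is hard to debug.

```python
# Hypothetical example of the kind of entry point an exec.py script might
# expose. The annotations make the expected argument types explicit, so a
# static checker catches mistakes before the code runs in a subprocess.
def run_fold(dataset: str, fold: int, max_runtime_seconds: float) -> dict[str, float]:
    # ... training would happen here; return metric scores ...
    return {"duration": float(max_runtime_seconds)}

scores = run_fold("iris", fold=0, max_runtime_seconds=3600.0)
print(scores["duration"])
```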
I think it is time we invest in the developer experience before adding more features. To open the issue, I am sharing my thoughts on where we should put our effort, but I welcome feedback and suggestions! I do not propose halting all effort in other directions, but I think contributors who want to focus on the longevity of the benchmark should ideally focus their efforts here first. My hope is that by addressing these issues, future features and frameworks will be easier to add and quicker to review.
In my opinion, this project is missing several aspects that I would expect from a modern Python project, including linting and auto-formatting tooling (e.g., `ruff`, `black`).
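As an illustrative sketch (the specific rule selection and line length are example choices, not a recommendation from this issue), wiring both tools up in `pyproject.toml` could look like:

```toml
# Illustrative only: configure ruff (linting) and black (formatting).
[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, pyflakes, import sorting

[tool.black]
line-length = 100
```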
With that in place (or at least the first three), I think the following refactorings should be the highest priority:
- The pervasive use of `Namespace`s. To me, it makes it much harder to see from function signatures what information functions expect, and at times the code has to be carefully analyzed (or checked at runtime) to determine the exact content of arguments. I propose replacing (most) `Namespace`s with dataclasses or pydantic models wherever dynamic assignment isn't necessary. We might need something like Pydantic's TypedDict to support merging out of the box.
- Splitting large modules such as `amlb/runners/aws.py` or `amlb/datasets/file.py` into multiple submodules to make the code easier to navigate.
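To sketch the direction of the first refactoring under stated assumptions (the `TaskConfig` name and its fields are hypothetical, not AMLB's actual structures), a dataclass makes the contents of an argument explicit where a dynamic `Namespace` hides them:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a config object that might today be a dynamic
# Namespace. The fields and their types are declared in one place, so both
# readers and static checkers know exactly what a function receives.
@dataclass(frozen=True)
class TaskConfig:
    name: str
    fold: int
    max_runtime_seconds: int = 3600

def run_task(config: TaskConfig) -> str:
    # The signature documents its input; no runtime inspection needed,
    # and a typo like config.max_runtim is caught statically.
    return f"{config.name} (fold {config.fold}, {config.max_runtime_seconds}s)"

print(run_task(TaskConfig(name="iris", fold=0)))
```

Pydantic models would additionally give validation at construction time, which may matter where configs are parsed from user-supplied YAML.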