probabl-ai / skore

Skore let's you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl.ai
MIT License
11 stars 0 forks source link

In CI, check if all imported libraries are defined explicitly in our dependencies #405

Open thomass-dev opened 4 days ago

thomass-dev commented 4 days ago

We must ensure that installing skore is reproductible. Unittests are not sufficient because imported libraries can be shipped by one of our third-party libraries.

tuscland commented 1 day ago

We need clarification and a concrete example why this is needed. @thomass-dev

thomass-dev commented 1 day ago

Imagine you have the library x as only requirement of skore. Imagine you define a script using explicitly the library y.

In case where y is not a dependency of x, unit tests will failed. In case where y is a dependency of x, unit tests will succeed. In that last case, we are directly dependent of the way x declares y as dependencies. This is really bad, and can break (at the user runtime) as soon as x changes his dependencies.

I would like to add in the CI a way to check if all the libraries explicitly used in skore are well defined in the skore requirements.

tuscland commented 1 day ago

It's my understanding that one should always declare the dependencies for the libraries you use, and not rely on transitive dependencies to work that out for you.

I am sorry, I have difficulties to understand the problem. Why can't we just add the dependency for y?

thomass-dev commented 1 day ago

It's my understanding that one should always declare the dependencies for the libraries you use, and not rely on transitive dependencies to work that out for you.

That's the point. You should, but its not mandatory, as long as the package is available in the environment no matter how (transitive dependencies).

Why can't we just add the dependency for y?

We should, but sometimes we forget. Thats the point of this issue: to add a way to detect these oversights.

tuscland commented 1 day ago

You should, but its not mandatory, as long as the package is available in the environment no matter how (transitive dependencies).

I don't agree, if you use y, you must declare y as a dependency.

We should, but sometimes we forget. Thats the point of this issue: to add a way to detect these oversights.

The test fails and we fix it by adding the dependency? To me it sounds like adding tests to detect bugs we unintentionally wrote.

thomass-dev commented 1 day ago

The test fails and we fix it by adding the dependency?

Yes.

To me it sounds like adding tests to detect bugs we unintentionally wrote.

I don't understand, its the purpose of the tests in general, no?

tuscland commented 1 day ago

I don't understand, its the purpose of the tests in general, no?

Absolutely, when you need to enforce invariants. Unless I misunderstood, I have never written tests to verify the code I write uses an undeclared transitive dependency. Also, it seems the bug is not critical since it affects how tests perform, and the failure mode is easy to fix.

augustebaum commented 1 day ago

Check out https://github.com/tweag/FawltyDeps which addresses exactly this issue: it can check that all imported packages are in our declared dependencies.

tuscland commented 6 hours ago

As long as the benefit/maintainability balance is positive, this is nice to have.