probabl-ai / skore

Skore lets you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl-ai.github.io/skore/
MIT License

doc: Improve the docstring from the code source so that the documentation API will be clear #514

Open sylvaincom opened 1 week ago

sylvaincom commented 1 week ago

Which part of the documentation needs improvement?

The API documentation, which means improving the docstrings in the source code. Here is why:

The skore documentation will launch soon (https://github.com/probabl-ai/skore/issues/412). As in scikit-learn, there will be an API tab. With Sphinx, the documentation for our classes is built automatically from the docstrings in the source code.
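To make the mechanism concrete, here is a minimal sketch. Sphinx's autodoc extension imports each object and reads its docstring, which is roughly the text that `inspect.getdoc` returns. The function `train_test_split_report` is hypothetical and exists only for illustration; it is not part of skore.

```python
import inspect


def train_test_split_report(test_size=0.25):
    """Summarize a train/test split (hypothetical function, for illustration only).

    Parameters
    ----------
    test_size : float, default=0.25
        Proportion of the dataset held out for testing.

    Returns
    -------
    str
        A one-line description of the split.
    """
    return f"train={1 - test_size:.0%}, test={test_size:.0%}"


# The docstring is plain runtime data attached to the object; a documentation
# builder can read it without parsing the source file by hand.
doc = inspect.getdoc(train_test_split_report)
first_line = doc.splitlines()[0]
print(first_line)
```

If the docstring is missing or thin, the generated API page is missing or thin as well, which is why the source code is the place to fix it.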

Example with scikit-learn's logistic regression

The API documentation looks like this:

Screenshot 2024-10-17 at 08 57 32

...

Screenshot 2024-10-17 at 08 56 21

while the corresponding docstring in the code base looks like this:

class LogisticRegression(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.

    In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
    scheme if the 'multi_class' option is set to 'ovr', and uses the
    cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
    (Currently the 'multinomial' option is supported only by the 'lbfgs',
    'sag', 'saga' and 'newton-cg' solvers.)

...

    Read more in the :ref:`User Guide <logistic_regression>`.

    Parameters
    ----------
    penalty : {'l1', 'l2', 'elasticnet', None}, default='l2'
        Specify the norm of the penalty:

        - `None`: no penalty is added;
        - `'l2'`: add a L2 penalty term and it is the default choice;
        - `'l1'`: add a L1 penalty term;
        - `'elasticnet'`: both L1 and L2 penalty terms are added.

Describe the problem found in the documentation

Currently, our docstrings are quite sparse:

Screenshot 2024-10-17 at 09 03 12

In particular, machine learning functions such as skore.cross_validate will have to be properly documented. cc @augustebaum, @MarieS-WiMLDS.

Suggested improvement

Please write docstrings everywhere! For each class, describe its attributes, etc., and document every method of every class.
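As a rough sketch of what this could look like, here is a small class documented in the numpydoc style that scikit-learn uses (Parameters, Attributes, Returns, Raises sections). `ScoreTracker` is a hypothetical class invented for this example; it is not part of skore.

```python
class ScoreTracker:
    """Keep a running record of evaluation scores (hypothetical, not part of skore).

    Parameters
    ----------
    metric : str
        Name of the metric being tracked, e.g. ``"accuracy"``.

    Attributes
    ----------
    metric : str
        Name of the tracked metric.
    scores_ : list of float
        Scores recorded so far, in insertion order.
    """

    def __init__(self, metric):
        self.metric = metric
        self.scores_ = []

    def add(self, score):
        """Record one score.

        Parameters
        ----------
        score : float
            Value of the metric for one evaluation run.

        Returns
        -------
        self : ScoreTracker
            The tracker itself, so that calls can be chained.
        """
        self.scores_.append(float(score))
        return self

    def mean(self):
        """Return the mean of the recorded scores.

        Returns
        -------
        float
            Arithmetic mean of ``scores_``.

        Raises
        ------
        ValueError
            If no score has been recorded yet.
        """
        if not self.scores_:
            raise ValueError("No scores recorded yet.")
        return sum(self.scores_) / len(self.scores_)
```

The class docstring covers constructor parameters and attributes, while each method documents its own parameters, return value, and errors.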

Additional context

No response

augustebaum commented 1 week ago

For the documentation of the Project class, it needs to be mentioned that the class isn't supposed to be instantiated directly. Instead, the user should first run python3 -m skore create project.skore and then, in their script, run project = skore.load("project.skore").
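One possible wording, as a sketch only: the class below is an empty stand-in that carries just the docstring, not the real Project implementation. The command and the `skore.load` call are the ones quoted above; the doctest lines are marked to be skipped because they need an existing project on disk.

```python
class Project:
    """A skore project (illustrative stand-in; the real class body is not shown).

    This class is not meant to be instantiated directly. Create a project
    from the command line first::

        python3 -m skore create project.skore

    Then load it in your script:

    >>> import skore  # doctest: +SKIP
    >>> project = skore.load("project.skore")  # doctest: +SKIP
    """
```

Putting the warning in the first paragraph of the docstring keeps it visible at the top of the rendered API page.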

MarieS-WiMLDS commented 1 week ago

TODO: once the cross-validate PR is validated, check that it's compliant with the rest (from the PR, it looks OK to me).
I expect it will be fine.
@sylvaincom do you see any other part of the repo that needs better documentation?

sylvaincom commented 1 week ago

Hmm, I think we're aiming at data science users, who will probably ignore the front-end and back-end API and focus on the ML API, such as cross-validation and train-test split.

But having a minimal, clean docstring everywhere is nice to have, I think.

And very thorough docstrings for the pure ML parts.
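To keep track of the "minimal docstring everywhere" goal, a small check like the one below could list what is still undocumented. This is a sketch using only the standard library; `missing_docstrings` and the `demo` module are hypothetical names made up for the example, and nothing here is an existing skore tool.

```python
import inspect
import types


def missing_docstrings(module):
    """Return sorted names of public functions, classes and methods of `module` without a docstring."""
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue  # private or dunder names are out of scope
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip objects imported from elsewhere
        if not obj.__doc__:
            missing.append(name)
        if inspect.isclass(obj):
            # Only look at methods defined on the class itself, not inherited ones.
            for attr_name, attr in vars(obj).items():
                if attr_name.startswith("_"):
                    continue
                if inspect.isfunction(attr) and not attr.__doc__:
                    missing.append(f"{name}.{attr_name}")
    return sorted(missing)


# Build a throwaway module in memory to try the check on.
demo = types.ModuleType("demo")
exec(
    '''
def documented():
    """Do something."""

def undocumented():
    pass

class Thing:
    """A thing."""
    def method(self):
        pass
''',
    demo.__dict__,
)

report = missing_docstrings(demo)
print(report)
```

The check reads `__doc__` directly rather than `inspect.getdoc`, so a method that only inherits a docstring from a parent class is still reported.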