scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

DOC Investigate scipy-doctest for better doctests #29027

Open lesteve opened 2 weeks ago

lesteve commented 2 weeks ago

I learned about the recent scipy-doctest release from the Scientific Python Discourse announcement. Apparently, scipy-doctest has been used internally in numpy and scipy for doctests for some time. In particular, it allows floating-point comparisons.
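To make "floating-point comparisons" concrete, here is a minimal standard-library sketch of the general idea. It is not scipy-doctest's implementation or API: `ToleranceChecker`, `rtol`, and `atol` are hypothetical names for this illustration, and the array values are made up.

```python
import doctest
import math
import re

# Matches simple decimal literals such as "0.96", "1.", ".5" or "1e-3".
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


class ToleranceChecker(doctest.OutputChecker):
    """Hypothetical checker: numbers are compared with a tolerance,
    everything else must match exactly (up to whitespace)."""

    def __init__(self, rtol=1e-3, atol=1e-8):
        self.rtol = rtol
        self.atol = atol

    def check_output(self, want, got, optionflags):
        # Fast path: the standard doctest comparison already succeeds.
        if super().check_output(want, got, optionflags):
            return True
        want_nums = _NUMBER.findall(want)
        got_nums = _NUMBER.findall(got)
        if len(want_nums) != len(got_nums):
            return False
        # The non-numeric skeleton must be identical, ignoring whitespace.
        if _NUMBER.sub("#", want).split() != _NUMBER.sub("#", got).split():
            return False
        return all(
            math.isclose(float(w), float(g), rel_tol=self.rtol, abs_tol=self.atol)
            for w, g in zip(want_nums, got_nums)
        )


# Made-up values, only to exercise the checker.
want = "array([0.967, 0.933, 1.   ])\n"
got = "array([0.96666667, 0.93333333, 1.        ])\n"

plain_result = doctest.OutputChecker().check_output(want, got, 0)
tolerant_result = ToleranceChecker(rtol=1e-2).check_output(want, got, 0)
```

With a checker along these lines, the expected output can be written as rounded numbers rather than truncated ones followed by an ellipsis.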

After a bit of setup work on our side, this would give us a few sprint / good first issues.

There are quite a few places where we use the doctest ellipsis; the following quick and dirty regexp finds 595 lines:

git grep -P '\d+\.\.\.' | wc -l
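For readers less familiar with the pattern, `\d+\.\.\.` matches one or more digits followed by three literal dots. A rough Python equivalent of the per-line count, run on a made-up sample rather than the repository (`count_matching_lines` is a hypothetical helper):

```python
import re

# Same pattern as in the git grep command above.
ELLIPSIS_AFTER_DIGIT = re.compile(r"\d+\.\.\.")


def count_matching_lines(text):
    """Count lines containing a digit run immediately followed by '...'."""
    return sum(
        1 for line in text.splitlines() if ELLIPSIS_AFTER_DIGIT.search(line)
    )


# Made-up sample text, not taken from the code base.
sample = """\
>>> cross_val_score(clf, X, y, cv=5)
array([0.96..., 0.96..., 1.        ])
>>> print(...)
0.5
"""
n_lines = count_matching_lines(sample)
```

Only the `array([...])` line counts here: `print(...)` has no digit directly before the dots.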

If you are not sure what I am talking about, this is the ... used in doctests inside docstrings and rst files, e.g. the last line of this snippet:

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import cross_val_score
>>> X, y = datasets.load_iris(return_X_y=True)
>>> clf = svm.SVC(random_state=0)
>>> cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])
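The behaviour of the ellipsis can be checked directly with the standard library's `doctest.OutputChecker`. The sketch below uses made-up "got" strings (not actual scikit-learn output) and a hypothetical `matches` helper; it shows that the ellipsis is a textual wildcard, so it can both fail on a tiny numerical change and accept a wildly different number.

```python
import doctest

checker = doctest.OutputChecker()


def matches(want, got, flags=doctest.ELLIPSIS):
    """Return True if doctest would accept `got` for the expected `want`."""
    return checker.check_output(want + "\n", got + "\n", flags)


want = "array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])"
# Made-up output, for illustration only.
got = "array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])"

exact = matches(want, got, flags=0)   # no ELLIPSIS flag: plain string comparison
with_ellipsis = matches(want, got)    # ELLIPSIS: "..." matches any substring

# A value just below the truncation point no longer starts with "0.96".
spurious_failure = not matches("0.96...", "0.95999999")

# The wildcard also accepts text that is numerically far from 0.96.
false_pass = matches("0.96...", "0.96e+10")
```

Both `spurious_failure` and `false_pass` come out true, which is the fragility a tolerance-based comparison would avoid.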

A recent example of a doctest with a spurious failure: https://github.com/scikit-learn/scikit-learn/pull/29140#issuecomment-2139904739

If you are wondering how this differs from pytest-doctestplus, look at this. It does seem a bit unfortunate to have both scipy/scipy_doctest and scientific-python/pytest-doctestplus, but oh well (full disclosure: I did not have time to look into the history) ...