koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.22k stars, 116 forks

[DOCS] Example usage in docstring #596

Open FBruzzesi opened 8 months ago

FBruzzesi commented 8 months ago

As discussed in #586, although most of the library features are documented in the user guide, the best way to showcase how to access and use each class/function would be a minimal usage example in its docstring. This is currently covered for only a subset of the library features.

Here is the list of all the remaining classes that would benefit from it:

For an instance of such a minimal example, you can refer to the QuantileRegression docstring section and to how it renders in its API section.

If possible, try to add a single example covering the relevant features and methods in the top-level docstring of the class.
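As a rough illustration of what "one example in the top-level docstring" could look like, here is a minimal sketch. `ColumnScaler` and `run_docstring_example` are hypothetical names invented for this illustration, not part of scikit-lego, and the `>>>` doctest style is an assumption made so the example can be checked with the standard library alone; the project's actual docstring format may differ.

```python
import doctest


class ColumnScaler:
    """Hypothetical transformer that multiplies every value by a constant.

    Examples
    --------
    >>> scaler = ColumnScaler(factor=2)
    >>> scaler.fit([[1, 2], [3, 4]]).transform([[1, 2], [3, 4]])
    [[2, 4], [6, 8]]
    """

    def __init__(self, factor=1):
        self.factor = factor

    def fit(self, X, y=None):
        # Nothing to learn here; return self so calls can be chained.
        return self

    def transform(self, X):
        # Scale each value of each row by the configured factor.
        return [[value * self.factor for value in row] for row in X]


def run_docstring_example(obj):
    """Run the doctest examples in obj's own docstring and return the results."""
    parser = doctest.DocTestParser()
    # Expose the object under its own name so the example can reference it.
    test = parser.get_doctest(obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)
```

Calling `run_docstring_example(ColumnScaler)` returns a result whose `failed` attribute is `0` when the documented output matches what the code actually produces, so one short example covers construction, `fit`, and `transform` at once.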

likeajumprope commented 3 months ago

I am going to work on some of those today (Johanna). The first one definitely already has documentation.

anopsy commented 3 months ago

I'm gonna do preprocessing.outlier_remover.OutlierRemover

anopsy commented 3 months ago

I'm going to work on DictMapper

anopsy commented 3 months ago

Hi hi! I'm going to add a usage example to sklego.meta.outlier_classifier.OutlierClassifier.

I have a question though, it requires both X, y in fit, otherwise it raises a ValueError. I understand that y is required here (classifier/metrics etc) but I kind of thought that when the fit signature says fit(X, y=None) y=None implies that y is optional. But in this case it's required. Just wondering

koaning commented 3 months ago

I have a question though, it requires both X, y in fit, otherwise it raises a ValueError. I kind of thought that when the fit signature says fit(X, y=None) y=None implies that y is optional. But in this case it's required. Just wondering

This might deserve a ticket of its own. My first reaction is that most outlier models don't require a y ... can't remember why the ValueError would be there.

anopsy commented 3 months ago

According to the docs, the intention was to morph this outlier model into a classifier, thus making the use of metrics possible.
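To make the question above concrete: a default of `y=None` in the signature and a runtime requirement on `y` are two separate things. The sketch below uses a hypothetical `LabelRequiringEstimator`; it is not the sklego `OutlierClassifier` implementation, and the thread does not confirm that this is the reason for its `ValueError`.

```python
class LabelRequiringEstimator:
    """Hypothetical estimator used only to illustrate the signature question."""

    def fit(self, X, y=None):
        # y defaults to None so the signature has the same shape as other
        # estimators' fit(X, y=None), but this estimator cannot work without labels.
        if y is None:
            raise ValueError("y is required to fit this estimator")
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        # Remember the distinct labels seen during fitting.
        self.classes_ = sorted(set(y))
        return self
```

With this sketch, `LabelRequiringEstimator().fit([[0], [1]])` raises a `ValueError`, while passing labels as well succeeds. The default only keeps the call signature uniform; it does not by itself promise that `None` is an acceptable value.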

anopsy commented 3 months ago

I'll work this week on usage examples for the following three classes: preprocessing.pandastransformers.PandasTypeSelector, preprocessing.projections.InformationFilter, and preprocessing.repeatingbasis.RepeatingBasisFunction

anopsy commented 3 months ago

I'll work on these: preprocessing.formulaictransformer.FormulaicTransformer, preprocessing.identitytransformer.IdentityTransformer, and preprocessing.intervalencoder.IntervalEncoder

I also noticed that some examples start with "Examples" and some with "Example", and this affects how the example is shown in the documentation. I'll change "Examples" -> "Example" so it gets the nice frame.

FBruzzesi commented 3 months ago

Does anyone know if there is a way in a PR description to tag an issue and reference a particular task only? Or make some sort of partial fix reference? Otherwise it will automagically close the issue on merge.

I know it's possible to click on a task to create a separated issue, but maybe that's a bit too much

koaning commented 3 months ago

Good question. I guess not? We could see if there's an easy way to split this into many smaller tickets ... but that's also the best that I can come up with.

FBruzzesi commented 3 months ago

Hovering over a task will let you create a separate issue (see screenshot), but I am not sure who has access to that. I believe only the issue creator?!

image

koaning commented 3 months ago

Maybe other admins as well? I can also click it.

anopsy commented 2 months ago

Grabbing these: decomposition.pca_reconstruction.PCAOutlierDetection and decomposition.umap_reconstruction.UMAPOutlierDetection

Also, what would be a more elegant example: applying them to ndarrays or to a dataframe?

likeajumprope commented 2 months ago

Hi :) I am back :) I am really interested in the Bayesian ones. Can I do naive_bayes.GaussianMixtureNB, naive_bayes.BayesianGaussianMixtureNB & neighbors.BayesianKernelDensityClassifier please?

likeajumprope commented 2 months ago

Also, maybe remove the deadzoneregressor one

anopsy commented 2 months ago

I'll take care of those:

model_selection.TimeGapSplit, model_selection.GroupTimeSeriesSplit, and model_selection.KlusterFoldValidation

david26694 commented 1 week ago

Hello! Since we're in the process of writing more and more docstrings, does it make sense to use mktestdocs to check that our docstrings don't fail?
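The general idea behind checking docstrings this way can be sketched with the standard library alone: pull the fenced Python blocks out of a docstring and execute them. This is only an illustration of the concept, not the mktestdocs API or its implementation; `extract_code_blocks`, `check_docstring_examples`, and `double` are hypothetical names, and the assumption that examples live in fenced `py`/`python` blocks is mine.

```python
import inspect
import re

# Built indirectly so this sketch can itself live inside a fenced block.
FENCE = "`" * 3

# Matches a fenced block tagged "py" or "python" and captures its body.
CODE_BLOCK = re.compile(FENCE + r"(?:py|python)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(docstring):
    """Return the bodies of all fenced Python blocks found in a docstring."""
    cleaned = inspect.cleandoc(docstring)
    return [match.group(1) for match in CODE_BLOCK.finditer(cleaned)]


def check_docstring_examples(obj):
    """Execute every fenced Python block in obj.__doc__; exceptions propagate."""
    blocks = extract_code_blocks(obj.__doc__ or "")
    for block in blocks:
        # Expose the documented object under its own name to the example.
        exec(block, {obj.__name__: obj})
    return len(blocks)


def double(x):
    return 2 * x


# Hypothetical docstring, attached after the fact to keep literal fences
# out of this sketch.
double.__doc__ = "\n".join([
    "Double a number.",
    "",
    FENCE + "py",
    "assert double(3) == 6",
    FENCE,
])
```

`check_docstring_examples(double)` returns the number of blocks it ran and raises if any block fails. Every example is actually executed, so the runtime of such a check grows with the number and cost of the documented examples.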

koaning commented 6 days ago

It is not the worst idea, but I do fear the sheer amount of compute that we might waste in doing that. The CI job will become a whole lot more expensive.