Open FBruzzesi opened 1 year ago
I am going to work on some of those today (Johanna). The first one definitely has documentation already.
I'm gonna do preprocessing.outlier_remover.OutlierRemover
I'm going to work on DictMapper
Hi hi! I'm going to add a usage example to sklego.meta.outlier_classifier.OutlierClassifier.
I have a question though, it requires both X, y in fit, otherwise it raises a ValueError. I understand that y is required here (classifier/metrics etc) but I kind of thought that when the fit signature says fit(X, y=None) y=None implies that y is optional. But in this case it's required. Just wondering
This might deserve a ticket of its own. My first reaction is that most outlier models don't require a y ... can't remember why the ValueError would be there.
According to the docs, the intention was to morph this outlier model into a classifier, thus making the use of metrics possible.
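For anyone following the `fit(X, y=None)` question, here is a minimal sketch of the pattern being described. `ToyOutlierClassifier` is a hypothetical stand-in written for this comment, not the actual `OutlierClassifier` code. It only shows that a `y=None` default in the signature does not by itself make `y` optional if the body checks for it.

```python
class ToyOutlierClassifier:
    """Hypothetical stand-in, NOT sklego's OutlierClassifier."""

    def fit(self, X, y=None):
        # The signature keeps the conventional `y=None` default,
        # but this body still insists on receiving labels.
        if y is None:
            raise ValueError("y is required, even though the signature defaults it to None")
        # Store the distinct labels seen during fit.
        self.classes_ = sorted(set(y))
        return self


X = [[0.1], [0.2], [9.9]]

try:
    ToyOutlierClassifier().fit(X)
except ValueError as err:
    print(f"fit(X) failed: {err}")

model = ToyOutlierClassifier().fit(X, [0, 0, 1])
print(model.classes_)
```

Whether the real class should keep this behaviour is the open question above; the sketch takes no position on that.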
I'll work this week on usage examples for the following three classes: preprocessing.pandastransformers.PandasTypeSelector, preprocessing.projections.InformationFilter, preprocessing.repeatingbasis.RepeatingBasisFunction
I'll work on these: preprocessing.formulaictransformer.FormulaicTransformer, preprocessing.identitytransformer.IdentityTransformer, preprocessing.intervalencoder.IntervalEncoder
I also noticed that some examples start with "Examples" and some with "Example", and that it influences how the example is shown in the documentation. I'll change "Examples" -> "Example" so it gets the nice frame.
Does anyone know if there is a way in a PR description to tag an issue and reference a particular task only? Or make some sort of partial fix reference? Otherwise it will automagically close the issue on merge.
I know it's possible to click on a task to create a separate issue, but maybe that's a bit too much.
Good question. I guess not? We could see if there's an easy way to split this into many smaller tickets ... but that's also the best that I can come up with.
Hovering over a task will let me create a separate issue (see screenshot), but I am not sure who has access to that. I believe only the issue creator?!
Maybe other admins as well? I can also click it.
Grabbing these: decomposition.pca_reconstruction.PCAOutlierDetection, decomposition.umap_reconstruction.UMAPOutlierDetection
Also, what would be a more elegant example: applying them on ndarrays or on a dataframe?
Hi :) I am back :) I am really interested in the Bayesian ones - can I do naive_bayes.GaussianMixtureNB, naive_bayes.BayesianGaussianMixtureNB & neighbors.BayesianKernelDensityClassifier please?
Also, maybe remove the deadzoneregressor one
I'll take care of those:
model_selection.TimeGapSplit, model_selection.GroupTimeSeriesSplit, model_selection.KlusterFoldValidation
Hello! Since we're in the process of writing more and more docstrings, does it make sense to use mktestdocs to check that our docstrings don't fail?
It is not the worst idea, but I do fear the sheer amount of compute that we might waste in doing that. The CI job will become a whole lot more expensive.
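To make the idea concrete without assuming anything about the mktestdocs API, here is a rough sketch that uses the standard library's `doctest` instead. It is a swapped-in technique, not what mktestdocs does internally. `double` and `docstring_failures` are hypothetical names made up for this comment.

```python
import doctest


def double(values):
    """Double every element of a list.

    Hypothetical function, used only so there is a docstring to check.

    >>> double([1, 2, 3])
    [2, 4, 6]
    """
    return [2 * value for value in values]


def docstring_failures(obj):
    """Run the `>>>` examples in `obj.__doc__` and return how many failed."""
    parser = doctest.DocTestParser()
    # Expose the object under its own name so the examples can call it.
    test = parser.get_doctest(obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed
```

A test could then assert `docstring_failures(double) == 0` for each documented object. How much CI time that adds would depend on how heavy the examples are, so it would probably need measuring before deciding.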
As discussed in #586, although most of the library features are documented in the user guide, the best way to showcase how to access and use each class/function would be a minimal example usage in the docstrings. And this is currently covered only for a subset of the library features.
Here is the list of all the remaining classes that would benefit from it:
preprocessing.outlier_remover.OutlierRemover (solved in #639)
meta.outlier_classifier.OutlierClassifier (solved in #646)
preprocessing.dictmapper.DictMapper (solved in #646)
preprocessing.pandastransformers.PandasTypeSelector (solved in #648)
preprocessing.projections.InformationFilter (solved in #648)
preprocessing.repeatingbasis.RepeatingBasisFunction (solved in #648)
preprocessing.formulaictransformer.FormulaicTransformer (solved in #648)
preprocessing.identitytransformer.IdentityTransformer (solved in #648)
linear_model.ProbWeightRegression (solved in #691)
linear_model.DeadZoneRegressor (solved in #691)
linear_model.DemographicParityClassifier (solved in #691)
linear_model.EqualOpportunityClassifier (solved in #691)
model_selection.TimeGapSplit
model_selection.GroupTimeSeriesSplit
model_selection.KlusterFoldValidation
naive_bayes.GaussianMixtureNB
naive_bayes.BayesianGaussianMixtureNB
neighbors.BayesianKernelDensityClassifier
meta.confusion_balancer.ConfusionBalancer
meta.estimator_transformer.EstimatorTransformer
meta.grouped_predictor.GroupedPredictor
meta.grouped_transformer.GroupedTransformer
meta.regression_outlier_detector.RegressionOutlierDetector
meta.subjective_classifier.SubjectiveClassifier
meta.thresholder.Thresholder
mixture.bayesian_gmm_classifier.BayesianGMMClassifier
mixture.bayesian_gmm_detector.BayesianGMMOutlierDetector
mixture.gmm_classifier.GMMClassifier
mixture.gmm_detector.GMMOutlierDetector
preprocessing.intervalencoder.IntervalEncoder
As an instance of such a minimal example, you can refer to the QuantileRegression docstring section, which renders as shown in its API section. If possible, try to add one unique example covering the relevant features and methods in the top-level docstring of the class.
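As a rough illustration of "one unique example in the top-level docstring", here is a sketch with a hypothetical `MeanCenterer` class that is not part of the library. The `Examples` heading and the `>>>` style are assumptions made for this sketch. The project's own docstring convention, as in the QuantileRegression docstring, should take precedence.

```python
class MeanCenterer:
    """Subtract the per-column mean learned during `fit`.

    Hypothetical class, used only to illustrate the docstring layout.

    Examples
    --------
    >>> X = [[1.0, 10.0], [3.0, 30.0]]
    >>> centerer = MeanCenterer().fit(X)
    >>> centerer.means_
    [2.0, 20.0]
    >>> centerer.transform(X)
    [[-1.0, -10.0], [1.0, 10.0]]
    """

    def fit(self, X, y=None):
        n_rows = len(X)
        # Column-wise mean over the rows of X.
        self.means_ = [sum(column) / n_rows for column in zip(*X)]
        return self

    def transform(self, X):
        # Subtract each column's learned mean from every row.
        return [[value - mean for value, mean in zip(row, self.means_)] for row in X]
```

The single example touches construction, `fit`, the learned attribute, and `transform`, so a reader sees the whole workflow in one place.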