scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

DOC fix discrepancies between docstrings and signatures #2062

Closed jnothman closed 9 years ago

jnothman commented 11 years ago

The parameters listed in class, method and function docstrings are sometimes obsolete, misspelled, or missing from the actual call signature. Often the docstring parameter should be removed, but in other cases, such as #2060, an argument should be added to the function.

Here are some possible discrepancies (including a number of false positives; false negatives include all Cython code) produced by this script:


sklearn.cluster.dbscan_.DBSCAN.fit in doc ['params']
sklearn.cluster.k_means_.KMeans.__init__ in argspec ['copy_x', 'verbose']
sklearn.cluster.k_means_.MiniBatchKMeans.__init__ in argspec ['verbose']
sklearn.cluster.k_means_._kmeans_single in argspec ['n_clusters', 'precompute_distances']
sklearn.cluster.k_means_._labels_inertia in argspec ['precompute_distances']
sklearn.cluster.k_means_._mini_batch_step in argspec ['compute_squared_diff', 'old_center_buffer']
sklearn.cluster.k_means_.k_means in argspec ['precompute_distances']
sklearn.cluster.mean_shift_.MeanShift.__init__ in argspec ['bin_seeding']
sklearn.cluster.mean_shift_.mean_shift in argspec ['cluster_all', 'max_iterations']
sklearn.cluster.mean_shift_.mean_shift in doc ['min_bin_freq']
sklearn.covariance.graph_lasso_.GraphLasso.__init__ in doc ['cov_init']
sklearn.covariance.robust_covariance.c_step in argspec ['cov_computation_method']
sklearn.covariance.robust_covariance.fast_mcd in argspec ['cov_computation_method']
sklearn.covariance.shrunk_covariance_.ShrunkCovariance.__init__ in argspec ['assume_centered']
sklearn.covariance.shrunk_covariance_.ShrunkCovariance.fit in doc ['assume_centered']
sklearn.cross_validation.permutation_test_score in argspec ['n_permutations']
sklearn.datasets.base.load_files in argspec ['charse_error']
sklearn.datasets.base.load_files in doc ['charset_error']
sklearn.datasets.samples_generator.make_sparse_spd_matrix in argspec ['largest_coef', 'norm_diag', 'smallest_coef']
sklearn.datasets.svmlight_format.load_svmlight_file in argspec ['dtype']
sklearn.datasets.svmlight_format.load_svmlight_files in argspec ['dtype']
sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning.partial_fit in argspec ['iter_offset']
sklearn.decomposition.fastica_.fastica in doc ['source_only']
sklearn.decomposition.nmf._nls_subproblem in argspec ['V', 'W']
sklearn.decomposition.nmf._nls_subproblem in doc ['V, W']
sklearn.decomposition.pca._assess_dimension_ in argspec ['n_features']
sklearn.decomposition.pca._assess_dimension_ in doc ['dim']
sklearn.ensemble.forest.ExtraTreesClassifier.__init__ in argspec ['compute_importances']
sklearn.ensemble.forest.ExtraTreesRegressor.__init__ in argspec ['compute_importances']
sklearn.ensemble.forest.RandomForestClassifier.__init__ in argspec ['compute_importances']
sklearn.ensemble.forest.RandomForestRegressor.__init__ in argspec ['compute_importances']
sklearn.ensemble.gradient_boosting.BinomialDeviance.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.HuberLossFunction.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.LeastAbsoluteError.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.LossFunction.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.MultinomialDeviance.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.QuantileLossFunction.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.ensemble.gradient_boosting.RegressionLossFunction.update_terminal_regions in argspec ['k', 'learning_rate', 'sample_mask']
sklearn.feature_extraction.text.TfidfTransformer.transform in argspec ['copy']
sklearn.feature_extraction.text.TfidfVectorizer.transform in argspec ['copy']
sklearn.feature_selection.rfe.RFE.__init__ in argspec ['verbose']
~~sklearn.feature_selection.rfe.RFECV.__init__ in doc ['loss_function']~~ (#2071)
sklearn.feature_selection.univariate_selection.f_oneway in doc ['sample1, sample2, ...']
sklearn.gaussian_process.correlation_models.absolute_exponential in argspec ['d']
sklearn.gaussian_process.correlation_models.absolute_exponential in doc ['dx']
sklearn.gaussian_process.correlation_models.cubic in argspec ['d']
sklearn.gaussian_process.correlation_models.cubic in doc ['dx']
sklearn.gaussian_process.correlation_models.generalized_exponential in argspec ['d']
sklearn.gaussian_process.correlation_models.generalized_exponential in doc ['dx']
sklearn.gaussian_process.correlation_models.linear in argspec ['d']
sklearn.gaussian_process.correlation_models.linear in doc ['dx']
sklearn.gaussian_process.correlation_models.pure_nugget in argspec ['d']
sklearn.gaussian_process.correlation_models.pure_nugget in doc ['dx']
sklearn.gaussian_process.correlation_models.squared_exponential in argspec ['d']
sklearn.gaussian_process.correlation_models.squared_exponential in doc ['dx']
sklearn.hmm.GMMHMM.predict in argspec ['algorithm']
sklearn.hmm.GaussianHMM.__init__ in argspec ['algorithm', 'covariance_type', 'covars_prior', 'covars_weight', 'init_params', 'means_weight', 'n_iter', 'params', 'thresh']
sklearn.hmm.GaussianHMM.__init__ in doc ['_covariance_type']
sklearn.hmm.GaussianHMM.predict in argspec ['algorithm']
sklearn.hmm.MultinomialHMM.predict in argspec ['algorithm']
sklearn.hmm._BaseHMM.predict in argspec ['algorithm']
sklearn.lda.LDA.fit in argspec ['tol']
sklearn.linear_model.base.LinearRegression.__init__ in argspec ['copy_X']
sklearn.linear_model.bayes.ARDRegression.__init__ in doc ['X', 'y']
sklearn.linear_model.bayes.BayesianRidge.__init__ in doc ['X', 'y']
sklearn.linear_model.coordinate_descent.ElasticNetCV.__init__ in argspec ['copy_X', 'fit_intercept', 'normalize']
sklearn.linear_model.coordinate_descent.LassoCV.__init__ in argspec ['copy_X', 'fit_intercept', 'normalize']
sklearn.linear_model.least_angle.LassoLarsIC.fit in argspec ['X', 'copy_X']
sklearn.linear_model.least_angle.LassoLarsIC.fit in doc ['x']
sklearn.linear_model.passive_aggressive.PassiveAggressiveClassifier.fit in doc ['sample_weight']
sklearn.linear_model.randomized_l1.RandomizedLasso.__init__ in argspec ['n_resampling', 'selection_threshold']
sklearn.linear_model.randomized_l1.RandomizedLogisticRegression.__init__ in argspec ['n_resampling', 'selection_threshold']
sklearn.linear_model.ridge.ridge_regression in argspec ['alpha']
sklearn.manifold.spectral_embedding.spectral_embedding in argspec ['norm_laplacian']
sklearn.metrics.metrics.f1_score in argspec ['y_pred']
sklearn.metrics.metrics.fbeta_score in argspec ['y_pred']
sklearn.metrics.metrics.precision_recall_fscore_support in argspec ['y_pred']
sklearn.metrics.metrics.precision_score in argspec ['y_pred']
sklearn.metrics.metrics.recall_score in argspec ['y_pred']
sklearn.metrics.pairwise.polynomial_kernel in argspec ['coef0']
sklearn.metrics.pairwise.sigmoid_kernel in argspec ['coef0']
sklearn.metrics.pairwise.sigmoid_kernel in doc ['degree']
sklearn.mixture.dpgmm.DPGMM.__init__ in argspec ['verbose']
sklearn.mixture.dpgmm.VBGMM.__init__ in argspec ['init_params', 'n_iter', 'params', 'thresh', 'verbose']
sklearn.mixture.gmm.sample_gaussian in argspec ['covar']
sklearn.mixture.gmm.sample_gaussian in doc ['covars']
sklearn.multiclass.fit_ecoc in argspec ['X', 'n_jobs', 'y']
sklearn.pls.CCA.__init__ in doc ['X', 'Y']
sklearn.pls.PLSCanonical.__init__ in doc ['X', 'Y']
sklearn.pls.PLSRegression.__init__ in doc ['X', 'Y']
sklearn.pls.PLSSVD.__init__ in argspec ['copy']
sklearn.pls.PLSSVD.__init__ in doc ['X', 'Y']
sklearn.pls._PLS.__init__ in doc ['X', 'Y']
sklearn.preprocessing.KernelCenterer.transform in argspec ['copy']
sklearn.qda.QDA.fit in argspec ['tol']
sklearn.semi_supervised.label_propagation.BaseLabelPropagation.__init__ in argspec ['n_neighbors']
sklearn.tree.export.export_graphviz in argspec ['close']
sklearn.tree.tree.DecisionTreeClassifier.__init__ in argspec ['compute_importances']
sklearn.tree.tree.DecisionTreeRegressor.__init__ in argspec ['compute_importances']
sklearn.utils.arpack._eigs in argspec ['return_eigenvectors', 'tol', 'which']
sklearn.utils.arpack._eigsh in argspec ['mode', 'return_eigenvectors', 'tol', 'which']
sklearn.utils.extmath.pinvh in doc ['cond, rcond']
sklearn.utils.extmath.svd_flip in argspec ['u', 'v']
sklearn.utils.extmath.svd_flip in doc ['u, v']
sklearn.utils.multiclass.unique_labels in doc ['lists_of_labels']
sklearn.utils.safe_sqr in argspec ['copy']
sklearn.utils.testing.all_estimators in argspec ['include_other']
sklearn.utils.testing.all_estimators in doc ['include_others']
sklearn.utils.testing.fake_mldata in doc ['Note', "transposes 'data', keep that into account in the tests."]
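
The script that produced this list is not reproduced here. As a rough illustration of the kind of comparison involved, here is a minimal sketch that assumes numpydoc-style docstrings and uses only the standard library. The helper names (`documented_params`, `signature_params`, `discrepancies`) and the `example_loader` function are hypothetical and are not part of scikit-learn; the misspelled `charse_error` argument only mimics the `load_files` entry in the list.

```python
import inspect
import re

# A numpydoc parameter line looks like "name : type" or "a, b : type".
PARAM_LINE = re.compile(r"^(\*{0,2}\w+(?:\s*,\s*\*{0,2}\w+)*)\s*(?::.*)?$")


def documented_params(doc):
    """Return the names listed in a numpydoc-style 'Parameters' section."""
    lines = inspect.cleandoc(doc or "").splitlines()
    names = []
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if nxt and set(nxt) == {"-"}:
            # A line underlined with dashes is a section header.
            in_section = stripped == "Parameters"
            continue
        # Skip text outside the section, blank lines, indented
        # descriptions, and the dashed underline itself.
        if (not in_section or not stripped or line[0].isspace()
                or set(stripped) == {"-"}):
            continue
        match = PARAM_LINE.match(line)
        if match:
            names.extend(n.strip().lstrip("*")
                         for n in match.group(1).split(","))
    return names


def signature_params(func):
    """Return named parameters of func, ignoring self, *args and **kwargs."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [name
            for name, param in inspect.signature(func).parameters.items()
            if name not in ("self", "cls") and param.kind not in skip]


def discrepancies(func):
    """Compare the call signature of func with its documented parameters."""
    doc = set(documented_params(func.__doc__))
    sig = set(signature_params(func))
    return {"in argspec": sorted(sig - doc), "in doc": sorted(doc - sig)}


def example_loader(container_path, charse_error="strict"):
    """Hypothetical function with a misspelled argument.

    Parameters
    ----------
    container_path : str
        Path to the main folder.
    charset_error : str
        Error handling scheme.

    Returns
    -------
    data : dict
    """


print(discrepancies(example_loader))
```

Unlike the listing above, this sketch splits comma-separated names such as `u, v` into separate parameters. `inspect.signature` may also be unable to introspect compiled (Cython) functions, which is consistent with the false negatives mentioned earlier.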

jaquesgrobler commented 11 years ago

Thanks for this list. I'll have a look into this soon, if nobody else tackles it first

kanielc commented 11 years ago

There are quite a few of these to go through. I started with metrics and a couple in bayes.py. I'll do more later when I have time.

I had problems with the script using the newest code (iter_modules gone from "testing" for example). Also found another problem where six.py was referring to winregs (which doesn't work on Linux).

kanielc commented 11 years ago

Another question: should I wait until I've done everything I'm going to do before sending a pull request, or send one for each additional set I finish?

larsmans commented 11 years ago

Feel free to send PRs with partial fixes.

samuelstjean commented 11 years ago

In sklearn.decomposition.MiniBatchSparsePCA, the residuals are never returned by the dict_learning_online function. The documentation wrongly states that it has a method called error, giving the error at each iteration, but it does not exist.

See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/dict_learning.py line 643 and https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/sparse_pca.py line 210

agramfort commented 11 years ago

indeed

option 1: fix the docstring (easy)
option 2: fix the code (harder)

I can do the easy one :)

samuelstjean commented 11 years ago

Well, is the residual on the minibatch as easily computable as the residuals for the full-batch version? I thought that adding an option to return the residuals, then computing them for each batch and averaging the result at each iteration, would be a way to do it.

Of course it's much easier to clean up the doc, but adding more functionality is probably better in the long run.
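
The averaging idea described above could look roughly like the following pure-Python sketch. It is only an illustration under simplifying assumptions: data, codes and dictionary are plain lists of rows, and the function names `batch_residual` and `average_residual` are hypothetical, not part of scikit-learn or of dict_learning.py.

```python
def batch_residual(X, code, dictionary):
    """Mean squared reconstruction error per sample for one mini-batch.

    X is a list of samples, code holds one row of coefficients per sample,
    and dictionary is a list of atoms (one row per atom).
    """
    total = 0.0
    for x_row, c_row in zip(X, code):
        for j, x in enumerate(x_row):
            # Reconstruct feature j as a weighted sum of the atoms.
            approx = sum(c * atom[j] for c, atom in zip(c_row, dictionary))
            total += (x - approx) ** 2
    return total / len(X)


def average_residual(batch_residuals):
    """Average the per-batch residuals collected during one iteration."""
    return sum(batch_residuals) / len(batch_residuals)


# Toy data: two atoms, two features, one sample per mini-batch.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
exact = batch_residual([[1.0, 2.0]], [[1.0, 2.0]], dictionary)
partial = batch_residual([[1.0, 2.0]], [[1.0, 0.0]], dictionary)
print(exact, partial, average_residual([exact, partial]))
```

Whether such an average is comparable to the full-batch residual is exactly the open question raised here; the sketch only shows the bookkeeping.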

agramfort commented 11 years ago

> Well, is the residual on the minibatch as easily computable as the residuals for the full-batch version? I thought that adding an option to return the residuals, then computing them for each batch and averaging the result at each iteration, would be a way to do it.
>
> Of course it's much easier to clean up the doc, but adding more functionality is probably better in the long run.

indeed. PR welcome :)

raghavrv commented 9 years ago

I am working on this... Seems like there are still ~~80~~ 120-odd such discrepancies to be fixed...

amueller commented 9 years ago

Should we close this as #4023 got merged?

raghavrv commented 9 years ago

I think we've not fixed Cython-level discrepancies...

jnothman commented 9 years ago

I'm happy with this being closed. It would be nice to be able to configure landscape to check for any new errors of this type introduced by a PR, but I'm not sure if that configurability is available.
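
Independently of what Landscape can be configured to do, the "only new errors" part of such a check can be sketched in plain Python by comparing a fresh report against a stored baseline. This is a hypothetical illustration: `new_discrepancies` is not an existing tool, and the `sklearn.example.func` entry is made up for the example.

```python
def new_discrepancies(baseline, current):
    """Return report lines present in current but absent from baseline."""
    known = set(baseline)
    return sorted(line for line in set(current) if line not in known)


# Report lines in the same format as the listing in this issue.
baseline = [
    "sklearn.lda.LDA.fit in argspec ['tol']",
    "sklearn.qda.QDA.fit in argspec ['tol']",
]
# Hypothetical report after a PR that adds one more mismatch.
current = baseline + ["sklearn.example.func in doc ['old_param']"]

introduced = new_discrepancies(baseline, current)
print(introduced)
```

A check built this way would fail a PR only when `introduced` is non-empty, so existing discrepancies would not block unrelated changes.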

On 30 March 2015 at 08:07, ragv notifications@github.com wrote:

I think we've not fixed cython level discrepancies....

amueller commented 9 years ago

having an automated check seems out of reach for the moment. I think we can close and come back to it in the future.