Closed: buma closed this issue 12 years ago.
Please feel free to send pull requests on the specific parts of the documentation that are incomplete.
Usually the docstring specifies the input either as "sparse matrix, ndarray" or just "ndarray".
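A minimal sketch of how that convention can be checked mechanically, assuming numpydoc-style docstrings where the X parameter line lists "sparse matrix" among the accepted types. The helper mentions_sparse_input and the two sample docstrings are hypothetical, not scikit-learn code.

```python
import re

# Hypothetical helper: find the numpydoc line documenting the X parameter
# (e.g. "X : {array-like, sparse matrix}, shape = [...]") and report whether
# its type description mentions sparse input.
_X_PARAM = re.compile(r"^\s*X\s*:\s*(?P<type>.+)$", re.MULTILINE)


def mentions_sparse_input(docstring):
    """Return True if the X parameter description mentions a sparse matrix."""
    if not docstring:
        return False
    match = _X_PARAM.search(docstring)
    if match is None:
        return False
    return "sparse" in match.group("type").lower()


# Made-up docstrings in the two styles discussed above.
dense_only = """Fit the model.

    Parameters
    ----------
    X : numpy array of shape [n_samples, n_features]
        Training data.
    """

sparse_ok = """Fit the model.

    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Training data.
    """

print(mentions_sparse_input(dense_only))  # False
print(mentions_sparse_input(sparse_ok))   # True
```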
Thanks, I see now. It seems I hadn't noticed before that sparse support is already specified.
Well, it is a bit hidden. If you have an idea where to document it so that it is more obvious to new users, any suggestions are welcome.
Which was the estimator that only supported predict_proba in the binary case?
Maybe we could explain somewhere how sparse support is documented? Like: for each estimator, the docstring says ... as I said above. But then this has to be a place that new users definitely read.
In the introduction maybe? In the API section? (pretty sure they don't read that).
An idea would be a page listing all classifiers that support sparse input.
It was Perceptron and SGDClassifier.
And I just found something weird about SGDClassifier. The page http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html says:
"This implementation works with data represented as dense numpy arrays of floating point values for the features."
But the fit method says it supports sparse arrays. I used it, and my arrays weren't dense, because dense arrays were too big for my memory.
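To put numbers on the memory problem, a back-of-the-envelope sketch. Assumptions: float64 values, int32 indices, and the usual CSR layout (one value and one column index per stored entry, plus n_rows + 1 row pointers). The function names and example sizes are made up for illustration, and real memory use adds some object overhead.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Bytes for a dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    return nnz * value_bytes + nnz * index_bytes + (n_rows + 1) * index_bytes


# Hypothetical sizes: 10,000 documents, 100,000 features, ~100 non-zeros per row.
n_rows, n_cols = 10_000, 100_000
nnz = n_rows * 100

print(dense_bytes(n_rows, n_cols) / 1e9)  # 8.0 (GB)
print(csr_bytes(n_rows, nnz) / 1e6)       # about 12.04 (MB)
```

The dense size grows with n_rows * n_cols regardless of content, while the CSR size grows with the number of stored non-zeros, which is why sparse input is the only option for large, mostly-zero feature matrices.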
> An idea would be a page listing all classifiers that support sparse input.
The danger is that it would quickly fall out of sync.
> It was Perceptron and SGDClassifier.
> And I just found something weird about SGDClassifier. The page http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html says:
> "This implementation works with data represented as dense numpy arrays of floating point values for the features."
> But the fit method says it supports sparse arrays. I used it, and my arrays weren't dense, because dense arrays were too big for my memory.
Well, maybe you found a place where it fell out of sync :(.
I would suggest generating that part of the documentation, using a Sphinx extension similar to the automatic tests set up by Andreas.
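A rough sketch of the core of such a generator, leaving out the Sphinx extension plumbing: it scans the fit docstring of each class for a numpydoc X line mentioning "sparse" and emits an RST list-table. Everything here (function names, table layout, the two stand-in classes) is hypothetical. A docstring scan only reports what is written, so it inherits any out-of-sync docstring and cannot express caveats such as a binary-only predict_proba.

```python
import inspect
import re

# Hypothetical generator core; a real Sphinx extension would call this from a directive.
_X_PARAM = re.compile(r"^\s*X\s*:\s*(?P<type>.+)$", re.MULTILINE)


def supports_sparse(estimator_cls):
    """Guess sparse support from the X line of the fit() docstring (numpydoc style)."""
    fit = getattr(estimator_cls, "fit", None)
    doc = inspect.getdoc(fit) if fit is not None else None
    match = _X_PARAM.search(doc or "")
    return bool(match) and "sparse" in match.group("type").lower()


def estimator_table(estimators):
    """Render an RST list-table summarising sparse input and predict_proba support."""
    lines = [
        ".. list-table:: Classifier overview (generated)",
        "   :header-rows: 1",
        "",
        "   * - Estimator",
        "     - Sparse input",
        "     - predict_proba",
    ]
    for cls in sorted(estimators, key=lambda c: c.__name__):
        lines += [
            f"   * - {cls.__name__}",
            f"     - {'yes' if supports_sparse(cls) else 'no'}",
            f"     - {'yes' if hasattr(cls, 'predict_proba') else 'no'}",
        ]
    return "\n".join(lines)


# Hypothetical stand-ins for real estimators.
class DenseOnlyClassifier:
    def fit(self, X, y):
        """Fit the model.

        Parameters
        ----------
        X : numpy array of shape [n_samples, n_features]
        """


class SparseClassifier:
    def fit(self, X, y):
        """Fit the model.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        """

    def predict_proba(self, X):
        """Probability estimates."""


print(estimator_table([SparseClassifier, DenseOnlyClassifier]))
```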
Hm ok, I remember the thing about Perceptron and SGDClassifier.
If you could provide a pull request that adds this to the docstring of predict_proba, that would be very helpful.
I think that would be the right place to have this comment.
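For illustration, a sketch of what such a note could look like, on a hypothetical toy class. BinaryOnlyLinearClassifier is made up and is not the SGDClassifier code; the exception type and message are assumptions too. The point is that the restriction is stated in the predict_proba docstring itself and enforced with an explicit error.

```python
import math


class BinaryOnlyLinearClassifier:
    """Hypothetical linear classifier, used only to illustrate the docstring note."""

    def __init__(self, coef, intercept, classes):
        self.coef = list(coef)
        self.intercept = intercept
        self.classes_ = list(classes)

    def decision_function(self, X):
        """Linear score w . x + b for each row of X (a list of lists)."""
        return [sum(w * x for w, x in zip(self.coef, row)) + self.intercept
                for row in X]

    def predict_proba(self, X):
        """Probability estimates.

        Binary case only: with more than two classes this method raises
        NotImplementedError.

        Parameters
        ----------
        X : list of lists, shape = [n_samples, n_features]

        Returns
        -------
        A list with one [P(classes_[0]), P(classes_[1])] pair per sample.
        """
        if len(self.classes_) != 2:
            raise NotImplementedError(
                "predict_proba is only supported for binary classification")
        probs = []
        for score in self.decision_function(X):
            p_pos = 1.0 / (1.0 + math.exp(-score))  # logistic link on the score
            probs.append([1.0 - p_pos, p_pos])
        return probs


clf = BinaryOnlyLinearClassifier(coef=[1.0, -1.0], intercept=0.0, classes=[0, 1])
print(clf.predict_proba([[0.0, 0.0]]))  # [[0.5, 0.5]]
```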
About the page for sparse support: There is this PR here for an estimator overview, but I think it is still a bit controversial.
@GaelVaroquaux haha, I think auto-generating might actually be feasible. It will not be possible to capture something like "only supports predict_proba in the binary case", but it could give a good overview.
The SGDClassifier docstring discrepancies probably stem from a docstring that was not updated when we merged the dense and sparse codebases into a single codebase.
Sorry about the blank message earlier; I was too quick on "Send".
On 3 September 2012 09:40, Andreas Mueller wrote:
> Hm ok, I remember the thing about Perceptron and SGDClassifier.
Just wanted to point out that there is an existing discussion about predict_proba in the multiclass case:
http://comments.gmane.org/gmane.comp.python.scikit-learn/3562
http://comments.gmane.org/gmane.comp.python.scikit-learn/3381
and a WIP (that seems to have stalled a couple of months ago):
https://github.com/scikit-learn/scikit-learn/pull/849
Cheers, Fred.
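For context, one common way to get multiclass probabilities out of one-vs-rest binary models (not necessarily what the WIP above implements) is to pass each class score through the logistic function and renormalise each row so it sums to one. A minimal sketch, with a hypothetical ovr_proba helper operating on plain lists of decision scores:

```python
import math


def ovr_proba(scores):
    """Turn one-vs-rest decision scores into normalised class probabilities.

    scores: list of rows, one score per class for each sample (hypothetical format).
    Each score is mapped through the logistic function independently, then
    the row is divided by its sum so the probabilities add up to one.
    """
    probs = []
    for row in scores:
        per_class = [1.0 / (1.0 + math.exp(-s)) for s in row]
        total = sum(per_class)
        probs.append([p / total for p in per_class])
    return probs


print(ovr_proba([[0.0, 0.0, 0.0]]))  # each entry is 1/3
```

The renormalisation keeps the ranking of the per-class scores, but the resulting numbers are a heuristic rather than calibrated probabilities.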
Yeah, the WIP by Peter is actually quite important to me... but this is another issue ;)
Interesting idea about the classifier page. It should probably be generated.
First try at a pull request. I didn't find the code for the Perceptron. Will the documentation be updated for Perceptron too, because the class is derived from SGDClassifier?
I tried to build the documentation with make html, but I get an error from Sphinx. My versions:
Sphinx version: 1.1.3
Python version: 2.7.3
Docutils version: 0.9.1 release
Jinja2 version: 2.6
writing output... [ 31%] datasets/index
Exception occurred:
  File "/usr/lib/python2.7/site-packages/docutils/writers/html4css1/__init__.py", line 1026, in visit_image
    and self.settings.file_insertion_enabled):
AttributeError: Values instance has no attribute 'file_insertion_enabled'
Yes, the predict_proba function is the same. The code for the perceptron is in linear_model/perceptron.py.
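On the inheritance question: in Python, a subclass that does not override a method resolves it to the base class's function object, so the docstring comes along with it and a fix in the base class shows up for the subclass too. A sketch with hypothetical stand-in classes (the real Perceptron/SGDClassifier hierarchy may differ in detail):

```python
class BaseSGDLike:
    """Hypothetical base class standing in for SGDClassifier."""

    def predict_proba(self, X):
        """Probability estimates. Binary case only."""
        return [[0.5, 0.5] for _ in X]


class PerceptronLike(BaseSGDLike):
    """Hypothetical subclass that does not override predict_proba."""


# The subclass looks the method up on the base class, so it gets the very
# same function object, docstring included.
print(PerceptronLike.predict_proba is BaseSGDLike.predict_proba)  # True
print(PerceptronLike.predict_proba.__doc__)  # Probability estimates. Binary case only.
```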
I don't know about the sphinx error.
Btw, you can usually just run make, since make html builds the pictures and is pretty slow.
@buma Thanks for the correction in the docs. The classifier summary is already an open issue, so I guess we can close this one, ok?
Thanks for merging. Yes, I think it should be closed.
I used scikit-learn, and it bothered me that the documentation doesn't say that some classifiers only produce predicted probabilities (predict_proba) for binary classification problems.
It also bothered me that it isn't specified everywhere whether a classifier supports sparse input.
Can I add this, or is someone already working on it?