scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Add a multi-label classification task #5403

Closed. raghavrv closed this issue 6 years ago.

raghavrv commented 9 years ago

Add a dataset with multi-label data.

Refer to #5105

Preferably the emotions dataset

ghost commented 9 years ago

I would like to work on this issue. Since I have just implemented multi-label classification in MATLAB, I can try to collect data from the emotions dataset and test it using sklearn.

raghavrv commented 9 years ago

@wujw13 Please go ahead!!

ghost commented 9 years ago

@rvraghav93 I don't think we can use the emotions dataset. Its End User License Agreement states that "The user may not distribute the dataset or portions thereof in any way". Is there a second choice of dataset that fits the requirements?

raghavrv commented 9 years ago

Wait, let's ping @arjoly or @amueller for suggestions :)

arjoly commented 9 years ago

Another good choice is the "scene" dataset. I hope it is not too big in terms of memory footprint.

ghost commented 9 years ago

All right. I will check this dataset.

amueller commented 9 years ago

yeast is smaller, right?

arjoly commented 9 years ago

Yes, it's a bit smaller. yeast can easily be obtained from mldata (through fetch_mldata).
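For anyone reading this later: fetch_mldata has since been removed from scikit-learn (deprecated in 0.20, removed in 0.22), so the call suggested above no longer works on current versions. A rough sketch of an alternative, assuming the multi-label yeast data is still hosted on OpenML as version 4 of "yeast" with labels stored as "TRUE"/"FALSE" strings. The helper names `to_indicator` and `load_yeast_multilabel` are made up for illustration and are not part of scikit-learn.

```python
import numpy as np


def to_indicator(labels):
    """Turn a table of "TRUE"/"FALSE" strings into a 0/1 label indicator array."""
    return (np.asarray(labels) == "TRUE").astype(int)


def load_yeast_multilabel():
    """Fetch yeast from OpenML (needs network access) and return (X, Y).

    Assumption: version 4 of the OpenML "yeast" dataset is the multi-label one.
    """
    # Imported here so that to_indicator can be used without scikit-learn.
    from sklearn.datasets import fetch_openml

    X, Y = fetch_openml("yeast", version=4, return_X_y=True)
    return X, to_indicator(Y)
```

The resulting `Y` has one column per label, which is the label indicator format that scikit-learn's multi-label estimators accept.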

kshitij10496 commented 8 years ago

I would like to work on this issue if no one else is working on it.

kshitij10496 commented 8 years ago

Ping @rvraghav93 @arjoly @amueller I just want to confirm whether we are going to add the yeast dataset or something else?

raghavrv commented 8 years ago

@kshitij10496 Yes, please go ahead and add the yeast dataset. Refer to #5325

kshitij10496 commented 8 years ago

@rvraghav93 Thanks, I am working on it.

gxyd commented 6 years ago

@amueller is what you said in your comment still valid?

Will the first step towards completing PR https://github.com/scikit-learn/scikit-learn/pull/5960 be to address your comment?

gxyd commented 6 years ago

Also, I don't see any 'yeast' dataset included with scikit-learn.

jnothman commented 6 years ago

Don't we have rcv1 as a real multilabel dataset with a fetcher? Yes, I think this can be closed.
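For reference, a minimal sketch of what using rcv1 as a multilabel dataset could look like. `fetch_rcv1` is the scikit-learn fetcher and returns the targets as a sparse label indicator matrix. The helpers `label_stats` and `load_rcv1_train` are hypothetical names used only for this illustration. Note that the first call to the fetcher downloads a fairly large file.

```python
from scipy import sparse


def label_stats(Y):
    """Return (n_samples, n_labels, mean number of labels per sample)."""
    Y = sparse.csr_matrix(Y)  # accepts dense or sparse indicator matrices
    n_samples, n_labels = Y.shape
    return n_samples, n_labels, float(Y.sum() / n_samples)


def load_rcv1_train():
    """Fetch the training subset of RCV1 (needs network access on first use)."""
    # Imported here so that label_stats can be used without scikit-learn.
    from sklearn.datasets import fetch_rcv1

    rcv1 = fetch_rcv1(subset="train")
    return rcv1.data, rcv1.target
```

Calling `label_stats` on the target returned by `load_rcv1_train()` should show more than one label per sample on average, which is what makes the dataset multilabel rather than multiclass.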