ntucllab / libact

Pool-based active learning in Python
http://libact.readthedocs.org/
BSD 2-Clause "Simplified" License

Change Dataset interface to support sparse matrix. #165

Closed eugene-yang closed 5 years ago

eugene-yang commented 5 years ago

Changed the libact.base.dataset.Dataset interface to support a sparse matrix as X. The interfaces of get_entries, get_labeled_entries and get_unlabeled_entries are changed. Since most uses of these methods take the list of tuples and zip them back into a feature matrix and a list of labels, changing the interface to output that format directly benefits both using and storing the data.

This also directly supports scipy.sparse.csr_matrix, since the zipping during initialization is removed. Dataset.data[] is still implemented via the __getitem__ magic method to support the use cases that involve direct access to the entries.
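To make the before/after shape of the interface concrete, here is a minimal, dependency-free sketch. It is not libact's implementation: the class names OldStyleDataset and NewStyleDataset, the use of None to mark an unlabeled entry, and the (indices, features) return shape of get_unlabeled_entries are assumptions made for illustration, and plain lists stand in for the feature matrix. For brevity, __getitem__ is placed on the dataset itself rather than on a separate data attribute.

```python
# Hypothetical sketch of the interface change; not libact's actual code.


class OldStyleDataset:
    """Stores (feature, label) pairs, the format the old getters returned."""

    def __init__(self, X, y):
        # Zipping at construction iterates X row by row, which is the step
        # the description says was removed to allow sparse matrices.
        self.data = list(zip(X, y))

    def get_labeled_entries(self):
        return [(x, lb) for x, lb in self.data if lb is not None]


class NewStyleDataset:
    """Keeps features and labels separate and returns them that way."""

    def __init__(self, X, y):
        self._X = X        # stored as given, never zipped
        self._y = list(y)  # None marks an unlabeled entry (assumption)

    def __len__(self):
        return len(self._y)

    def __getitem__(self, idx):
        # Direct access to a single entry as a (feature, label) pair.
        return self._X[idx], self._y[idx]

    def get_entries(self):
        return self._X, self._y

    def get_labeled_entries(self):
        idx = [i for i, lb in enumerate(self._y) if lb is not None]
        return [self._X[i] for i in idx], [self._y[i] for i in idx]

    def get_unlabeled_entries(self):
        # Returning the indices alongside the features is an assumption.
        idx = [i for i, lb in enumerate(self._y) if lb is None]
        return idx, [self._X[i] for i in idx]


X = [[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
y = [1, None, 0]

# Old pattern: fetch tuples, then zip them back into features and labels.
old_X, old_y = zip(*OldStyleDataset(X, y).get_labeled_entries())

# New pattern: features and labels come back already separated.
new_X, new_y = NewStyleDataset(X, y).get_labeled_entries()
```

Both patterns yield the same labeled features and labels; the new one simply skips the zip and unzip round trip, which is what lets a matrix type be stored and returned without being iterated row by row.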

codecov-io commented 5 years ago

Codecov Report

Merging #165 into master will decrease coverage by <.01%. The diff coverage is 98.52%.


@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   89.46%   89.46%   -0.01%     
==========================================
  Files          37       37              
  Lines        1557     1566       +9     
==========================================
+ Hits         1393     1401       +8     
- Misses        164      165       +1
Impacted Files                                         Coverage Δ
...ery_strategies/multiclass/hierarchical_sampling.py  95.85% <100%> (ø)
...query_strategies/multilabel/binary_minimization.py  100% <100%> (ø)
...ct/query_strategies/active_learning_by_learning.py  85.71% <100%> (ø)
...ltilabel/cost_sensitive_reference_pair_encoding.py  92.1% <100%> (ø)
libact/query_strategies/variance_reduction.py          68.88% <100%> (-1.33%)
libact/labelers/ideal_labeler.py                       100% <100%> (ø)
libact/query_strategies/hintsvm.py                     93.02% <100%> (ø)
...es/multilabel/multilabel_with_auxiliary_learner.py  89.58% <100%> (ø)
libact/models/multilabel/dummy_clf.py                  94.11% <100%> (ø)
.../multiclass/active_learning_with_cost_embedding.py  100% <100%> (ø)
... and 10 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d86b7b8...776ee7b.

yangarbiter commented 5 years ago

The changes look good to me. Thanks for the contribution. Just fix the coding style (mainly whitespace) and I think it'll be ready to merge.

eugene-yang commented 5 years ago

@yangarbiter Do you want me to fix them? Like the whitespaces around the brackets?

yangarbiter commented 5 years ago

Yes, please fix them. For example,
    lb = lbr.label( trn_ds.data[ask_id][0] )
should be
    lb = lbr.label(trn_ds.data[ask_id][0])
I didn't mark all of them, but try to comply with Google's style guide (https://google.github.io/styleguide/pyguide.html#36-whitespace). Thanks.

yangarbiter commented 5 years ago

Last two questions and I'll merge. Thanks for the hard work, @eugene-yang.