jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
MIT License

Not scalable #7

saurabh11baghel closed this issue 4 years ago

saurabh11baghel commented 4 years ago

@jmschrei @domoritz I want to select a subset of 100 samples from a dataset of 100,000 samples with 25 features. The FeatureBasedSelection method runs indefinitely without making any visible progress.

data_subset, labels_subset = FeatureBasedSelection(100, verbose=True).fit_transform(data, labels)

It has been showing the following verbose output for the past hour: 0%| | 0/100 [00:00<?, ?it/s]

What do you think is wrong?

jmschrei commented 4 years ago

Sorry for missing this. I don't really know what is wrong. Can you try downloading the latest patch and trying it again? Also try using optimizer='stochastic', which should be significantly faster but does not return the exact greedy solution.
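To illustrate the trade-off mentioned above, here is a minimal pure-Python sketch of exact greedy versus a stochastic greedy variant on a feature-based objective. It is not apricot's implementation: the function names, the `sample_size` parameter, and the choice of a square-root objective are assumptions made for illustration, and the data is random and much smaller than the dataset in this issue. The idea is that the stochastic variant scores only a random sample of candidates at each step instead of every remaining row, so each step is cheaper but the result may differ from the exact greedy selection.

```python
import math
import random


def feature_based_value(totals):
    # Illustrative objective: f(S) = sum over features of sqrt(feature total over S).
    # Assumes non-negative feature values.
    return sum(math.sqrt(t) for t in totals)


def gain(totals, row):
    # Marginal gain of adding `row` to a subset whose per-feature totals are `totals`.
    return sum(math.sqrt(t + x) - math.sqrt(t) for t, x in zip(totals, row))


def greedy_select(data, k, sample_size=None, seed=0):
    """Pick k row indices greedily.

    sample_size=None scores every remaining row at each step (exact greedy).
    Otherwise only `sample_size` randomly drawn rows are scored per step
    (a stochastic greedy sketch; hypothetical parameter, not apricot's API).
    """
    rng = random.Random(seed)
    remaining = list(range(len(data)))
    totals = [0.0] * len(data[0])
    chosen = []
    for _ in range(k):
        if sample_size is None or sample_size >= len(remaining):
            candidates = remaining
        else:
            candidates = rng.sample(remaining, sample_size)
        best = max(candidates, key=lambda i: gain(totals, data[i]))
        chosen.append(best)
        remaining.remove(best)
        totals = [t + x for t, x in zip(totals, data[best])]
    return chosen, feature_based_value(totals)


# Small synthetic demo: 2000 rows, 25 non-negative features, select 20 rows.
rng = random.Random(42)
data = [[rng.random() for _ in range(25)] for _ in range(2000)]

exact_idx, exact_val = greedy_select(data, k=20)
fast_idx, fast_val = greedy_select(data, k=20, sample_size=200)

print(f"exact greedy objective:      {exact_val:.3f}")
print(f"stochastic greedy objective: {fast_val:.3f}")
```

In this sketch the stochastic run evaluates 200 candidates per step rather than roughly 2000, which is where the speed-up comes from. In apricot itself, the equivalent change is just passing optimizer='stochastic' to the selector, as suggested above.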

jmschrei commented 4 years ago

Please re-open if you are still encountering issues.