jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
MIT License

Selection with pre-selected set #36

Open M-A-Hassan opened 2 years ago

M-A-Hassan commented 2 years ago

Hi,

Is there a way to pass a preselected set of data to the optimizer so that it is taken into account when calculating the gain? The preselected set is a user-defined input to the optimizer and must not be altered or modified in any way by the optimizer.

Thanks for the effort and for making this package available.
Mohamed

jmschrei commented 2 years ago

Yes. Look at this argument in the functions: https://github.com/jmschrei/apricot/blob/master/apricot/functions/featureBased.py#L161
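
To illustrate the idea behind the question, here is a minimal NumPy sketch of how a fixed preselected set can enter the gain computation of a feature-based function, f(S) = sum over features of sqrt(sum of the selected rows). It is not apricot's implementation: the function name `greedy_feature_based`, its parameters, and the toy data are assumptions made for this example, and the data are assumed to be non-negative.

```python
import numpy as np

def greedy_feature_based(X, n_select, preselected=None):
    """Greedily pick n_select rows of X under f(S) = sum_d sqrt(sum_{i in S} X[i, d]).

    `preselected` is an optional 2-D array of external rows. It only seeds the
    running feature totals and is never modified or returned.
    """
    X = np.asarray(X, dtype=float)
    if preselected is None:
        current = np.zeros(X.shape[1])
    else:
        current = np.asarray(preselected, dtype=float).sum(axis=0)

    chosen = []
    available = np.ones(X.shape[0], dtype=bool)
    for _ in range(n_select):
        # Marginal gain of each candidate given everything selected so far.
        gains = np.sqrt(current + X).sum(axis=1) - np.sqrt(current).sum()
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        chosen.append(best)
        available[best] = False
        current += X[best]
    return chosen

X = np.array([[9.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
preselected = np.array([[16.0, 0.0]])  # already covers the first feature

without_pre = greedy_feature_based(X, 1)
with_pre = greedy_feature_based(X, 1, preselected)
print(without_pre, with_pre)
```

With no preselected set the first row has the largest gain. Once the preselected row has saturated the first feature, the second row becomes the better choice.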

MoH-assan commented 1 year ago

Sorry for taking so long to reply. I have now tried it, but it seems that this only works if the initial set is a subset of the data. Please take a look at this line: https://github.com/jmschrei/apricot/blob/bf86e699e6929127ccb5876d8c62c70785390eb0/apricot/functions/graphCut.py#L243

When I try to pass a 2-D array as my initial set, that line raises the following error:

ValueError: When using facility location, the initial subset must be a one dimensional array of indices.
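
One possible workaround for functions that operate on pairwise similarities, sketched here under stated assumptions: stack the external preselected rows on top of the data, refer to them by index as the initial subset, and shift the returned indices back afterwards. The code below is a plain NumPy greedy facility-location sketch, not apricot's code. The helper names `gaussian_similarity` and `greedy_facility_location` and the toy data are hypothetical. Note that stacking also adds the preselected rows to the set of points that must be represented, which changes the objective slightly.

```python
import numpy as np

def gaussian_similarity(A):
    """Pairwise similarity exp(-squared Euclidean distance) between rows of A."""
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2)

def greedy_facility_location(S, n_select, initial_idx=()):
    """Greedy facility location on a square similarity matrix S.

    `initial_idx` is a 1-D collection of row indices treated as already
    selected; they are never returned and never removed.
    """
    S = np.asarray(S, dtype=float)
    initial_idx = np.asarray(initial_idx, dtype=int)
    # Best similarity of every row to the current selection.
    if initial_idx.size:
        current = S[:, initial_idx].max(axis=1)
    else:
        current = np.zeros(S.shape[0])
    available = np.ones(S.shape[0], dtype=bool)
    available[initial_idx] = False

    picked = []
    for _ in range(n_select):
        # Objective after adding each candidate column, minus the current objective.
        gains = np.maximum(S, current[:, None]).sum(axis=0) - current.sum()
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        picked.append(best)
        available[best] = False
        current = np.maximum(current, S[:, best])
    return picked

# Candidates: a cluster of five points near the origin and three near (5, 5).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [0.4, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]])
# External preselected point; it is not a row of X.
preselected = np.array([[0.2, 0.01]])

# Selection on X alone.
picked_alone = greedy_facility_location(gaussian_similarity(X), 1)

# Selection with the preselected rows stacked first and passed as indices.
stacked = np.vstack([preselected, X])
initial_idx = np.arange(len(preselected))
picked_stacked = greedy_facility_location(gaussian_similarity(stacked), 1, initial_idx)
picked_with_pre = [i - len(preselected) for i in picked_stacked]  # back to X indices
print(picked_alone, picked_with_pre)
```

In this toy example the selection on `X` alone picks the centre of the large cluster. With the preselected point already representing that cluster, the greedy step moves to the centre of the other one.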