theochem / Selector

Python library of algorithms for selecting diverse subsets of data for machine-learning.
https://selector.qcdevs.org
GNU General Public License v3.0

[methods.partition GridPartitioning] #134

Open FarnazH opened 1 year ago

FarnazH commented 1 year ago

To do:

FanwangM commented 1 year ago

The current GridPartition method is missing tests (they are commented out in the corresponding tests). The problem is that the usage of compute_diversity in 89f1d5d is outdated: the argument names do not work for hypersphere_overlap_of_subset in the compute_diversity function. Once #138 is merged, this problem will hopefully be resolved automatically. If not, I will provide a quick fix myself.

FarnazH commented 1 year ago

Post PR #162: any comments are welcome:

FarnazH commented 1 year ago

@Ali-Tehrani, while putting together quick_start.ipynb (see PR #186), I encountered the following warning from L124 of selector/methods/partition.py:

RuntimeWarning: invalid value encountered in floor_divide
  bin_index = np.floor_divide(X - axis_minimum, bin_length)

The bin_length ends up being zero. The code snippet below reproduces the warning. I didn't have time to look into it; can you please check what is going on?

from sklearn.datasets import make_blobs
from selector.methods.partition import GridPartition

# generate 500 samples in a 2D feature space forming 2 clusters
X, labels = make_blobs(n_samples=500, n_features=2, centers=2, random_state=42)

selector = GridPartition(numb_bins_axis=5, grid_method="equisized_dependent")
selected = selector.select(X, size=50, labels=labels)
print(len(selected))
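
For context, the warning itself can be reproduced with NumPy alone. The sketch below is not the code in selector/methods/partition.py; `bin_indices` is a hypothetical helper written only for illustration. It assumes that a zero `bin_length` arises when every point considered along an axis has the same coordinate, which has not been confirmed as the cause in GridPartition. It shows that `0.0 // 0.0` in `np.floor_divide` yields NaN together with a RuntimeWarning, and one possible guard that places all points of a degenerate axis in a single bin.

```python
import warnings

import numpy as np


def bin_indices(values, n_bins):
    """Hypothetical helper: assign 1D values to n_bins equal-width bins."""
    axis_minimum = values.min()
    bin_length = (values.max() - axis_minimum) / n_bins
    if bin_length == 0:
        # All values are identical along this axis, so they share one bin.
        return np.zeros(values.shape, dtype=int)
    raw = np.floor_divide(values - axis_minimum, bin_length)
    # Clip so the maximum value falls into the last bin, not one past it.
    return np.clip(raw, 0, n_bins - 1).astype(int)


# A degenerate axis: every point has the same coordinate, so the range is 0.
degenerate = np.full(4, 2.5)

# Unguarded division, mirroring the expression quoted in the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unguarded = np.floor_divide(degenerate - degenerate.min(), 0.0)

print(unguarded)                           # NaN entries from 0.0 // 0.0
print([str(w.message) for w in caught])    # the RuntimeWarning text
print(bin_indices(degenerate, n_bins=5))   # guarded: all points in bin 0
print(bin_indices(np.array([0.0, 1.0, 2.0, 5.0]), n_bins=5))
```

Whether an empty or single-point bin should instead be skipped entirely is a design choice for the maintainers; the guard above is only one option.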

FanwangM commented 4 months ago

The current code coverage is:

  Name                            Stmts   Miss  Cover   Missing
  -------------------------------------------------------------------------
  selector/methods/partition.py     201      4    98%   375, 407, 520, 619