theochem / Selector

Python library of algorithms for selecting diverse subsets of data for machine-learning.
https://selector.qcdevs.org
GNU General Public License v3.0

Refactor and Add methods in Grid Partitioning #162

Closed Ali-Tehrani closed 1 year ago

Ali-Tehrani commented 1 year ago

This pull request concerns the grid partitioning methods. These methods partition space into bins and select points from different bins in turn. There are two partitioning schemes: equisized, in which all bins have the same size, and equifrequent, in which each bin contains approximately the same number of points. Each scheme has an independent and a dependent variant; in the dependent variant, the partition in each dimension depends on the partition of the previous dimensions. The reference explains examples in further detail.
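To make the difference between the two schemes concrete, here is a minimal NumPy sketch of the independent variants only. It is not the code in this pull request or the Selector API: the function names (`equisized_bin_ids`, `equifrequent_bin_ids`, `select_from_bins`) and the round-robin selection rule are assumptions made for illustration.

```python
import numpy as np


def equisized_bin_ids(X, n_bins):
    """Per-dimension bin index using equal-width bins (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    ids = np.floor((X - lo) / span * n_bins).astype(int)
    return np.minimum(ids, n_bins - 1)  # the maximum value goes in the last bin


def equifrequent_bin_ids(X, n_bins):
    """Per-dimension bin index with roughly equal counts per bin (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Double argsort turns values into ranks 0..n-1 within each column.
    ranks = X.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable")
    return ranks * n_bins // n


def select_from_bins(bin_ids, n_select, seed=0):
    """Visit the occupied bins in turn, taking one random point per visit."""
    rng = np.random.default_rng(seed)
    # Points that share the same row of per-dimension indices share a bin.
    _, labels = np.unique(bin_ids, axis=0, return_inverse=True)
    labels = labels.ravel()
    pools = [list(rng.permutation(np.flatnonzero(labels == k)))
             for k in range(labels.max() + 1)]
    selected = []
    while len(selected) < n_select and any(pools):
        for pool in pools:
            if pool and len(selected) < n_select:
                selected.append(int(pool.pop()))
    return selected


# One-dimensional data with a single outlier at 10.0.
X = np.array([[0.0], [0.1], [0.2], [0.3], [10.0], [0.4], [0.5], [0.6]])
print(np.bincount(equisized_bin_ids(X, 2)[:, 0]))     # [7 1]
print(np.bincount(equifrequent_bin_ids(X, 2)[:, 0]))  # [4 4]
```

With equal-width bins the outlier sits alone in its bin, so selecting from one bin at a time always picks it up early. With equal-count bins the boundary moves to the median instead. The dependent variants would recompute the bin edges of each dimension inside every bin of the previous dimensions; this sketch does not cover that.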

The following changes were made:

The following tests were made:

In accordance with issue #156, I've added a simple fix until pull request #138 is merged.

codecov[bot] commented 1 year ago

Codecov Report

Merging #162 (16c26ee) into main (4eb670d) will increase coverage by 9.46%. The diff coverage is 98.80%.

Note: current head 16c26ee differs from the pull request's most recent head 17b8aa8. Consider uploading reports for commit 17b8aa8 to get more accurate results.

@@            Coverage Diff             @@
##             main     #162      +/-   ##
==========================================
+ Coverage   85.57%   95.03%   +9.46%     
==========================================
  Files          13       13              
  Lines        1026     1028       +2     
==========================================
+ Hits          878      977      +99     
+ Misses        148       51      -97     
Files                                  Coverage Δ
DiverseSelector/diversity.py           99.28% <100.00%> (ø)
DiverseSelector/methods/partition.py   97.85% <98.76%> (+42.00%)

FanwangM commented 1 year ago

Thanks for the improvement, @Ali-Tehrani.

I have rebased the code to fix the failing test and added a patch to the diversity calculation. Sorry for the force push. I will do the code review later.