theochem / Selector

Python library of algorithms for selecting diverse subsets of data for machine-learning.
https://selector.qcdevs.org
GNU General Public License v3.0

New average similarity method #189

Closed marco-2023 closed 5 months ago

marco-2023 commented 11 months ago

This PR should complete, for now, the implementation of n-ary similarity methods.

Addressed problems

  1. Added support for Instant Similarity (isim)
    • Implemented support for Instant Similarity (isim) methods within the NSimilarity and SimilarityIndex classes.
    • The previous method for calculating average similarity is now called Extended Similarity (esim) and remains supported (for backward compatibility?).
  2. Added support for p-norm average similarity in both esim and isim.
  3. Added tests for the isim method across the different similarity indexes and p-norm averages, using reference data generated with code from Ramón's group.
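
For readers unfamiliar with the idea, here is a minimal standalone sketch of how an instant-similarity style calculation can work on binary fingerprints, together with a generalized (power) mean as one possible reading of "p-norm average". This is not the code in this PR and not the `NSimilarity` / `SimilarityIndex` API: the function names `isim_counts`, `isim`, and `power_mean`, the index labels `"RR"`, `"SM"`, `"JT"`, and the power-mean interpretation are all assumptions made for illustration.

```python
import numpy as np


def isim_counts(fingerprints):
    """Count 1-1 coincidences, 0-0 coincidences, and mismatches over all pairs.

    Hypothetical helper: works from column sums only, so no explicit
    loop over pairs of objects is needed.
    """
    x = np.asarray(fingerprints, dtype=np.int64)
    n_objects = x.shape[0]
    ones = x.sum(axis=0)        # objects with the bit on, per column
    zeros = n_objects - ones    # objects with the bit off, per column
    a = np.sum(ones * (ones - 1) // 2)      # pairs that share a 1
    d = np.sum(zeros * (zeros - 1) // 2)    # pairs that share a 0
    mismatches = np.sum(ones * zeros)       # pairs that disagree
    return int(a), int(d), int(mismatches)


def isim(fingerprints, index="RR"):
    """Set-level similarity from the pooled counts (illustrative only)."""
    a, d, m = isim_counts(fingerprints)
    total = a + d + m
    if index == "RR":   # Russell-Rao
        return a / total
    if index == "SM":   # Sokal-Michener (simple matching)
        return (a + d) / total
    if index == "JT":   # Jaccard-Tanimoto; undefined if no bit is on anywhere
        return a / (a + m)
    raise ValueError(f"unknown index: {index}")


def power_mean(values, p=1):
    """Generalized mean; p = 1 gives the arithmetic mean.

    Assumption: this is only one way to interpret a "p-norm average".
    """
    v = np.asarray(values, dtype=float)
    return float(np.mean(v ** p) ** (1.0 / p))


fps = [[1, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 1]]
print(isim_counts(fps))     # (5, 1, 6)
print(isim(fps, "SM"))      # 0.5
```

For indices whose denominator is the same for every pair (Russell-Rao and Sokal-Michener here), the pooled-count value equals the plain average of the pairwise similarities; for Jaccard-Tanimoto it is a ratio of sums and generally differs from the mean of pairwise values.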

Remaining things to do (future)

  1. There is no source of reference data for esim with p-norm averages other than p = 1.
  2. Improve the docstring formatting.

Notes: After this is merged, the Quickstart notebook (PR #186) can showcase the new flexibility of the methods.

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 89.18919% with 8 lines in your changes missing coverage. Please review.

Project coverage is 96.00%. Comparing base (dbc2202) to head (cb48826). Report is 3 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #189      +/-   ##
==========================================
- Coverage   96.21%   96.00%   -0.22%
==========================================
  Files           9        9
  Lines         951      975      +24
==========================================
+ Hits          915      936      +21
- Misses         36       39       +3
```

| [Files](https://app.codecov.io/gh/theochem/Selector/pull/189) | Coverage Δ | |
|---|---|---|
| selector/methods/similarity.py | `95.13% <89.18%> (-0.59%)` | :arrow_down: |

... and 7 files with indirect coverage changes