sissa-data-science / DADApy

Distance-based Analysis of DAta-manifolds in python
https://dadapy.readthedocs.io/
Apache License 2.0
103 stars 18 forks source link

kstar computation is mandatory when calling PAk #120

Closed diegodoimo closed 6 months ago

diegodoimo commented 6 months ago

I made the kstar computation mandatory in PAk. I believe that not doing so results in wrong outputs when kstar has been set before since the current PAk implementation does not recompute kstar if this attribute is already set. For instance, if we call "compute_density_kNN() ", then we are forced to call "compute_kstar()" before calling compute_density_PAk() to make sure that we do not use a constant k value in PAk. By definition PAk requires the computation of kstar, this is its main job and purpose. Not doing so, can produce a weird hybrid algorithm in which the free energy is optimized over a set of ks that do not belong to the philosophy of the algorithm. For this reason, I suggest making the computation of kstar mandatory. This is a cheap part computationally speaking, and the overhead for the (rather rare) cases in which the user calls explicitly the "compue_kstar" method before pak are negligible.

codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 81.32%. Comparing base (f073bf4) to head (d94654a). Report is 4 commits behind head on main.

Files Patch % Lines
dadapy/density_estimation.py 50.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #120 +/- ## ========================================== - Coverage 81.61% 81.32% -0.29% ========================================== Files 20 20 Lines 3285 3299 +14 ========================================== + Hits 2681 2683 +2 - Misses 604 616 +12 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

AldoGl commented 6 months ago

I would be more in favour of raising a warning if all ks are constant. I believe one of the nice features of DADApy is the possibility of "mixing and matching" different algorithms. For instance, one might want to assess the performance of the PAk estimator with a specific choice of k, say for a constant k ranging from 10 to 1000. That said, since I am not a big user of these algorithms I would leave the choice to you, and perhaps to @charliematteo

alexdepremia commented 6 months ago

Indeed, I agree with Diego, PAk has no scope without first computing k-star. However, I'm not convinced about making mandatory the call to compute k-star. For instance, I'm working on an alternative way to compute k-star and I will like to introduce that in DADApy soon. I would go for Guido's suggestion of maintaining the flexibility of the code.

charliematteo commented 6 months ago

I also agree with @AldoGl to just raise a warning but leaving it more flexible for the users