Closed diegodoimo closed 8 months ago
Thanks @diegodoimo! These are in fact two different estimators, both based on the binomial distribution but one working in continuous spaces and the other in discrete spaces.
The "Data" class only has access to the continuous estimator, since it inherits from Clustering but not from IdDiscrete. To access the discrete estimator you need to instantiate an IdDiscrete class and call its methods.
Currently I think this makes sense, since for discrete spaces we do not have many algorithms implemented. In the future we might think of ways of integrating all methods under a single framework.
@imacocco any thoughts on this?
That's precisely how you pointed it out, I named the functions the same way because they are conceptually identical but operate in two different frameworks. At the moment the routines for the discrete estimator do not interact with those in the continuum and have to be called explicitly by the module id_discrete.py. We still might want to name them differently if this can be confusing
Ok. Personally, I would change the name of one of the two anyway. I believe that two different functions, however similar they may be to each other, should have different names. Moreover, if in the future we will integrate it with the rest, maybe we will have to do it in any case.
I have another issue with the discrete implementation, which I just tested: if I do the following:
from dadapy.id_discrete import IdDiscrete as ie d = ie(distances = (dist, dist_index)) idstmp, , _ = d.compute_id_binomial_k(k = 3)
I got the following error: "AssertionError: select a proper ratio 0<r<1"
I expect to be able to run the function correctly as I have initialized the class correctly and called the function with all the required mandatory arguments (k, only in this case)
Just fixed the problems you mentioned, now it should work properly. Waiting for feedback before closing the issue
Good for me. Let's wait for @AldoGl
Thanks for the change @imacocco. I see you simply set the default to r=0.5. This is ok, but have you considered setting r to the optimal value provided my the minimisation of the Cramer-Rao bound?
The problem is that r_opt depends on the intrinsic dimension, in such case we should write an iterative function. And it is well defined in the continuum, on the discrete we might just extrapolate such behaviour
Yes you are totally right! That would require some kind of iteration and it's not to be included there. Sorry :)
Subject of the issue
The compute_id_binomial_k method is defined twice: once in the id_discrete.py module and the other in the id_estimation.py module. The two functions take different numbers/types of inputs and the implementation in the id_discrete.py module is overridden by that in id_estimation.py.
Steps to reproduce
from dadapy.data import Data import numpy as np
x = np.random.normal(size = (100, 2)) d = Data(coordinates = x) idstmp, , _ = d.compute_id_binomial_k(k = 3)
Expected behavior
Correctly compute the ID with the discrete routine.
Actual behavior
TypeError: IdEstimation.compute_id_binomial_k() missing 1 required positional argument: 'r'