schung039 / neural_manifolds_replicaMFT


Analysis fails for small P #5

Closed PierreOrhan closed 3 years ago

PierreOrhan commented 3 years ago

Hello, thank you for this great work!

In my hands, the analysis seems to fail for small P. Reproducible example:

```python
import numpy as np
# adjust the import to match your local copy of the repo
from mftma.manifold_analysis_correlation import manifold_analysis_corr

np.random.seed(0)
X = [np.random.randn(5000, 50) for i in range(3)]
kappa = 0
n_t = 200
capacity_all, radius_all, dimension_all, center_correlation, K = manifold_analysis_corr(X, kappa, n_t)
```

This leads to an assertion error in `fun_FA`.

This seems to come from the setting of `maxK` (line 62 of `manifold_analysis_correlation`): `maxK = np.argmax([t if t < 0.95 else 0 for t in total]) + 11`. Changing it to `maxK = np.argmax([t if t < 0.95 else 0 for t in total]) + 1` fixes the issue for P > 2.

The issue remains for P=2.
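For what it's worth, the overshoot is easy to see on a toy `total` vector (the numbers below are hypothetical; `total` stands in for the cumulative variance fractions computed inside `manifold_analysis_correlation`, and with few modes a fixed offset of 11 points past the end of the spectrum):

```python
import numpy as np

# Hypothetical stand-in for the per-mode cumulative variance fractions
# ("total") used when setting maxK; with small P there are few entries,
# so a large fixed offset can push maxK past the number of available modes.
total = np.array([0.60, 0.85, 0.93, 0.97, 0.99])  # only 5 modes

idx = np.argmax([t if t < 0.95 else 0 for t in total])  # idx = 2
maxK_plus11 = idx + 11  # original offset: 13, exceeds the 5 modes
maxK_plus1 = idx + 1    # proposed offset: 3, stays in range

print(maxK_plus11, maxK_plus1)
```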

Overall, is the technique adaptable to comparing 2 manifolds, and, similarly, does it make sense to use it for a low number of classes (< 100)?

Thank you for your help!

schung039 commented 3 years ago

Hi Pierre, the theory used in this algorithm assumes that P is large, but there is no technical reason why the implementation should fail for small P. In the case of P=2, you can set `maxK = 0` for now. We'll incorporate this fix and push the new code soon.

PierreOrhan commented 3 years ago

Alright, thank you! I will reconsider how I was planning to use it, and set up the experiment so that P is large, matching the theoretical hypotheses!

sydddl commented 8 months ago

> Hi Pierre, the theory used in this algorithm assumes that P is large, but for the technical implementation of the algorithm, there shouldn't be a reason why P should fail for small P. In the case of P=2, you can set maxK = 0 for now. We'll incorporate this issue and push the new code soon.

With `maxK = 0`, it returns:

```
res_coeff_opt, KK = min(res_coeff), np.argmin(res_coeff) + 1
ValueError: min() arg is an empty sequence
```
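That error is consistent with `maxK = 0` leaving nothing to search over. A minimal sketch, assuming (hypothetically) that `res_coeff` is filled by a loop over K from 1 to `maxK`:

```python
# Hypothetical sketch of why maxK = 0 triggers the error: if res_coeff is
# filled by a loop over K in range(1, maxK + 1), then maxK = 0 makes that
# range empty, res_coeff stays empty, and min() raises ValueError.
maxK = 0
res_coeff = []
for K in range(1, maxK + 1):    # empty range when maxK = 0
    res_coeff.append(float(K))  # stand-in for the real fit residual

try:
    res_coeff_opt, KK = min(res_coeff), res_coeff.index(min(res_coeff)) + 1
except ValueError as err:
    print("caught:", err)
```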

But I do need to compute the manifold radius of the two classes embedded in the model when P=2. Based on my understanding of this method, can I increase P (from 2 to > 100) by adding embeddings of other data classes that were not actually among the model's training classes?
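A sketch of that padding idea (purely illustrative: the `manifold_analysis_corr` call is left commented out, random arrays stand in for real model embeddings, and the class counts and shapes are made up):

```python
import numpy as np

np.random.seed(0)

# Hypothetical sketch: keep the two manifolds of interest first, then
# append embeddings of extra held-out classes so that P grows from 2 to
# 100. Here, 500 samples x 50 features per class, all random stand-ins.
manifolds_of_interest = [np.random.randn(500, 50) for _ in range(2)]
extra_manifolds = [np.random.randn(500, 50) for _ in range(98)]
X = manifolds_of_interest + extra_manifolds  # P = len(X) = 100

# capacity_all, radius_all, dimension_all, center_correlation, K = \
#     manifold_analysis_corr(X, 0, 200)
# radius_all[0] and radius_all[1] would then be the two classes of interest.
print(len(X))
```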

When I tested on randomly generated data, I found that the larger P was, the more stable the results were. In particular, when I changed the data of one manifold at P=3, the results for the other manifolds changed substantially.
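That P-dependence can be probed with a toy harness. This is only a sketch: `analyze` below is a crude stand-in for `manifold_analysis_corr` that mimics just one source of coupling between manifolds (a shared center, loosely analogous to the center-correlation removal in the real analysis):

```python
import numpy as np

def analyze(X):
    # Crude stand-in for the real analysis: one radius-like number per
    # manifold, computed after removing the mean of all manifold centers,
    # so the manifolds are weakly coupled through that shared center.
    centers = np.stack([m.mean(axis=0) for m in X])
    global_mean = centers.mean(axis=0)
    return np.array([np.linalg.norm(m - global_mean) / np.sqrt(m.size)
                     for m in X])

def stability_probe(P, seed=0):
    # Perturb only the first manifold and report how much the *other*
    # manifolds' results move.
    rng = np.random.default_rng(seed)
    X = [rng.standard_normal((500, 50)) for _ in range(P)]
    base = analyze(X)
    X[0] = 2.0 * X[0]
    return np.max(np.abs(analyze(X)[1:] - base[1:]))

# The cross-talk through the shared center shrinks roughly like 1/P,
# consistent with the observation that results look more stable at large P.
for P in (3, 10, 100):
    print(P, stability_probe(P))
```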