jdtuck / fdasrvf_R

Functional Data Analysis using Square-Root Slope Framework
11 stars 14 forks source link

Karcher mean #40

Closed astamm closed 7 months ago

astamm commented 8 months ago

A new version for multivariate_karcher_mean() which brings back the possibility to use a rotation- and/or scale-invariant metric.

The function is parallelized via the future framework. It is handy because you do not have to add any extra parameter to the function. It is parallelized automatically whenever the user chooses to parallelise calculations within his R session via future::plan(future::multisession, workers = 8) for example.

One point to improve is the case of median under scale invariance. I am pretty sure I do not use the correct distance.

jdtuck commented 8 months ago

I do not like future right now as most users are not using it, would like to keep the parallel option

jdtuck commented 8 months ago

Also if you add back in rotation and scale how is this different than curve_karcher_mean?

astamm commented 7 months ago

I removed the future framework and used doParallel as you do in other functions.

I compared the output of multivariate_karcher_mean() and curve_srvf_align() and they are not the same. I do not really understand what curve_karcher_mean() does actually. It seems to always project curves on sphere and then use cos() and sin() to compute the mean which I do not understand. I find multivariate_karcher_mean() easy to follow and I am more happy about the mean it actually returns.

I think we need to document somewhere why there is need to call internally calculatecentroid(). What is this for from a theoretical point of view?

Also, I am thinking about bringing back in the mode as well. Is multidimensional function with first and last point equal to be considered a closed curve? To that end, I tried once to set mode == "C" in curve_srvf_align() but it errored...

Finally, once we agree on the above, I'll fix the median computation.

jdtuck commented 7 months ago

So I do not agree with just comparing outputs. curve_karcher_mean is the correct formulation for open and closed curves. Please see 2011 paper by Anuj and 2016 book. If you are wanting a mean computation under L2 metric for a specific case, that is what multivariate_karcher_mean was created for and should stay there. I am looking at what you are doing and it is incorrect for open and closed curves. The sin and cos are important due to the geometry.

See Algorithm 39,40,41, and 46 in 2016 Book by Anuj and Klassen

The point of moving the centroid is purely numerics as it makes sure all data is centered and you are not working with any input data that has an odd translation.

jdtuck commented 7 months ago

Also what what do you mean it “errored”, your input curves have to be closed if you use that mode. We have been using it for a long time.

astamm commented 7 months ago

So I do not agree with just comparing outputs. curve_karcher_mean is the correct formulation for open and closed curves. Please see 2011 paper by Anuj and 2016 book. If you are wanting a mean computation under L2 metric for a specific case, that is what multivariate_karcher_mean was created for and should stay there. I am looking at what you are doing and it is incorrect for open and closed curves. The sin and cos are important due to the geometry.

See Algorithm 39,40,41, and 46 in 2016 Book by Anuj and Klassen

I think the purpose of multivariate_karcher_mean() is indeed to exploit the fact that the SRSF of a function is $L^2$ by construction, as you explain in your 2013 CSDA paper. So we can define $|| f_1 - f_2 || := || q_1 - q2 ||{L_2}$. This definition of distance is convenient as it makes the Frechet mean of the $q_i$'s straightforward (pointwise mean). By the way what are the exact assumptions for a function to have an SRSF? Shouldn't it be continuous and admit a first derivative?

Then one can include re-parametrization and rotation invariance by minimising properly this distance over the correct set (of rotations or of diffeomorphisms). It then defines a semi-metric in the sense that d(f_1, f_2) = 0 does not imply that the two functions are equal but that they belong to the same class of equivalence (which I believe you refer to as an orbit).

Also, from a package user perspective, I do not understand the need to have one function for Karcher mean of 1D functional data and another for the multidimensional case. We could have just one which picks the optimisation algorithm depending on the dimension of the co-domain.

The point of moving the centroid is purely numerics as it makes sure all data is centered and you are not working with any input data that has an odd translation.

Ok. So the centroid should be added back at the end of the iterative procedure then? If so, I see easily how to do so for the original functions but not that clear which center should be added to the Karcher mean. That is why currently the output functions are all centered according to calculatecentroid() and the results seem satisfactory.

astamm commented 7 months ago

Also what what do you mean it “errored”, your input curves have to be closed if you use that mode. We have been using it for a long time.

Thanks for the book references. I definitely think then that we as in "people working with functional data" are not in this framework. I quickly looked at the references you pointed out and in your setting, you are projecting systematically SRSFs on an hypersphere so you are not exploiting the $L^2$ property of the SRSF space. Does that mean that this is generally a bad idea in shape analysis to stick the the L2 geometry of the SRSF space and better to use the geometry of the hypersphere? Just a question to improve my understanding :-)

About the error, I could give you a reprex but maybe it is just my data which are not closed curves... I wanted to try because the first and last value of my curves are the same. Hence on a 3D plot, curves are "closed". But it might not be the definition of a closed curve.

astamm commented 7 months ago

In fact, what bugs me (probably because of my poor understanding) is that the SRSFs are systematically normalised in the curve_*() functions, hence projected on hypersphere while I would have expected this only when scale == TRUE.

jdtuck commented 7 months ago

So I do not agree with just comparing outputs. curve_karcher_mean is the correct formulation for open and closed curves. Please see 2011 paper by Anuj and 2016 book. If you are wanting a mean computation under L2 metric for a specific case, that is what multivariate_karcher_mean was created for and should stay there. I am looking at what you are doing and it is incorrect for open and closed curves. The sin and cos are important due to the geometry. See Algorithm 39,40,41, and 46 in 2016 Book by Anuj and Klassen

I think the purpose of multivariate_karcher_mean() is indeed to exploit the fact that the SRSF of a function is L2 by construction, as you explain in your 2013 CSDA paper. So we can define ||f1−f2||:=||q1−q2||L2. This definition of distance is convenient as it makes the Frechet mean of the qi's straightforward (pointwise mean). By the way what are the exact assumptions for a function to have an SRSF? Shouldn't it be continuous and admit a first derivative?

Then one can include re-parametrization and rotation invariance by minimising properly this distance over the correct set (of rotations or of diffeomorphisms). It then defines a semi-metric in the sense that d(f_1, f_2) = 0 does not imply that the two functions are equal but that they belong to the same class of equivalence (which I believe you refer to as an orbit).

Also, from a package user perspective, I do not understand the need to have one function for Karcher mean of 1D functional data and another for the multidimensional case. We could have just one which picks the optimisation algorithm depending on the dimension of the co-domain.

The point of moving the centroid is purely numerics as it makes sure all data is centered and you are not working with any input data that has an odd translation.

Ok. So the centroid should be added back at the end of the iterative procedure then? If so, I see easily how to do so for the original functions but not that clear which center should be added to the Karcher mean. That is why currently the output functions are all centered according to calculatecentroid() and the results seem satisfactory.

So you again are conflating theories. We are dealing with multivariate data so the 2013 paper does not apply. That is only for functions in R^1!. Also I am going to ask have you read the all the papers and the 2016 book by Anuj and Klassen? Your question is fundamental. We assume that the curves are absolutely continuous in R^n for the SRVF to exist.

You can include rotation, but as was demonstrated in the papers in 2011 and 2012 by Anuj and Kurtek and the 2016 book by Anuj that one does not want to do that on L2 metric usually and we want an elastic metric and we go to the preshape space of the Hilbert sphere for open curves and \mathbb{S}_c for closed curves. This is WHY the package is coded this way. All of it is around the elastic metric in the proper space! I am not going to change this. From my perspective I want a separate function for multivariate data as it is slightly different from functions in R^1. The proper metric for the case of multivariate functional data (R^2) is L2 if you do mod out rotation and scale. I coded this function for this very explicit case. Per the users of the package and groups that I support we will keep it this way. We could do a complete refactor as you suggest, but this will be very large and a lot of breaking changes for users. If I was going to rewrite from the ground up after ALL the theory was developed I would have done that. This package was developed as the theory progressed.

The centroid could be added back in, but we have decided that is left to the user and will stay that way.

jdtuck commented 7 months ago

Also what what do you mean it “errored”, your input curves have to be closed if you use that mode. We have been using it for a long time.

Thanks for the book references. I definitely think then that we as in "people working with functional data" are not in this framework. I quickly looked at the references you pointed out and in your setting, you are projecting systematically SRSFs on an hypersphere so you are not exploiting the L2 property of the SRSF space. Does that mean that this is generally a bad idea in shape analysis to stick the the L2 geometry of the SRSF space and better to use the geometry of the hypersphere? Just a question to improve my understanding :-)

About the error, I could give you a reprex but maybe it is just my data which are not closed curves... I wanted to try because the first and last value of my curves are the same. Hence on a 3D plot, curves are "closed". But it might not be the definition of a closed curve.

I would say some people are not in this framework, others are. And there is a difference between SRSF and SRVF. Hence the difference in the package. Generally for any curve in R^N where N > 1 this is a BAD idea. See comment above for a bit more. But the point of all this work is that we want to use the elastic metric in the proper space.

I need to see your data to answer this question on closed curves

jdtuck commented 7 months ago

In fact, what bugs me (probably because of my poor understanding) is that the SRSFs are systematically normalised in the curve_*() functions, hence projected on hypersphere while I would have expected this only when scale == TRUE.

We do this for the numerics of computing the optimal warping function. The c++ code expects a normalized SRVF which makes the numerics more accurate. Scale does not effect the solution of the warping function.

astamm commented 7 months ago

You can include rotation, but as was demonstrated in the papers in 2011 and 2012 by Anuj and Kurtek and the 2016 book by Anuj that one does not want to do that on L2 metric usually and we want an elastic metric and we go to the preshape space of the Hilbert sphere for open curves and \mathbb{S}_c for closed curves.

I did read the papers from 2011 and 2012 (no not the entire book from 2016). Specifically, I am referring to Table 1 from the 2012 paper which defines the prespace as $L^2$ even if we include rotation. In fact prespace is $\mathbb{S}_\infty$ only when we seek scale invariance, which matches my understanding. Am I missing something here?

jdtuck commented 7 months ago

Yes table 1 in that paper points out the spaces. The focus of this package is on the bottom row. If there is any expansion I would only want to do the upper left. If you are wanting to do the entire top row, we need to be very explicit in the documentation as users will just use this based on file name and not understand which metrics and then will complain when the results don't match. Kurtek goes on in that paper to motivate the elastic approach and using Algorithm 1. I really want to avoid confusion between those spaces.

astamm commented 7 months ago

The focus of this package is on the bottom row

The bottom row does not include rotation invariance. So I wonder what is done in the package when we use rotation = TRUE or rotated = TRUE?

I am happy to create a separate package if you think it might clarify for the users. But I need to fully grasp the subtle points it seems I am missing first :-). In fact, I originally was misled by the name of the package fdasrvf which let the user think it is dedicated to fda, a.k.a functional data analysis, while it is not restricted to that and even mainly developed for something else.

I still think that a single package could handle all 4 feature spaces.

jdtuck commented 7 months ago

It is dedicated to FDA, I am very offended by that remark actually and no we should not create another package, but should be careful about how we name things here if we go this route. Please see the book https://link.springer.com/book/10.1007/978-1-4939-4020-2

That is my entire research is in FDA. The bad part is most statisticians don't go beyond the simple definition of R^1 and multivariate functional data. Functional data goes way beyond that. See the work by Dryden, Marron, Srivastava, Tucker, Kurtek and more. Functional Data Analysis is the application of functional analysis and can go beyond functions in R^1!!!

jdtuck commented 7 months ago

You are too focused on one paper and one table and the bottom row does talk about rotation that is the bottom right entry.

jdtuck commented 7 months ago

I think we can handle all four, but its going to have to be a meta function that is documented well and goes out and calls the sub-functions. I have a lot of users that use the other functions so I need to keep functionality. Some people don't read documentation and just apply functions by names and gets them into trouble so I would like to avoid that as much as possible. This function is not going to solve this though.

astamm commented 7 months ago

It is dedicated to FDA, I am very offended by that remark actually and no we should not create another package, but should be careful about how we name things here if we go this route. Please see the book https://link.springer.com/book/10.1007/978-1-4939-4020-2

I apologise, I did not mean to offend you at all. I am really just trying to understand the whole picture here. In particular, my confusion started with the need to create 3 functions to handle 1D functional data, multidimensional data and curves separately. Why do we want to do this if all of these are functional data? This can be really confusing. The type of data is always functional data, which justifies the name of the package. Then, the choice from the user perspective should be the feature space, i.e. does he want to detect changes in scale, rotation, position, etc. Once the user has made this choice, then we as developers should internally use the appropriate space/representation and metric to do all kind of statistical analysis, be it simple distance computation, or PCA or Karcher mean or clustering or others.

On the contrary, by naming one function function_mean_*(), another multivariate_karcher_mean() and another curve_karcher_mean(), it can create confusion (as it did with me) that a curve is intended as something different from a functional datum.

That is my entire research is in FDA. The bad part is most statisticians don't go beyond the simple definition of R^1 and multivariate functional data. Functional data goes way beyond that. See the work by Dryden, Marron, Srivastava, Tucker, Kurtek and more. Functional Data Analysis is the application of functional analysis and can go beyond functions in R^1!!!

I think you are a bit harsh here. Statisticians have been using FDA to deal with multivariate functional data for a long time, so not just functions in $\mathbb{R}$. I agree that shape analysis is a part of FDA though and that inclusion is not always understood.

astamm commented 7 months ago

You are too focused on one paper and one table and the bottom row does talk about rotation that is the bottom right entry.

My bad, yes indeed, you are right. But that (bottom right) means that you systematically assume that scale is on when rotation is on, am I correct ? So you are not authorising a rotation-invariant metric sensitive to scale as in top right panel?

In any event, if I can try to explain where I wanted to go with multivariate_karcher_mean(), despite the name which might not reflect what I intend to do, is to provide a unique function that computes the Karcher mean using any of the 4 feature spaces from Table 1, selected based on input arguments rotation and scale. I do not know whether I am too focused on this table but I find it summarising nicely the different kind of invariance one might be interested in and useful for implementation purposes.

astamm commented 7 months ago

Actually, after this discussion, I think the current implementation of multidimensional_karcher_mean() correctly handles the top row (i.e. assuming scale is off). Do you agree?

If you agree, then, combined with curve_karcher_mean(), we should have all 4 cases covered.

jdtuck commented 7 months ago

It is dedicated to FDA, I am very offended by that remark actually and no we should not create another package, but should be careful about how we name things here if we go this route. Please see the book https://link.springer.com/book/10.1007/978-1-4939-4020-2

I apologise, I did not mean to offend you at all. I am really just trying to understand the whole picture here. In particular, my confusion started with the need to create 3 functions to handle 1D functional data, multidimensional data and curves separately. Why do we want to do this if all of these are functional data? This can be really confusing. The type of data is always functional data, which justifies the name of the package. Then, the choice from the user perspective should be the feature space, i.e. does he want to detect changes in scale, rotation, position, etc. Once the user has made this choice, then we as developers should internally use the appropriate space/representation and metric to do all kind of statistical analysis, be it simple distance computation, or PCA or Karcher mean or clustering or others.

On the contrary, by naming one function function_mean_*(), another multivariate_karcher_mean() and another curve_karcher_mean(), it can create confusion (as it did with me) that a curve is intended as something different from a functional datum.

That is my entire research is in FDA. The bad part is most statisticians don't go beyond the simple definition of R^1 and multivariate functional data. Functional data goes way beyond that. See the work by Dryden, Marron, Srivastava, Tucker, Kurtek and more. Functional Data Analysis is the application of functional analysis and can go beyond functions in R^1!!!

I think you are a bit harsh here. Statisticians have been using FDA to deal with multivariate functional data for a long time, so not just functions in R. I agree that shape analysis is a part of FDA though and that inclusion is not always understood.

I think your confusion is one of the first in this area. I am going to repeat this again. This project has grown as the theory developed, hence its structure. It also was set up to mirror the MATLAB codes first produced by Anuj, Sebastian, and I. If I would start over now and have a different structure, yes. But we are here now with a lot of users used to this structure and naming.

This is a start on the top row, but needs better documentation, better initialization and is only for multivariate data not 1-D. There are specific numerical things in the 1-D code that we do to protect the user that is not covered in this function. But yes combined with curve_karcher_mean covers all four cases.

Again I think at multivariate meta function to handle all four cases would be better, with good documentation on what each means. Doing a whole sale refactor of the package, while great is not a good idea given how this package is being used by multiple grad students and national labs. We can slowly work down that road, but it will have to be very slow.

Also you are still limiting your view on FDA by just saying shape. We can also go to the space of [0,1]^2 and include images and surfaces.

astamm commented 7 months ago

I think your confusion is one of the first in this area.

Equally offended by this remark. I have also dedicated most of my research to FDA and we as a group at the department of mathematics in France and also in my old lab at MOX, Politecnico di Milano, got confused by the namings and the multiple functions.

To be more constructive, I think this is a classic case of package developed by theoretical mathematicians where each function does one mathematical operation in a given space with a given metric. On the contrary, R is primarily targeting applied statisticians for whom a bijection between package functions and statistical methods will generally work better, e.g. a single function for computation of mean for functional data using elastic distance. The applied statistician often does not necessarily have the underlying knowledge to pick the right function in the current API of the package. I mean by that the (s)he does know whether (s)he wants rotation invariance or scale invariance and so on, but (s)he does not necessarily know what kind of preshape space does this imply nor which metric should be used in that space. So I guess the API design mostly depends on the targeted audience, which, when it comes to R packages, are applied statisticians.

I have a similar issue with two packages I maintain: rgudhi and rgeomstats, which are still at their infancy. They make the Python packages GUDHI (for topological data analysis) and GeomStats (for statistics on Riemannian manifolds) available to the R community. I originally (currently) designed the R packages to mimic the Python API. But as time goes by and feedback comes in, I feel the R community is mostly composed of data scientist who want to apply stats methods but do not always have the required maths background to correctly make the subtle choices.

astamm commented 7 months ago

This is a start on the top row, but needs better documentation, better initialization and is only for multivariate data not 1-D. There are specific numerical things in the 1-D code that we do to protect the user that is not covered in this function. But yes combined with curve_karcher_mean covers all four cases.

Many functions of the package require a huge update of their documentation I would say, indeed including this one. This is related to another issue posted by myself and @araiari.

About the initialisation, I will push today the function curve_dist() which computes the pairwise distance matrix from which I compute the medoid which is then used as initialisation. So that will be covered.

As for multivariate VS 1D, well, aren't 1D function multivariate with dimension 1 somehow? Why would it not work for 1D functions? If you can explain, I can adapt the code but as I am not familiar with the numerical tricks, it is hard for me to grasp the difference.

You confirm then that curve_karcher_mean() with rotation ON should provide identical results whether scale is ON or OFF?

astamm commented 7 months ago

Also you are still limiting your view on FDA by just saying shape. We can also go to the space of [0,1]^2 and include images and surfaces.

Aren't images and surfaces shapes? The difference is only the dimension of the domain that changes, isn't it?

astamm commented 7 months ago

You confirm then that curve_karcher_mean() with rotation ON should provide identical results whether scale is ON or OFF?

I would go farther. Since you said that the package focuses on the bottom row of Table 1 in the 2012 paper, then the optional argument scale should be irrelevant, shouldn't it?

jdtuck commented 7 months ago

I disagree that R is applied statistics, it is used by applied statisticians, research statisticians, mathematical statisticians. Yes the documentation needs improvement.

Yes 1-D mathematically is dimension 1 of multidimensional. There are numerical tricks to handle integration error and choosing orbit elements, etc. I don't have time right now to explain all of it and its not for this PR. You are trying understand 14years of work in a PR.

You could call images shape. But there are differences and the distinction is important and for shape we go to Kendall's definition. One could also argue the reverse that 1-D function and N-D functions are shape so it is all "shape".

the option of scale in the curve_karcher_mean currently brings scale back into the final curves and has been in flux since this paper was released. I did have this implementation, but I am not sure I agree with the theory and recently removed it.

Epifanio, I., Gimeno, V., Gual-Arnau, X., and Ibáñez-Gual, M. V. (2020), “A New Geometric Metric in the Shape and Size Space of Curves in Rn,” Mathematics, 8, 1691. https://doi.org/10.3390/math8101691.

astamm commented 7 months ago

Yes 1-D mathematically is dimension 1 of multidimensional. There are numerical tricks to handle integration error and choosing orbit elements, etc. I don't have time right now to explain all of it and its not for this PR. You are trying understand 14years of work in a PR.

Sure, this PR is about Karcher mean but it is all related. My aim is to provide

I think calc_shape_dist() fulfils the first point although I have a doubt now as it becomes unclear what the scale option of that function actually does if the package focuses on scale-invariant metrics.

I have the second function ready (called currently curve_dist() which calls calc_shape_dist() and thus will be OK once we decide calc_shape_dist() is.

And this PR also should go up to the third point and provide a function for Karcher mean computation.

So although the PR is named Karcher mean, all the points discussed here are important to get it right.

jdtuck commented 7 months ago

calc_shape_dist fullfils the first point

astamm commented 7 months ago

I just committed the function which should fulfil the second point.

jdtuck commented 7 months ago

It also should be noted for your awareness, the curve functions do not implement all the optimization functions that are available for 1-D. See elastic.distance and optimum.reparam. Some of the options for optimum.reparam have not been exposed up yet to elastic.distance, but are to time-warping

astamm commented 7 months ago

Thanks for merging. I am still working on all this but will create separate issues and PRs when the time comes.