jdtuck / fdasrvf_R

Functional Data Analysis using Square-Root Slope Framework
10 stars 13 forks source link

Mv kmeans #18

Closed astamm closed 1 year ago

astamm commented 1 year ago

This PR provides extension of the k-means alignment algorithm to multivariate functional data, i.e. data with multi-dimensional codomain. It makes minimal modifications to a number of functions for this extension to happen. Also updates testthat version along the way and a number of documentation improvements. Details are in the commits.

jdtuck commented 1 year ago

Thank you for the pull request and updating docs, was going to get there, but not enough hours in a day. In a lot of cases multidimensional can be treated as open curves in R^n, which the curve code handles, part of this is duplicating that work over into the 1-D functional case. Do you think this is necessary?

codecov[bot] commented 1 year ago

Codecov Report

Merging #18 (ed899a3) into master (12d2671) will increase coverage by 2.40%. The diff coverage is 52.61%.

@@            Coverage Diff            @@
##           master     #18      +/-   ##
=========================================
+ Coverage    5.10%   7.51%   +2.40%     
=========================================
  Files         166     166              
  Lines       15284   15456     +172     
=========================================
+ Hits          780    1161     +381     
+ Misses      14504   14295     -209     
Impacted Files Coverage Δ
R/AmplitudeBoxplot.R 0.00% <ø> (ø)
R/PhaseBoxplot.R 0.00% <ø> (ø)
R/SqrtMean.R 0.00% <ø> (ø)
R/SqrtMedian.R 0.00% <ø> (ø)
R/align_fPCA.R 0.00% <ø> (ø)
R/calc_shape_dist.R 0.00% <ø> (ø)
R/curve_geodesic.R 0.00% <ø> (ø)
R/curve_karcher_cov.R 0.00% <ø> (ø)
R/curve_karcher_mean.R 0.00% <0.00%> (ø)
R/curve_pair_align.R 0.00% <ø> (ø)
... and 45 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

astamm commented 1 year ago

I did realise that R^d-valued functional data are already handled by some of the functions. I just thought that it could be nice to have the kmeans function also with this feature. It was relatively easy since it was limited to 1-d functional data because it forced the dimension to 1 in the C routines. If you accept the PR, I also plan to make the fdacluster package, formerly known as fdakma, depend on yours to include kmeans for functions defined on a fixed interval [a,b].

About the doc update, it comes from the fact that I got confused many times with notations. Could not clearly picture in which dimension of the array to put the grid size or the sample size. So after a number of trials, I got it right and thought of improving the doc. I can do more if you want.

I was also thinking I could be of some help with improving code coverage. Could be good to switch from Travis CI to GitHub actions as well.

And I had some thoughts about the way parallelization is performed. I am a big fan of the futureverse framework. If you are interested, I could also contribute in this direction.

astamm commented 1 year ago

It seems the Travis CI is working under an old R version labelled as current release. My use of the pipe operator which appears in R 4.1.x fails. I can also not use the pipe operator which should solve this issue. But maybe it could be worth upgrading the release and devel versions of R on Travis.

jdtuck commented 1 year ago

Sounds good and I am good with your plan.

jdtuck commented 1 year ago

Also, we can update travis to fix the pipe error, feel free to make edits to the yml file

jdtuck commented 1 year ago

I appreciate any help on making the docs and going to github actions. All on my list, but higher priority things move them down. Thank you for all you have done in this PR. Also if we upgrade travis, lets upgrade which R is the minimum version to avoid any issues there.

astamm commented 1 year ago

If you are ok with that, we could switch to GHA and remove travis CI. There are ready-to-go GHA scripts with up-to-date R-release and R-devel versions that test on a variety of OSes. Let me know. Could be a separate PR to keep things clearer. Same holds for doc improvements.

jdtuck commented 1 year ago

I am good with that, do that as a separate PR and doc improvements on a separate PR. Leave the ones you have done so far in this PR.

astamm commented 1 year ago

Good for me. I removed the pipe operator so this PR should pass the current Travis config.

jdtuck commented 1 year ago

Looks like you still have one

Error in parse(outFile) : 

  /home/travis/build/jdtuck/fdasrvf_R/fdasrvf.Rcheck/00_pkg_src/fdasrvf/R/kmeans.R:325:18: unexpected '>'

324:     tbl <- table(out$labels)

325:     cols <- idx |>
jdtuck commented 1 year ago

Though code looks good, travis must not have repulled correctly

jdtuck commented 1 year ago

Looks to be an issue with data.R fixing now

astamm commented 1 year ago

It seems it is still not happy with documentation which uses \eqn{} which used to transfer correctly to html version of the manual but not PDF. This is the reason I believe why Travis still complains. Newer versions of R now handles \eqn{} properly also for the PDF version of the manual. I could contribute in this PR the switch to GitHub actions with up-to-date R-release and -devel versions to confirm there are no relevant issues with the modifications done in this PR.

astamm commented 1 year ago

OK. This PR can be merged. I will contribute two other PRs: one for improving code coverage and one for transitioning to GHA for CI. Also code coverage calculation can be automated via GHA. Do you want me to add that as well?

jdtuck commented 1 year ago

Merging now and will wait for the PRs before next CRAN release