Closed peekxc closed 4 years ago
Merging #19 into master will decrease coverage by
1.08%
. The diff coverage is76.59%
.
@@ Coverage Diff @@
## master #19 +/- ##
==========================================
- Coverage 93.37% 92.29% -1.09%
==========================================
Files 6 6
Lines 589 584 -5
==========================================
- Hits 550 539 -11
- Misses 39 45 +6
Impacted Files | Coverage Δ | |
---|---|---|
src/ripser_short.cpp | 92.03% <73.91%> (-0.42%) |
:arrow_down: |
R/calculate.R | 87.50% <79.16%> (-9.56%) |
:arrow_down: |
R/inference.R | 93.67% <0.00%> (-0.88%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 0700bff...d2e2337. Read the comment docs.
Wow @peekxc this is fantastic!! Both the support for the new p
parameter (and supporting calculations) and supporting dist
objects (and the associated tests) are extremely helpful. Sorry it's taken me so long to get to this.
One of the things I've been meaning to do for a while is split TDAstats up - ggtda already deals w/ visualization and tools for statistical inference are being added w/ Shael Brown's help, so my thought was to separate the calculation aspect of TDAstats into another R package (ripserr). In this manner, I could also add cubical ripser (maybe also adding calculation of extended persistence) to ripserr w/ a workstream independent of the rest of TDAstats (and independent releases to CRAN as well). In general, it also seems easier to have compartmentalized packages w/ one major function each (in this case, TDAstats = inference for PH, ggtda = viz for PH, and ripserr = calculation of PH).
With this in mind, in addition to merging this PR into TDAstats (will do shortly after a few confirmatory tests; although functionality will eventually be deprecated in favor of ripserr), I have also ported your code over to rrrlw/ripserr (branch = fromtdastats) in the latest commit (still porting tests over). Is this okay with you? I will be giving you push access to the ripserr repo shortly - feel free to correct any of my mistakes there. Added to you ripserr DESCRIPTION as an author, feel free to correct any misspellings, etc. I made.
Thanks again for this contribution! Looking forward to hearing what you think.
Hey @rrrlw, just to double-check: goal of ripserr is just to provide engine for PH effectively, and move TDAstats towards more modular sort of interface that may in the future accept options for multiple engines doing persistent homology, correct?
Goal of ripserr is as you stated (Ripser + Cubical Ripser + maybe other engines e.g. Flagser). I definitely want to make TDAstats more modular/single purpose as it was getting tricky to maintain in a single package, but goal is to have it contain statistical/inference stuff for TDA (which could be persistent homology output from many engines, maybe even stats for Mapper stuff although I am very much a newbie to that). @corybrunson and I were thinking of another package (tentatively named perrrsist) that would serve as an interface to persistent homology (and other?) engines (analogous to parsnip in tidymodels).
P.S. - although I merged the fromtdastats branch into master in rrrlw/ripserr, if there's a better way of incorporating your code, I'm more than happy to reverse/edit!
TDAstats supports as input to
calculate_homology
a point cloud or distance matrix, indicated by the givenformat
parameter. This PR modifies it to supportdist
objects natively, both at the interface level and at the C++ level.Currently, if a standard
dist
object is passed in anddistmat
is not specified, thencalculate_homology
assumes it's a 1-dimensional point cloud, yielding different barcodes than as if one were to pass the corresponding distance matrix.More ever, if the
distmat
option is specified, thedist
object is converted to a distance matrix viaas.matrix
, only to then again be converted again in C++ back to its lower triangular form;as.matrix
applied todist
objects copies every distance 2x, increasing the amount of memory needed to pass in adist
object by 2. To fix this, a new functionripser_cpp_dist
is added to the source that acceptsdist
vectors natively, and minimizes unnecessary copying bystd::move
s them intocompressed_upper_distance_matrix
's (since R is column-major and Ripser uses row-major indexing, thecompressed_upper_distance_matrix
in Ripser is equivalent to lower triangular portion of the distance matrix in R, and vice versa.Here's the reprex showing the new interface:
Created on 2020-06-23 by the reprex package (v0.3.0)
The only thing that's important is pers.hom2. It used to give different results than the rest.