Open corybrunson opened 4 years ago
Thanks for the ideas as always. I agree with the most of the sentiments. There's actually a couple issues at play here.
In particular, I agree with the following sentiments:
When no vertices argument is supplied... [simplextree::degree(s) should] return the degree sequence of the entire graph.
Agreed. I've changed this behavior for the package-level function simplextree::degree
. I also agree with making the module-namespace functions as similar in interface to the package level functions, however sometimes this may not be possible without a lot of added complexity at the C++ level.
suppressMessages(library(simplextree))
st <- simplex_tree(list(1:3, 4:5, 6))
all.equal(st$degree(st$vertices), st %>% degree())
#> [1] TRUE
Created on 2020-10-24 by the reprex package (v0.3.0)
I haven't figured a way to do this at the C++ level.
- It makes sense to me for st$adjacent(vertices) and adjacent(st, vertices) to have the same behavior, in particular to ease code translation between R6 method-chaining and base R function-nesting (or pipe-based method-chaining). Between the two, i prefer the more flexible wrapper behavior, which can handle one or multiple vertices...
- In case length(vertices) == 1, i suggest that the operation still return a (length-1) list rather than a vector. This would make scripting much easier, in that the structure of the returned value would not depend on the length of the input vector.
Agreed and agreed. The package level adjacent
now returns the adjacencies for every vertex is not specified, and the special case where length(vertices) == 1 has been corrected to return a list.
suppressMessages(library(simplextree))
st <- simplex_tree(list(1:3, 4:5, 6))
all.equal(st$adjacent(1), st %>% adjacent(1))
#> [1] TRUE
all.equal(st$adjacent(st$vertices), st %>% adjacent())
#> [1] TRUE
Created on 2020-10-24 by the reprex package (v0.3.0)
When some vertices are not valid vertex IDs, the simplextree function returns a vector with 0s in those positions, possibly misleading the user. Again i think igraph's behavior is preferred... Similarly to degree(), i think the operation should fail if any vertex IDs are invalid rather than return numeric(0) or a list with numeric(0) entries.
This issue is actually multifaceted I think, but in summary that aspect of the design was intentional. A small factor was that non-throwing functions can be optimized on the C++ level--but the largest factor for choosing to not throw on inputs outside the complex was actually just to maintain consistency with the rest of the package interface.
As a small digression, over the years as C++ has evolved, one of the things that came to light was that exceptions are not always great things. Fun fact: some companies even banned the use of exception handling all together. This primary reason for this is easy to illustrate.
In simplextree
, part of the original design was to make functions which modify the complex idempotent, where possible, and to really simplify building complexes by hand to make interactive use feasible. For example, compare this code
st <- simplextree()
st %>% insert(1:3)
with this code
st <- simplextree()
st %>% insert(list(1, 2, 3, c(1,2), c(2,3), c(1,3), c(1,2,3)))
They both do the same thing, but there are actually a number of TDA packages I've use that require the latter pattern that faces be inserted prior to their corresponding cofaces, otherwise an exception is thrown. And if that pattern was followed here, then the first code example should throw.
Ok thats just the property of being a simplicial complex though, one might argue. But the same idea applies to remove
as well. Should this function throw?
st <- simplex_tree(list(1:3, 4:5, 6))
st %>% remove(c(20, 21))
Clearly the edge being removed isn't in the complex. Suppose that, instead of returning invisibly, an exception was thrown because the operation is nonsensical. Now if a user wants to scaffold their own code using remove
and they want to protect against against exceptions, they either need to write their code very carefully such that they never accidentally remove a simplex not in the complex (e.g. st %>% remove(1:2); st %>% remove(1)
will already fail), or could write a wrapper function:
safe_remove <- function(complex, simplices){
simplices_that_exist <- complex %>% find(simplices)
complex %>% remove(simplices[simplices_that_exist])
}
But what's the complexity of this function? You have to traverse the trie twice, so its twice the number of operations than the equivalent code which just removes the simplex if it exists. On top of that, you run into the possible catastrophe given in the nightmare scenario above.
So the counter argument to that is that well let's just add exception checking on function that actually return results, not on functions that modify the data structure. And in some cases, throwing exceptions is extremely useful for e.g. input checking when there really is no alternative to return. A good example of this is the nat_to_sub
bijection, which is only defined for a fixed domain + codomain, and has undefined behavior otherwise. degree
and adjacent
fall under this category, as does find
, and when considering those functions in isolation, independent from the rest of the package, one could make a strong argument that the inputs should be checked and exceptions should be thrown for "invalid" ones. But to keep things consistent in the package, and to keep things streamlined for people who are aware of the the fact that non-existent simplices have 0/empty-list degrees/adjacencies, I'm not convinced its really cleaner to use exceptions here.
Besides, this is somewhat common to do in functional programming. A trivial example would be something like:
sum(st %>%> degree(seq(100)))
## vs.
sum(st %>%> degree(st$vertices[st$vertices < 100]))
The first takes advantage of the default behavior to simplify the code, whereas the latter is more complex but needed nonetheless if exceptions were thrown for invalid vertices. R is a somewhat functional language. How does R handle this?
do.call(rbind, list(NULL, c(1, 2), c(3, 4), NULL))
As a last note, this sentiment as a whole follows lessons learned in the C++ community involving non-existent keys in mapping data structures. If you have a mapping data structure and you want the value associated with a key, but the keys doesn't exist, many have concluded that it's better to return something that is implicitly casteable to false
which may or may not contain a value. The reason for this is that it lets the user assume default functionality that he/she can then take advantage of at times, e.g. knowing that non-existent keys yield results castable to false
let one compose transform functions that are valid for any input. This is one of the idea of the std::optional
structure in C++. For a fun example, see cropping cat photos.
Though i don't always say it, i always appreciate your thorough and insightful explanations!
I'm swayed on the preferability of recognizable returns over error throws in the C++ context. Importantly for conceptual consistency, i think, both the 0
s in the vector returned by degree(st, v)
and the numeric(0)
s in the list returned by adjacent(st, v)
have reasonable interpretations as the numbers of edges having v[[i]]
as a face and the sets of vertices incident to an edge also incident to v[[i]]
.
One option might be to have a parameter like safe
that defaults to FALSE
but, if TRUE
, performs the additional traversal. Though i actually think it makes more sense to urge the user (in these methods' documentations) to include a find()
step when a script is desired to throw errors like those of igraph, since then the code more clearly communicates what it does.
I'll leave the issue open as a flag for the module-namespace $degree()
method, but feel free to close it if you feel that it can be put to rest for now. Thank you for the upgrades!
Hi again! I wanted to flag some behavior of the vertex-level operations
degree()
andadjacent()
i think is counterintuitive and would suggest changing slightly—though i haven't carefully checked their possible downstream effects. The reprex below illustrates, and i re-installed simplextree from GitHub. I use igraph as a comparison in both cases because the package is widely-used and it's UI has had a while to mature.The
degree()
operation works the same way as an R6 method and as a wrapper, and in practice it is similar to the function of the same name in igraph. Two key behaviors are different, though:vertices
argument is supplied,simplextree::degree(s)
throws an error; i think the preferred behavior in this case is igraph's, in whichigraph::degree(g)
returns the degree sequence of the entire graph.vertices
are not valid vertex IDs, the simplextree function returns a vector with0
s in those positions, possibly misleading the user. Again i think igraph's behavior is preferred: an error is thrown when any vertex IDs are invalid. (Though i'm most persuadable on this point.)The
adjacent()
operation actually works differently as an R6 method and as a wrapper.st$adjacent(vertices)
andadjacent(st, vertices)
to have the same behavior, in particular to ease code translation between R6 method-chaining and base R function-nesting (or pipe-based method-chaining). Between the two, i prefer the more flexible wrapper behavior, which can handle one or multiple vertices.degree()
, i think the operation should fail if any vertex IDs are invalid rather than returnnumeric(0)
or a list withnumeric(0)
entries.length(vertices) == 1
, i suggest that the operation still return a (length-1) list rather than a vector. This would make scripting much easier, in that the structure of the returned value would not depend on the length of the input vector.If we can agree to an appropriate set of behaviors, then i'll be glad to make the minor changes and submit a PR!
Created on 2020-10-24 by the reprex package (v0.3.0)