Open krivit opened 2 years ago
Nice idea. It sounds useful to me, especially if there would be some freedom of choice wrt the "similarity function" `f()` in `f(A[i], A[j])`, where `f()` could include, next to your examples:
Implementation-wise, does such a term have to essentially construct a matrix for `edgecov` on the R level, or are there computational "shortcuts" to exploit on the lower level?
That remains to be seen. If the operation is on a set rather than a mapping, there are a number of ways to represent it, with different advantages and disadvantages:
1. An `int` can encode set membership for up to 32 properties. Then, unions and intersections can be calculated by bitwise `|` and `&`, respectively. If there are more than 32 properties, one can have multiple `int`s per node, though the storage and computational costs grow linearly in the number of properties.
2. If there are up to `2^32` distinct properties, then each node can have a sorted array of its property IDs; then an algorithm can iterate through each node's properties, testing for common members a la the merge sort. This method is not sensitive to the total number of distinct properties but is sensitive to the average number of properties a node has.
There are probably others.
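
To make the two set representations concrete, here is a minimal C sketch of how the count of common properties might be computed under each. The function names (`popcount32`, `common_bits`, `common_sorted`) and the choice of `uint32_t` for words and property IDs are assumptions made for illustration; none of this is taken from the `ergm` code base.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 32-bit word (portable, no compiler builtins). */
static unsigned popcount32(uint32_t w) {
    unsigned n = 0;
    while (w) {
        w &= w - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Bit-vector representation: each node has nwords 32-bit words, and bit k
   of word w is set if the node has property 32*w + k. The number of common
   properties is the popcount of the bitwise AND, summed over words, so the
   cost is linear in the number of words per node. */
size_t common_bits(const uint32_t *a, const uint32_t *b, size_t nwords) {
    size_t count = 0;
    for (size_t w = 0; w < nwords; w++)
        count += popcount32(a[w] & b[w]);
    return count;
}

/* Sorted-array representation: each node has a strictly increasing array
   of property IDs. Advance through both arrays in step, as in the merge
   phase of merge sort, counting the IDs that appear in both. */
size_t common_sorted(const uint32_t *a, size_t na,
                     const uint32_t *b, size_t nb) {
    size_t i = 0, j = 0, count = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else { /* common member */
            count++;
            i++;
            j++;
        }
    }
    return count;
}
```

Either function gives `x[i,j] = length(intersect(A[i], A[j]))`; comparing the result with 0 gives the indicator version. The sorted-array walk takes at most `na + nb` steps per dyad, which is why it depends on how many properties a node has rather than on how many distinct properties exist.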
For a mapping, Method 2 can be used, with the array of property IDs serving as keys and a parallel array for values. (One can also just store the values in a vector with one element per property for each node analogously to Method 1, but then one loses the benefits of compactness of the one-bit-per-property representation and the speed of bitwise operations.)
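
As a rough illustration of the keys-plus-parallel-values layout, the following C sketch computes the predictor `x[i,j] = max[k](min(A[i][k], A[j][k]))` from the term description. The function name `max_min_common`, the use of `double` for values, and the `none` argument (the value returned when the two nodes share no property) are all assumptions for the sake of the example, not an existing interface.

```c
#include <stddef.h>
#include <stdint.h>

/* keys_a and keys_b are strictly increasing property IDs; vals_a and vals_b
   hold the corresponding values at the same positions. Returns the maximum,
   over properties present in both nodes, of the smaller of the two values,
   or `none` if the nodes have no property in common. */
double max_min_common(const uint32_t *keys_a, const double *vals_a, size_t na,
                      const uint32_t *keys_b, const double *vals_b, size_t nb,
                      double none) {
    size_t i = 0, j = 0;
    int found = 0;
    double best = none;
    while (i < na && j < nb) {
        if (keys_a[i] < keys_b[j]) {
            i++;
        } else if (keys_a[i] > keys_b[j]) {
            j++;
        } else {
            /* Common property: "interact" with min, "combine" with max. */
            double m = vals_a[i] < vals_b[j] ? vals_a[i] : vals_b[j];
            if (!found || m > best) {
                best = m;
                found = 1;
            }
            i++;
            j++;
        }
    }
    return best;
}
```

Swapping in other "interacting" and "combining" functions would only change the two lines that compute `m` and update `best`; the merge-style walk over the keys stays the same.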
Term description
This stems from this question on Stack Overflow: generalising it, suppose that each node `i` has some set `A[i]` of properties (I am avoiding "attributes", since we use that term elsewhere). We wish to specify a dyadic predictor that, in pseudocode, can be represented as `x[i,j] = length(intersect(A[i], A[j]))` (the number of properties `i` and `j` have in common) or `x[i,j] = length(intersect(A[i], A[j])) > 0` (whether `i` and `j` have any properties in common).
Some examples:
- `A[i]` is the set of languages `i` speaks, and we wish to use an indicator of whether `i` and `j` speak at least one common language as a predictor of their interaction. (This is from Stack Overflow.)
- `A[i]` is a list of `i`'s hobbies, and we wish to use the number of hobbies `i` and `j` have in common to predict acquaintance.
- `A[i]` is a list of places `i` visited over the course of a day (e.g., from a contact diary), and we wish to use the number of common areas visited by `i` and `j` to predict whether they had a contact.
This seems like something that can be useful in a variety of circumstances.
A further generalisation of this concept is to make `A[i]` a mapping that maps property `k` to some value (e.g., proficiency in a language) so that, e.g., `x[i,j] = max[k](min(A[i][k], A[j][k]))` (or some other "interacting" and "combining" functions in place of `min()` and `max[k]()`, respectively). In the language example, this predictor represents the proficiency of the less-proficient actor in the two actors' best common language (where "best common language" is the language in which the less-proficient actor has the highest proficiency).
In all cases, this would be a dyad-independent term, so in principle representable with `edgecov()`.
Questions