Open benthestatistician opened 5 years ago
> library(optmatch)
> data(nuclearplants)
> m <- match_on(pr ~ cost, data = nuclearplants)
>
> tracemem(m)
[1] "<0x13111c4d8>"
>
> fullmatch(m, data = nuclearplants)
tracemem[0x13111c4d8 -> 0x130e70178]: as.vector prepareMatching prepareMatching SubDivStrat .fullmatch <Anonymous> mapply fullmatch.matrix fullmatch
tracemem[0x13111c4d8 -> 0x130fb5ee0]: dist_digest fullmatch.matrix fullmatch
tracemem[0x130fb5ee0 -> 0x130fbe228]: dist_digest fullmatch.matrix fullmatch
The copying inside dist_digest is due to the removal of the call attribute. That removal is necessary because two otherwise identical distances might have different calls (e.g. different fullmatch restrictions) and hence different hashes. It looks like we're not using the hashed distance for anything other than the utility function optmatch_same_distance (which we aren't using internally). Perhaps it's time to revisit the idea of an argument save.dist = c("none", "hashed", "full")?
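A minimal base-R sketch of the idea, for illustration only. hash_ignoring_call() is a hypothetical stand-in, not optmatch's dist_digest(), and it assumes a distance is an ordinary R object that carries a "call" attribute. Because the argument is still referenced by the caller, removing the attribute makes R duplicate it once (the kind of copy tracemem() reports from inside dist_digest), while the caller's object keeps its call. Hashing a serialized temporary file with tools::md5sum() is just a dependency-free choice for the sketch.

```r
# Hypothetical sketch (not optmatch's dist_digest()): hash a distance object
# while ignoring its "call" attribute, using base R only.
hash_ignoring_call <- function(d) {
  # d is still referenced by the caller, so R duplicates it here
  # (tracemem() reports this copy); the caller's object keeps its call.
  attr(d, "call") <- NULL
  # Serialize the stripped object to a temporary file and take its MD5 sum.
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(d, NULL), f)
  unname(tools::md5sum(f))
}

# Two distance-like matrices with the same entries but different calls.
d1 <- matrix(c(0.5, 1.2, 0.8, 2.0), nrow = 2)
d2 <- d1
attr(d1, "call") <- quote(match_on(z ~ x, caliper = 1))
attr(d2, "call") <- quote(match_on(z ~ x, caliper = 2))

identical(d1, d2)                                  # FALSE: the calls differ
hash_ignoring_call(d1) == hash_ignoring_call(d2)   # TRUE: call is ignored
```

Following the bug fix discussed next in the thread, the sketch never restores the attribute on its local copy, so at most one copy is made per call.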
Oh no, don't do anything to our dist_digest baby! :anguished:
(Seriously, I think the copying inside dist_digest() is less problematic than the copying lower in the call stack [the one with fullmatch.matrix on top and /src/relax4s.f at the bottom].)
I think there is a bug inside dist_digest: it looks like I restored the call attribute unnecessarily, resulting in a second copy. I'll fix that when I get a few minutes.
I fixed that bug in optmatch:7f3b265. We're down to two copies now.
I hypothesized that there would be another round of copying when there are subproblems, so I decided to check. Oddly, the results suggest the opposite may be true:
library(optmatch)
data(nuclearplants)
m <- match_on(pr ~ cap + strata(pt), caliper=100, data=nuclearplants)
is(m, "BlockedInfinitySparseMatrix")
## [1] TRUE
tracemem(m)
fullmatch(m, data=nuclearplants)
tracemem[0x7fe43e738038 -> 0x7fe440ccc638]: dist_digest fullmatch.BlockedInfinitySparseMatrix fullmatch
tracemem[0x7fe440ccc638 -> 0x7fe440cdcae0]: dist_digest fullmatch.BlockedInfinitySparseMatrix fullmatch
H I A J B K L M C N O P Q R S T U D V E
0.7 0.2 0.1 0.1 0.2 0.4 0.3 0.4 0.3 0.7 0.4 0.7 0.4 0.5 0.6 0.7 0.6 0.4 0.2 0.5
W F X G Y Z d e f a b c
0.6 0.6 0.7 0.7 0.4 0.2 1.2 1.1 1.3 1.1 1.2 1.3
whereas
n <- match_on(pr ~ cap, caliper=100, data=nuclearplants)
is(n, "InfinitySparseMatrix")
## [1] TRUE
tracemem(n)
fullmatch(n, data=nuclearplants)
tracemem[0x7fe443c4ddd8 -> 0x7fe443d21f08]: getDataPart data.frame prepareMatching prepareMatching SubDivStrat .fullmatch <Anonymous> mapply fullmatch.InfinitySparseMatrix fullmatch
tracemem[0x7fe443c4ddd8 -> 0x7fe4426e9498]: dist_digest fullmatch.InfinitySparseMatrix fullmatch
tracemem[0x7fe4426e9498 -> 0x7fe4426f1ee0]: dist_digest fullmatch.InfinitySparseMatrix fullmatch
H I A J B K L M C N O P Q R S T
1.3 1.2 1.2 1.4 1.4 1.7 1.6 1.7 1.6 1.10 1.7 1.10 1.7 1.8 1.9 1.3
U D V E W F X G Y Z d e f a b c
1.9 1.7 1.4 1.8 1.5 1.9 1.10 1.10 1.7 1.4 1.3 1.10 1.1 1.1 1.3 1.5
Anybody get it? (This was run on the released version ‘0.9.10’, not a development version.)
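One possible reading, offered as a guess rather than a diagnosis: tracemem() follows only the object it was called on. If the blocks of a BlockedInfinitySparseMatrix are handed to prepareMatching() as new objects built by subsetting, their later copies would never be reported, whereas a single-problem InfinitySparseMatrix may reach prepareMatching() as the traced object itself. The base-R snippet below uses a plain matrix (no optmatch classes) to show the tracemem() behaviour this guess relies on; it assumes an R build with memory profiling enabled.

```r
# Illustration: tracemem() marks a single object. Duplicating that object is
# reported, but new objects built from it (e.g. by subsetting) are untraced,
# so their later copies are invisible. Needs capabilities("profmem") == TRUE.
x <- matrix(c(0.5, 1.2, 0.8, 2.0, 1.1, 0.3), nrow = 2)
tracemem(x)

block <- x[, 1:2]     # subsetting allocates a new, untraced matrix: no message
block2 <- block
block2[1, 1] <- 0     # copies the untraced subset: no message

y <- x
y[1, 1] <- 0          # copy-on-modify duplicates x itself: tracemem prints a line

untracemem(x)
```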
...does this passage of the tracemem() help page mean that the objects are not being copied in the Fortran routines?
When an object is traced any copying of the object by the C function ‘duplicate’ produces a message to standard output, as does type coercion and copying when passing arguments to ‘.C’ or ‘.Fortran’.
(tracemem is telling us about implicit copies, but there's also explicit copying, e.g. possibly at this piece of fullmatch.matrix():
# problems is guaranteed to be a list of DistanceSpecifications
# it may only have 1 entry
problems <- findSubproblems(x)
It's unclear whether this really creates a new copy of the contents of x when x is a simple ISM, but it probably has to when x is a BISM with multiple groups. To be checked, along with other similar instances.)
How many copies of distance information, explicit or implicit, get made in the course of a call to fullmatch() or pairmatch()?
(Mostly I'm interested in copies made in R, but I'd also be interested to know whether these objects are being copied within the solver code.)
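One way to count explicit as well as implicit copies made in R is utils::Rprofmem(), which logs every allocation on R's heap above a size threshold, not just duplications of a traced object. The helper below is hypothetical (not part of optmatch) and assumes an R build with memory profiling. It counts every large allocation, so the result is an upper bound on copies of the distance; the call stacks in the log show where each one came from. Memory that the Fortran solver allocates for itself, outside R's allocator, will not appear.

```r
# Hypothetical helper (not part of optmatch): count R heap allocations larger
# than `threshold` bytes made while evaluating `expr`, using utils::Rprofmem().
# Unlike tracemem(), this also sees explicit copies such as subsets, but not
# memory a compiled solver allocates outside R. Needs capabilities("profmem").
count_large_allocations <- function(expr, threshold) {
  log_file <- tempfile(fileext = ".out")
  on.exit({
    utils::Rprofmem(NULL)  # make sure profiling stops even on error
    unlink(log_file)
  })
  utils::Rprofmem(log_file, threshold = threshold)
  force(expr)              # evaluate the (lazily passed) expression now
  utils::Rprofmem(NULL)    # stop profiling so the log is closed
  log_lines <- readLines(log_file)
  # Large-vector allocations are logged as "<bytes> :<call stack>";
  # small-vector "new page:" lines are skipped.
  length(grep("^[0-9]+ :", log_lines))
}

# Usage sketch, assuming optmatch is installed:
# library(optmatch)
# data(nuclearplants)
# m <- match_on(pr ~ cost, data = nuclearplants)
# count_large_allocations(fullmatch(m, data = nuclearplants),
#                         threshold = as.numeric(object.size(m)) / 2)
```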