markmfredrickson / optmatchExperimental


Measure no of copies/joint memory size of distances #9

Open benthestatistician opened 5 years ago

benthestatistician commented 5 years ago

How many copies of distance information, explicit or implicit, get made in the course of a call to fullmatch() or pairmatch()?

(Mostly I'm interested in copies made in R, but I'd also be interested to know whether these objects are being copied within the solver code.)

josherrickson commented 5 years ago
> library(optmatch)
> data(nuclearplants)
> m <- match_on(pr ~ cost, data = nuclearplants)
> 
> tracemem(m)
[1] "<0x13111c4d8>"
> 
> fullmatch(m, data = nuclearplants)
tracemem[0x13111c4d8 -> 0x130e70178]: as.vector prepareMatching prepareMatching SubDivStrat .fullmatch <Anonymous> mapply fullmatch.matrix fullmatch
tracemem[0x13111c4d8 -> 0x130fb5ee0]: dist_digest fullmatch.matrix fullmatch
tracemem[0x130fb5ee0 -> 0x130fbe228]: dist_digest fullmatch.matrix fullmatch 

The copying inside dist_digest is due to the removal of the call attribute, which is necessary because two otherwise identical distances might have different calls (e.g. different fullmatch restrictions) and hence different hashes. It looks like we're not using the hashed distance for anything other than the utility function optmatch_same_distance (which we aren't using internally). Perhaps it's time to revisit an argument save.dist = c("none", "hashed", "full")?
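To make the point about the call attribute concrete, here is a minimal base-R sketch. It assumes a plain matrix with a "call" attribute as a stand-in for a distance object; make_dist() and strip_call() are hypothetical helpers for illustration, not optmatch functions, and this is not how dist_digest() is actually written.

```r
# Hypothetical stand-in for a distance: a plain matrix carrying a "call"
# attribute. Not optmatch's real class, and not its dist_digest().
make_dist <- function(call) {
  structure(matrix(c(0.5, 1.2, 0.3, 0.9), nrow = 2), call = call)
}

# Same distance values, different recorded calls.
a <- make_dist(quote(match_on(pr ~ cost)))
b <- make_dist(quote(match_on(pr ~ cost, caliper = 1)))

# Dropping an attribute modifies the argument, so R duplicates it first
# (copy-on-modify); the caller's object is left alone.
strip_call <- function(d) {
  attr(d, "call") <- NULL
  d
}

differ_with_call   <- !identical(a, b)
same_without_call  <- identical(strip_call(a), strip_call(b))
original_untouched <- !is.null(attr(a, "call"))

# tracemem() needs an R build with memory profiling enabled.
if (capabilities("profmem")) {
  tracemem(a)
  invisible(strip_call(a))  # should print a tracemem[...] line for the copy
  untracemem(a)
}
```

With the call in place the two objects compare as different, and without it they compare as identical, which is the reason for stripping it before hashing. The price is one duplication per strip, which is the kind of copy tracemem reports above.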

benthestatistician commented 5 years ago

Oh no, don't do anything to our dist_digest baby! :anguished:

Seriously, I think the copying inside dist_digest() is less problematic than copying lower in the call stack [the one w/ fullmatch.matrix on top and /src/relax4s.f at bottom].

josherrickson commented 5 years ago

I think there is a bug inside dist_digest: it looks like I restored the call attribute unnecessarily, resulting in a second copy. I'll fix that when I get a few minutes.

josherrickson commented 5 years ago

I fixed that bug (optmatch:7f3b265). We're down to two copies now.

benthestatistician commented 5 years ago

I formed the hypothesis that there would be another round of copying when there are subproblems, so I decided to check it. Oddly, the results would suggest the opposite may be true:

library(optmatch)
data(nuclearplants)
m <- match_on(pr ~ cap + strata(pt), caliper=100, data=nuclearplants)
is(m, "BlockedInfinitySparseMatrix")
## [1] TRUE
tracemem(m)
fullmatch(m, data=nuclearplants)
tracemem[0x7fe43e738038 -> 0x7fe440ccc638]: dist_digest fullmatch.BlockedInfinitySparseMatrix fullmatch 
tracemem[0x7fe440ccc638 -> 0x7fe440cdcae0]: dist_digest fullmatch.BlockedInfinitySparseMatrix fullmatch 
  H   I   A   J   B   K   L   M   C   N   O   P   Q   R   S   T   U   D   V   E 
0.7 0.2 0.1 0.1 0.2 0.4 0.3 0.4 0.3 0.7 0.4 0.7 0.4 0.5 0.6 0.7 0.6 0.4 0.2 0.5 
  W   F   X   G   Y   Z   d   e   f   a   b   c 
0.6 0.6 0.7 0.7 0.4 0.2 1.2 1.1 1.3 1.1 1.2 1.3 

whereas

n <- match_on(pr ~ cap, caliper=100, data=nuclearplants)
is(n, "InfinitySparseMatrix")
## [1] TRUE
tracemem(n)
fullmatch(n, data=nuclearplants)
tracemem[0x7fe443c4ddd8 -> 0x7fe443d21f08]: getDataPart data.frame prepareMatching prepareMatching SubDivStrat .fullmatch <Anonymous> mapply fullmatch.InfinitySparseMatrix fullmatch 
tracemem[0x7fe443c4ddd8 -> 0x7fe4426e9498]: dist_digest fullmatch.InfinitySparseMatrix fullmatch 
tracemem[0x7fe4426e9498 -> 0x7fe4426f1ee0]: dist_digest fullmatch.InfinitySparseMatrix fullmatch 
   H    I    A    J    B    K    L    M    C    N    O    P    Q    R    S    T 
 1.3  1.2  1.2  1.4  1.4  1.7  1.6  1.7  1.6 1.10  1.7 1.10  1.7  1.8  1.9  1.3 
   U    D    V    E    W    F    X    G    Y    Z    d    e    f    a    b    c 
 1.9  1.7  1.4  1.8  1.5  1.9 1.10 1.10  1.7  1.4  1.3 1.10  1.1  1.1  1.3  1.5 

Anybody get it? (Run against the released version ‘0.9.10’, not a development version.)

benthestatistician commented 5 years ago

...does this passage of the tracemem() help page mean that the objects are not being copied in the Fortran routines?

When an object is traced any copying of the object by the C function ‘duplicate’ produces a message to standard output, as does type coercion and copying when passing arguments to ‘.C’ or ‘.Fortran’.

benthestatistician commented 5 years ago

(tracemem is telling us about implicit copies, but there's also explicit copying, e.g. possibly at this piece of fullmatch.matrix():

  # problems is guaranteed to be a list of DistanceSpecifications
  # it may only have 1 entry
  problems <- findSubproblems(x)

Unclear whether this really creates a new copy of the contents of x when x is a simple ISM, but it probably has to when x is a BISM with multiple groups. To be checked on, along w/ other similar instances.)
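A rough base-R sketch of the distinction, assuming a dense matrix and a made-up row grouping in place of a real ISM/BISM (row_group is hypothetical, column groups are ignored, and this is not what findSubproblems() actually does):

```r
# Toy dense "distance" with 4 rows and 3 columns.
x <- matrix(seq_len(12), nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("a", "b", "c")))
row_group <- c("g1", "g1", "g2", "g2")  # hypothetical strata for the rows

# One problem: wrapping x in a list only stores a reference to it,
# so no distance data needs to be copied.
single <- list(x)

# Several problems: each block comes from subsetting, which allocates a
# fresh matrix holding its own copy of the selected entries.
blocks <- lapply(split(seq_len(nrow(x)), row_group),
                 function(idx) x[idx, , drop = FALSE])

# Together the blocks hold as many cells as x itself.
total_block_cells <- sum(vapply(blocks, length, integer(1)))
```

As far as I understand it, subsetting allocates new objects without going through duplicate, so tracemem would stay silent about the blocks even though their contents amount to a second copy of x.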