joezuntz closed this 2 years ago.
@rmjarvis This is passing all tests at last!
Thanks! I'll have a go at this.
Hi Mike,
I had a quick go at the pair iterators and have done two of them (see the code below), but I don't quite have the capacity to do the other two, since they seem a bit more complex. So I'd like to take you up on your offer to look at those later!
I hope I've addressed your other notes.
```python
import collections.abc  # collections.abc must be imported explicitly
import numpy as np
from treecorr.util import lazy_property


class PairIterator(collections.abc.Iterator):
    def __init__(self, results, npatch1, npatch2, index):
        self.results = results
        self.npatch1 = npatch1
        self.npatch2 = npatch2
        self.index = index

    def make_gen(self):
        # Subclasses return a generator over the (j, k) patch pairs for this index.
        raise NotImplementedError

    def __iter__(self):
        # (Re)start the generator; iter() must be called before next().
        self.gen = iter(self.make_gen())
        return self

    def __next__(self):
        return next(self.gen)

    @classmethod
    def make_pair_lists(cls, corrs):
        # One iterator per patch index, for each correlation object.
        return [[cls(c.results, c.npatch1, c.npatch2, i) for i in range(c.npatch1)]
                for c in corrs]


class JackknifePairIterator(PairIterator):
    def make_gen(self):
        if self.npatch2 == 1:
            gen = ((j,k) for j,k in self.results.keys() if j!=self.index)
        elif self.npatch1 == 1:
            gen = ((j,k) for j,k in self.results.keys() if k!=self.index)
        else:
            # For each i:
            # Select all pairs where neither is i.
            assert self.npatch1 == self.npatch2
            gen = ((j,k) for j,k in self.results.keys() if j!=self.index and k!=self.index)
        return gen


class SamplePairIterator(PairIterator):
    def make_gen(self):
        if self.npatch2 == 1:
            # k = 0 here
            gen = ((j,k) for j,k in self.results.keys() if j==self.index)
        elif self.npatch1 == 1:
            # j = 0 here
            gen = ((j,k) for j,k in self.results.keys() if k==self.index)
        else:
            assert self.npatch1 == self.npatch2
            # Note: It's not obvious to me a priori which of these should be the right choice.
            # Empirically, they both underestimate the variance, but the second one
            # does so less on the tests I have in test_patch.py. So that's the one I'm
            # using.
            # For each i:
            # Select all pairs where either is i.
            #vpairs = [ [(j,k) for j,k in self.results.keys() if j==i or k==i]
            #           for i in range(self.npatch1) ]
            # For each i:
            # Select all pairs where first is i.
            gen = ((j,k) for j,k in self.results.keys() if j==self.index)
        return gen
```
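To make the difference between the two selections concrete, here is a toy sketch of the auto-correlation branch (npatch1 == npatch2) written as plain functions. The `results` dict, its `None` values, and the function names are hypothetical stand-ins; only the `(j, k)` key layout and the filtering rules are taken from the iterators above.

```python
# Hypothetical toy "results": keys are (j, k) patch pairs, values are unused here.
npatch = 3
results = {(j, k): None for j in range(npatch) for k in range(npatch)}

def jackknife_pairs(results, index):
    # Jackknife: keep every pair in which neither patch is the one left out.
    return [(j, k) for j, k in results.keys() if j != index and k != index]

def sample_pairs(results, index):
    # Sample: keep only the pairs whose first patch is the sampled one.
    return [(j, k) for j, k in results.keys() if j == index]

print(jackknife_pairs(results, 0))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(sample_pairs(results, 0))     # [(0, 0), (0, 1), (0, 2)]
```

Each jackknife realisation keeps (npatch - 1)**2 pairs, while each sample realisation keeps npatch of them.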
Thanks Joe!
For the CosmoDC2 analysis, estimate_multi_cov was taking two hours because it runs single-threaded.
This adds two options for parallel calculation, SMP or MPI: the user can pass in either a number of processes or an MPI communicator. The different processes each generate different rows of the design matrix, and the partial results are then summed together.
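A minimal sketch of the "different rows, then sum" scheme for the SMP case, using the standard-library `multiprocessing.Pool`. Everything here is hypothetical: `compute_row` stands in for the real per-patch calculation, `NPATCH`/`NBINS` are made-up sizes, and this is not the actual TreeCorr implementation.

```python
import numpy as np
from multiprocessing import Pool

NPATCH = 4  # hypothetical number of rows in the design matrix
NBINS = 3   # hypothetical length of each row

def compute_row(i):
    # Hypothetical stand-in for the real per-patch calculation of row i.
    return np.full(NBINS, float(i))

def partial_matrix(rows):
    # A worker fills only its assigned rows and leaves the rest at zero,
    # so the full matrix is just the sum of all partial matrices.
    out = np.zeros((NPATCH, NBINS))
    for i in rows:
        out[i] = compute_row(i)
    return out

def design_matrix(nproc, use_pool=True):
    # Deal row indices out round-robin: worker p gets rows p, p + nproc, ...
    chunks = [list(range(p, NPATCH, nproc)) for p in range(nproc)]
    if use_pool:
        with Pool(nproc) as pool:
            parts = pool.map(partial_matrix, chunks)
    else:
        # Serial path, handy for checking the parallel result.
        parts = [partial_matrix(c) for c in chunks]
    return np.sum(parts, axis=0)
```

When calling `design_matrix(nproc)` with the pool, do so under an `if __name__ == "__main__":` guard, which `multiprocessing` requires for the spawn and forkserver start methods. In the MPI case, each rank could build its partial matrix the same way and combine them with mpi4py's `comm.allreduce(part)`, whose default reduction op is a sum.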
This requires moving the pair generation inside the design matrix calculation, because generators can't be pickled.
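The pickling constraint is easy to see in isolation: `multiprocessing` sends work to its workers with `pickle`, and generator objects can't be pickled, whereas the inputs needed to rebuild one can. This toy sketch uses a hypothetical `make_gen` and a made-up `results` dict; the point is only that the worker receives plain data and constructs the generator itself.

```python
import pickle

# Hypothetical toy results keyed by (j, k) patch pairs.
results = {(0, 0): None, (0, 1): None, (1, 0): None, (1, 1): None}

def make_gen(results, index):
    # Jackknife-style selection: pairs in which neither patch is `index`.
    return ((j, k) for j, k in results.keys() if j != index and k != index)

def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

print(can_pickle(make_gen(results, 0)))  # False: generator objects can't be pickled
print(can_pickle((results, 0)))          # True: the inputs to rebuild it can

# "Inside the worker": unpickle the inputs and build the generator there.
args = pickle.loads(pickle.dumps((results, 0)))
print(list(make_gen(*args)))             # [(1, 1)]
```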
In the associated tests, the tolerances for the bootstrap calculation have to be loose because the test has a random component. There doesn't seem to be a way to reset the RNG so that different processes produce the same indices, at least in the SMP case.
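One possible approach, sketched here only as an assumption about how it might fit: draw the bootstrap patch indices once, in the parent process, from an explicitly seeded `numpy.random.Generator`, then pass the resulting integer array (which pickles fine) to every worker. The function name and parameters below are hypothetical.

```python
import numpy as np

def bootstrap_indices(npatch, nboot, seed):
    # Each bootstrap realisation resamples the npatch patches with replacement.
    rng = np.random.default_rng(seed)
    return rng.integers(0, npatch, size=(nboot, npatch))

a = bootstrap_indices(npatch=5, nboot=3, seed=1234)
b = bootstrap_indices(npatch=5, nboot=3, seed=1234)
print(np.array_equal(a, b))  # True: same seed, same indices
```

Because every process would then read the same precomputed array rather than drawing its own, the processes agree on the resampling by construction.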