rmjarvis / TreeCorr

Code for efficiently computing 2-point and 3-point correlation functions. For documentation, go to
http://rmjarvis.github.io/TreeCorr/

Reduce memory requirement for covariance matrix production when npatch is very large. #137

Closed: rmjarvis closed this issue 2 years ago

rmjarvis commented 2 years ago

@joezuntz and @jpratmarti ran into memory problems when doing jackknife covariances with a rather large npatch (500). The problem turned out to be in making the list of pairs of patch indices to accumulate for each jackknife subset. This list of lists had a size of order 56 bytes * npatch^3, which for them was multiple GBytes. That seemed excessive.
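As a back-of-envelope check of that figure (the 56 bytes per entry is an assumed per-tuple overhead, not a measured value from TreeCorr):

```python
# Rough memory estimate for an eager list of patch-index pairs:
# ~56 bytes per stored pair (small tuple plus list-slot overhead, assumed)
# times on the order of npatch**3 entries across all jackknife subsets.
npatch = 500
bytes_est = 56 * npatch**3
print(bytes_est / 1e9, "GB")  # ~7 GB, consistent with "multiple GBytes"
```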

The solution is to use generators for the inner lists, which are only evaluated when we actually need them. This keeps the memory requirement for this step a minor fraction of the memory already required for the results dict (i.e. a few MBytes in their use case).
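A minimal sketch of the idea (not TreeCorr's actual code; the pair-selection rule here, excluding any pair touching patch k, is an illustrative assumption):

```python
# Eager vs lazy construction of per-subset patch-pair sequences.
# The eager version materializes O(npatch^3) tuples up front; the lazy
# version stores only npatch small generator objects, and each inner
# sequence is produced on the fly when that subset is accumulated.
npatch = 50  # small here; the reported case used npatch = 500

# Eager: one full list of (i, j) pairs per jackknife subset k.
eager = [[(i, j) for i in range(npatch) for j in range(npatch)
          if i != k and j != k]
         for k in range(npatch)]

# Lazy: the same pairs, but each inner sequence is a generator.
# (Binding k through a function parameter matters -- see the note below
# about closures in generator expressions.)
make_pairs = lambda k: ((i, j) for i in range(npatch) for j in range(npatch)
                        if i != k and j != k)
lazy = [make_pairs(k) for k in range(npatch)]

# Same contents; note each generator can only be consumed once.
assert list(lazy[0]) == eager[0]
```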

Note: in implementing this, I ran into what I consider a bug in the Python language (reported here), involving arcane details about how Python handles closures in generator expressions, which means nested generators don't work as you would expect. Supposedly this is how Guido wants it, but I can't think of any reason this would be desired behavior. Fortunately, Joe had a workaround for me that makes things work correctly.

In short:

pairs = [ (j for j in range(10) if j!=i) for i in range(10) ]

doesn't do what you would (probably) expect.

But this does work:

f = lambda i: (j for j in range(10) if j!=i)
pairs = [ f(i) for i in range(10) ]
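To see the difference concretely: in a generator expression, only the outermost iterable is evaluated immediately; the filter condition is evaluated lazily when the generator is consumed, at which point the comprehension's loop variable holds its final value. Passing the variable through a function parameter freezes it per generator:

```python
n = 4

# Broken: every inner generator closes over the same i, which is n-1
# by the time any of them is consumed, so they all skip j == n-1.
broken = [(j for j in range(n) if j != i) for i in range(n)]
assert [list(g) for g in broken] == [[0, 1, 2]] * n

# Workaround: f(i) binds i as a function argument, so each generator
# keeps its own value and skips its "own" j == i.
f = lambda i: (j for j in range(n) if j != i)
fixed = [f(i) for i in range(n)]
assert [list(g) for g in fixed] == [[1, 2, 3], [0, 2, 3],
                                    [0, 1, 3], [0, 1, 2]]
```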