rmjarvis / TreeCorr

Code for efficiently computing 2-point and 3-point correlation functions. For documentation, go to
http://rmjarvis.github.io/TreeCorr/
Other
97 stars 37 forks source link

write_patch_results #168

Closed JonLoveday closed 4 months ago

JonLoveday commented 5 months ago

As far as I can see, including the write_patch_results argument in the write method writes additional HDUs with the DD pair counts corresponding to each patch. Is it possible to write out the correlation function and/or all of the pair counts specified in the write function? A use is case is say I want to fit a function to each patch estimate to obtain more robust covariances on the fit parameters.

I am currently doing: dd.process(gcat) dr.process(gcat, rcat) rr.process(rcat) dd.calculateXi(rr=rr, dr=dr) dd.write('filename.fits', rr=rr, dr=dr, write_patch_results=True) As a workaround, I assume I can write out the patch counts into separate files for DD, DR, and RR pairs, and then compute w(theta) for each patch, but this seems like an unnecessary faff.

rmjarvis commented 5 months ago

Sure. Both of those sound like reasonable feature additions. I think the rr, dr results dicts shouldn't be too hard to implement. As you say, for now the workaround is to write each one separately into their own files.

You can also separately write out the covariance matrix from the start, rather than relying on reading in the TreeCorr output file to reconstruct it. E.g. np.savez(cov_file_name, dd.cov).

Since the covariance matrix is 2d, I'll have to think what the best way to store that is in the TreeCorr output file. (FITS and HDF5 are probably easy, but ASCII will take a little thought.) But numpy's save function does a nice job there, so you can just use that for now.

rmjarvis commented 4 months ago

Re-reading your text above (as I'm starting to think about implementation), I realize that you talked about computing w(theta) for each patch, which is probably not anything you ever want to do. That's not the best way to calculate a covariance matrix, since you miss the cross-patch pairs.

The unit test that recovers a covariance matrix from the three (DD, DR, RR) output files is here, so you might want to use that as a model for how to do it in your code. Basically:

# In main code that computes dd, dr, rr
dd.write(dd_file_name, dr=dr, rr=rr, write_patch_results=True)
rr.write(rr_file_name, write_patch_results=True)
dr.write(dr_file_name, write_patch_results=True)

# In other code that needs covariance matrix
dd = treecorr.NNCorrelation(...)
rr = treecorr.NNCorrelation(...)
dr = treecorr.NNCorrelation(...)
dd.read(dd_file_name)
rr.read(rr_file_name)
dr.read(dr_file_name)
dd.calculateXi(rr=rr, dr=dr)
cov = dd.estimate_cov('jackknife')

It's still more boilerplate than we really need of course, but maybe simpler than what you seemed to be describing that you currently do. I'm working on making the workflow look like this:

# In main code that computes dd, dr, rr
dd.write(file_name, dr=dr, rr=rr, write_patch_results=True)

# In other code that needs covariance matrix
dd = treecorr.NNCorrelation(...)
dd.read(file_name)
cov = dd.estimate_cov('jackknife')

Or even better, I think I can probably make the read just be:

dd = treecorr.NNCorrelation.from_file(file_name)

which would be even nicer API I think.

rmjarvis commented 4 months ago

Done on #172. Will be included in release version 5.0.