Write patch results for RR, DR, etc. when write_patch_results=True

rmjarvis commented 4 months ago

@JonLoveday suggested in #168 that when using write_patch_results=True for an NNCorrelation object that has run calculateXi with rr and possibly dr randoms, that TreeCorr write out all the patch results for them as well, so that proper patch-based covariances can be built when reading the file back in. This PR implements that and the corresponding thing for NNNCorrelations.

While I was at it, I implemented a couple of other I/O related things I'd wanted to do. The first is to add classmethods called from_file for all the Correlation objects. These let you directly make the Correlation object without having to know the relevant configuration of the object that wrote the file. To enable this, I added a few more parameters in the header, so it knows how to build the object when reading.

Additionally, I added a write_cov option to the write functions, which write the covariance matrix to the output file. This is also read back in when reading it, so you don't have to remake the covariance if the original object had already calculated it.

JonLoveday commented 3 months ago

Thanks for making this update, but I must say I am confused about what is now output. I have a simple test case of an NN auto-correlation function, utilising DD, DR, and RR counts, with 10 jackknife patches (same 10 patches for the data and random catalogues). I would expect to see an output file with the overall results in the the first HDU and the pair counts excluding each jackknife region in turn in 10 subsequent HDUs. Instead, I see extensions 2-68 with names main_pp_0 - main_pp_17, _rr, _rr_pp_0 - _rr_pp_18, _dr, _dr_pp_0 - _dr_pp_27. I assume these are the DD, RR, and DR counts respectively, but why are there different numbers of each? Also, looking at the pair counts in each extension, these seem to be the pair counts from each patch, rather than excluding each patch (they are much smaller than the overall counts).

Here is the output file: z0.fits.zip

rmjarvis commented 3 months ago

Yes, they are the counts from each pair of patches. That's the underlying data from which all kinds of covariance matrices are calculated (jackknife, bootstrap, sample, etc.). The point is that you can now just let TreeCorr read in this file and compute the jackknife covariance.

corr = treecorr.Corr2.from_file(output_file_name)
cov = corr.estimate_cov('jackknife')

Or you can compute covariances of derived quantities etc. Everything that you could have done from the original object that wrote the file. Are you trying to do something with this that is not enabled by this interface?

JonLoveday commented 3 months ago

Thanks for the quick response, it's making more sense now. I've not come across pairwise use of jackknife patches before (traditionally, one computes the pair counts excluding each patch in turn). My patches are disjoint (see attached figure, galaxies on the left, randoms on the right, colour-coded by patch number. Maybe that explains the different numbers of dd, dr, rr, extensions?

What I am trying to do is to use clustering-based redshift inference to constrain the N(z) distribution of a photometric data set cross-correlated with a smaller number of galaxies with redshifts, see e.g. https://academic.oup.com/mnras/article/522/3/3693/7143786. N(z) depends on the ratio of an angular cross-corrrelation function over the square root of an auto-correlation function. In order to get reliable uncertainties on N(z) I would like to be able to calculate it separately excluding each jackknife region in turn. Does that sound feasible using treecorr? The alternative is to propagate the uncertainties on the two correlation functions, but I think that will be less robust given likely non-Gaussian errors. Figure_1

rmjarvis commented 3 months ago

Sure. You can use build_cov_design_matrix to compute the correlation function for each jackknife selection. cf. https://rmjarvis.github.io/TreeCorr/_build/html/correlation2.html#treecorr.Corr2.build_cov_design_matrix That's the design matrix from which the usual jackknife covariance matrix is computed.

JonLoveday commented 3 months ago

Brilliant, thanks Mike!

rmjarvis / TreeCorr

Write patch results for RR, DR, etc. when write_patch_results=True #172