manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

separation average #271

Open adematti opened 2 years ago

adematti commented 2 years ago

Currently the separation average in each bin (thetaavg, ravg, savg, rpavg) is computed without taking weights into account. I think, for usual purposes, we would like to include weights (when provided) in this computation? Typically, we would expect the same average separation, whether a particle is weighted by 0 or is totally removed from the catalog. It would always be possible, for someone truly interested in the non-weighted separation average, to run the computation without weights. What do you think?

Best

lgarrison commented 2 years ago

Yes, I think you're right! Everything should be computed using the weights. Would you like to open a PR?

manodeep commented 2 years ago

@adematti Do you need the weighted average separations for your Science case or is this more of a consistency/implementation choice question?

@lgarrison We had a long discussion about this when you first started implementing the weights within Corrfunc. We came to the conclusion that it was better to leave the average separations as is - since that would be the most likely use-case. And that if any user required a weighted separation, then they could "easily" add a custom weight function and keep track of the weighted mean separation.

adematti commented 2 years ago

@manodeep well, I'd think the weighted version would be the most intuitive convention for the separation average ; this is the definition we will use in DESI --- though we will most certainly make sure cosmological results do not depend on this choice. In case one'd not want the weighted-average version, it would be straightforward to recompute the pair counts without weights.

Anyway, it's just fine to have this change only in the branch we are currently using in DESI :)

lgarrison commented 2 years ago

@manodeep, thanks, I had forgotten our discussion! It's coming back to me now. However, since adding the weights over 5 years ago (!), I think I've changed my mind on the issue. It seems to me that assigning a particle a weight of 2 ought to behave as if that particle appeared twice in the input data, at least if the user has specified the pair_product weighting scheme.

Or, the more flexible thing would be to add a new column to the output called weighted_ravg, so that the user has access to both the unweighted and weighted version, just like they have access both npairs and weightavg. But that's more work than just making ravg weighted.

My vote would be to make ravg weighted in Corrfunc proper, but it's also not a tiny amount of work. So if you would like to keep Corrfunc as-is @manodeep or if @adematti you don't have the spare effort for a PR right now, then keeping the change in your fork is fine by me too.

manodeep commented 2 years ago

Alright great - as always I prefer to add a user-option that allows either weighted or unweighted average separations. Would the plan be the same - provide the default PAIR_PRODUCT weighted average for separations? And anything else the user has to code those up?

Regardless, if we do change the behaviour, we should try to keep backwards compatibility by default. That way this is not a breaking change but rather a functionality update.

What do y'all think?

lgarrison commented 2 years ago

Yeah, we already have an output_ravg field that we normally set to True, but we could allow the user to set that to the string 'weighted' instead. Then the average separation would just inherit whatever weighting scheme the user is using.

adematti commented 2 years ago

To me this seems like a good idea. I had weighted average separation implemented upon the current master version in https://github.com/adematti/Corrfunc/tree/separation-avg. Yet, this branch does not allow for the option that you suggest for output_ravg and some things would need to be modified if https://github.com/manodeep/Corrfunc/pull/258 is accepted.