Open adematti opened 2 years ago
Yes, I think you're right! Everything should be computed using the weights. Would you like to open a PR?
@adematti Do you need the weighted average separations for your Science case or is this more of a consistency/implementation choice question?
@lgarrison We had a long discussion about this when you first started implementing the weights within Corrfunc. We came to the conclusion that it was better to leave the average separations as is - since that would be the most likely use-case. And that if any user required a weighted separation, then they could "easily" add a custom weight function and keep track of the weighted mean separation.
@manodeep well, I'd think the weighted version would be the most intuitive convention for the separation average ; this is the definition we will use in DESI --- though we will most certainly make sure cosmological results do not depend on this choice. In case one'd not want the weighted-average version, it would be straightforward to recompute the pair counts without weights.
Anyway, it's just fine to have this change only in the branch we are currently using in DESI :)
@manodeep, thanks, I had forgotten our discussion! It's coming back to me now. However, since adding the weights over 5 years ago (!), I think I've changed my mind on the issue. It seems to me that assigning a particle a weight of 2 ought to behave as if that particle appeared twice in the input data, at least if the user has specified the pair_product
weighting scheme.
Or, the more flexible thing would be to add a new column to the output called weighted_ravg
, so that the user has access to both the unweighted and weighted version, just like they have access both npairs
and weightavg
. But that's more work than just making ravg
weighted.
My vote would be to make ravg
weighted in Corrfunc proper, but it's also not a tiny amount of work. So if you would like to keep Corrfunc as-is @manodeep or if @adematti you don't have the spare effort for a PR right now, then keeping the change in your fork is fine by me too.
Alright great - as always I prefer to add a user-option that allows either weighted or unweighted average separations. Would the plan be the same - provide the default PAIR_PRODUCT weighted average for separations? And anything else the user has to code those up?
Regardless, if we do change the behaviour, we should try to keep backwards compatibility by default. That way this is not a breaking change but rather a functionality update.
What do y'all think?
Yeah, we already have an output_ravg
field that we normally set to True
, but we could allow the user to set that to the string 'weighted'
instead. Then the average separation would just inherit whatever weighting scheme the user is using.
To me this seems like a good idea. I had weighted average separation implemented upon the current master version in https://github.com/adematti/Corrfunc/tree/separation-avg. Yet, this branch does not allow for the option that you suggest for output_ravg and some things would need to be modified if https://github.com/manodeep/Corrfunc/pull/258 is accepted.
Currently the separation average in each bin (thetaavg, ravg, savg, rpavg) is computed without taking weights into account. I think, for usual purposes, we would like to include weights (when provided) in this computation? Typically, we would expect the same average separation, whether a particle is weighted by 0 or is totally removed from the catalog. It would always be possible, for someone truly interested in the non-weighted separation average, to run the computation without weights. What do you think?
Best