Closed lgarrison closed 5 years ago
What is the expected behaviour here? If there was only a scalar passed, then I can understand numpy-style dimension broadcasting. But if there is only one weights array passed, then we should raise a ValueError and request that the user pass another weights array of shape (num_weights, ND2), with the values all set to 1.0 for PAIR_PRODUCT (or whatever would be equivalent for their weight type for custom weighting schemes). The num_weights for the second weights array should match that of the first array.
The expected behavior is that the Python wrapper automatically provides such an array. Indeed, this is the current documented behavior (but only works as intended when the two sets are of equal size).
I think it's sensible to raise a ValueError if the weighting scheme is not pair_product.
I am not sure how to automatically provide the array for the second set of particles. Say weights1 is full of uniform random numbers - how do you generate an equivalent array for the second dataset? Are you saying to generate an array of the correct shape only when weights1 contains only 1 element (i.e., weights1 is a scalar)?
I just meant generate an array filled with 1s.
Hi!
I also wonder: I am writing my own estimator to convert DD, DR, RR into a correlation function for non-periodic boundary conditions, but I am unsure whether there are any normalization factors when you have weights (I thought there shouldn't be, but without them I don't seem to be getting the right answer) -- this also doesn't seem to be in convert_3d_counts_to_cf(), but I may be wrong.
Thank you for helping.
Best, Boryana
@lgarrison Ahh okay. So a matrix of the correct shape, filled with 1's, for the PAIR_PRODUCT case, and a ValueError with "Please supply weights" for all other weighting schemes...
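A minimal sketch of this policy (the helper and argument names here are hypothetical, not Corrfunc's actual wrapper code):

```python
import numpy as np

def default_weights2(weights1, nd2, weight_type):
    """Synthesize unit weights for the second particle set when only
    weights1 was supplied. This shortcut is only valid for pair_product,
    where a weight of 1.0 is the multiplicative identity; any other
    weighting scheme must raise instead of guessing."""
    if weight_type != "pair_product":
        raise ValueError("Please supply weights for the second particle set")
    num_weights = np.atleast_2d(weights1).shape[0]
    # Size the array to the SECOND set (ND2), not the first
    return np.ones((num_weights, nd2))
```

The key point is sizing by ND2 while matching num_weights from the first array, exactly as described above.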
@boryanah Yeah - we really don't have a utility to compute the weighted correlation function (or even to check that weights have been used and warn the user). This has been open for a while now in issue #139.
Thanks for the reference!
I am really sorry if this is very obvious, but which pair counts should be multiplied by the average weight in what is fed to the convert function, in the case of different weights for R and D and pair_product weighting? I tried:

    cf = convert_3d_counts_to_cf(N_d, N_d, N_r, N_r,
                                 DD*w_avg, DR*w_avg,
                                 DR*w_avg, RR)

where w_avg is the mean weight of the data (the mean weight of the randoms is 1), but it didn't give me the right answer.
@boryanah You have to first compute the average weight of dataset1 (and dataset2 for cross-correlations) as avgweight := sum(weights)/N_particles. In each radial bin, the total sum of weights is weightsum := weightavg * num_pairs_in_that_bin (where the weightavg is itself an average of the products of the two weights). Now, you have to figure out what the weightsum would have been if the points were randomly distributed -- so multiply the number of points in one catalog by the density, volume, and average weight of the other catalog.
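A numerical sketch of one plausible reading of the above (the function names are illustrative, not Corrfunc API): recover the per-bin weight sum from the returned weightavg, and compute the weight sum expected for randomly distributed points.

```python
import numpy as np

def measured_weightsum(weightavg, npairs):
    # Per-bin sum of pair-weight products, recovered from the
    # per-bin weightavg and the raw pair counts
    return np.asarray(weightavg) * np.asarray(npairs)

def random_weightsum(n1, n2, mean_w1, mean_w2, vol_bins, vol_total):
    # Expected per-bin weight sum if points were randomly distributed:
    # each of the n1 points (mean weight mean_w1) sees on average
    # n2 * vol_bin / vol_total partners, each with mean weight mean_w2
    vol_bins = np.asarray(vol_bins, dtype=float)
    return n1 * mean_w1 * n2 * mean_w2 * vol_bins / vol_total
```

The inclusion of mean_w1 here is an assumption, following "sum(weights)/N_particles" applied to both catalogs.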
@lgarrison One thing I realise is that our rpavg is currently completely unweighted. We should probably also have an option, say return_weighted_rpavg -- where the math would be rp += weight*r within the kernels (instead of rp += r), and then divided by weightavg * num_pairs_in_bin at the end. (I have a vague memory of us discussing this previously, and deciding against a weighted rpavg for backwards compatibility.)
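The proposed accumulation can be sketched as follows (vectorized over the pairs falling in one bin, for illustration; the real change would live in the C kernels):

```python
import numpy as np

def weighted_rpavg(pair_seps, pair_weights):
    """Weighted mean pair separation for one bin: accumulate
    rp += weight * r per pair, then divide by the summed weights
    (equivalently weightavg * num_pairs_in_bin)."""
    r = np.asarray(pair_seps, dtype=float)
    w = np.asarray(pair_weights, dtype=float)
    return np.sum(w * r) / np.sum(w)
```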
@boryanah And then use the different weightsum for each of DD, DR, and RR and combine with Landy-Szalay to get the weighted correlation function. @lgarrison Will you please check that I have not omitted steps (or completely mis-stated)? :)
That sounds right to me! It seems like we should be returning weightsum instead of weightavg; it was a poor choice on my part. I guess it couldn't hurt to add the latter to the results struct.
And good point about rpavg. One option would be to return both, although that would be extra work in the kernels.
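Putting the pieces together, here is a hedged sketch of a weighted Landy-Szalay combination built from per-bin weight sums (this is not a Corrfunc routine; the normalization choices are assumptions for pair_product weights):

```python
import numpy as np

def weighted_landy_szalay(dd, dr, rr, norm_dd, norm_dr, norm_rr):
    """dd, dr, rr: per-bin sums of pair-weight products.
    norm_*: total weighted pair counts used to normalize each term,
    e.g. sum(w1)*sum(w2) for a cross term (an assumption here).
    xi = (DD/nDD - 2*DR/nDR + RR/nRR) / (RR/nRR)"""
    dd_n = np.asarray(dd, dtype=float) / norm_dd
    dr_n = np.asarray(dr, dtype=float) / norm_dr
    rr_n = np.asarray(rr, dtype=float) / norm_rr
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n
```

With identical normalized counts in a bin, xi comes out to zero, as expected for an unclustered sample.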
Dear Manodeep,
Thanks a lot! This is what I understood (I think I got 2 wrong).
Say 1 is Data; 2 is Random. We are trying to compute the factor F that should multiply RD before it gets into the LS estimator.
1) N1,2 = number of particles; Np1,2 = number of pairs per bin (vector); w_avg1,2 = "weight average" as returned by the DD function when computing DD(XYZ1, XYZ2, weights1, weights2) (also a vector); sum1,2 = w_avg1,2 * Np1,2 (vector) = sum of the products of weights in a given bin.
2) We also have mean weights: w_mean1,2 = sum(weights1,2)/N1,2. To compute the sum of weights in the case of randoms for dataset 1 with weights for 2, we do sum_rand1 = N1 * [N2/Vol * Vol * w_mean2] = N1 * N2 * w_mean2 (I think that I misunderstood this; maybe it is w_mean1 * w_mean2 * N1 * N2 for the same volume?).
3) Here you say to multiply the sum of weights per bin, sum1, by what would have been the sum of weights for the catalog if random, which is sum_rand1, so F1 = sum_rand1 * sum1.
4) We substitute DR1 and DR2 in LS (or convert_3d_counts_to_cf) with DR1*F1 and DR2*F2.
This doesn't seem symmetric, so I guess this is DR1 and there is also a DR2 which should go into the final estimator.
Best, Boryana
I suspect that for 2) it should be sum_rand1 = N1 * [N2/Vol * Vol_bin * w_mean2], which is thus a vector, where Vol is the total volume and Vol_bin is only the volume of the bin,
and for 3) maybe we want F1 = sum_rand1/sum1.
I am really sorry for bothering you.
Hey @boryanah - no worries at all. Unfortunately, I have a deadline right now but if I have not responded by Monday, please ping me again!
And thank you for catching the bug in the first place!
Thanks a lot! Good luck!! I appreciate it!
@boryanah Thanks for your patience. Would it be possible to add in some lines of code to demonstrate what you are doing?
Hi,
I will try to write it up soon, but I have switched to something else for the moment!
Best, Boryana
General information
Issue description
Passing only one weights array when doing a cross-correlation of particle sets of different sizes causes a RuntimeError. The cause is that the Python wrappers create a uniform weights array for the second set of particles if only the first was provided, but it is created with size equal to the first particle set.
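The cause can be illustrated in isolation (hypothetical helper names; the real code lives in Corrfunc's Python wrappers):

```python
import numpy as np

def make_default_weights_buggy(weights1, nd1, nd2):
    # Pre-fix behavior: the unit-weight array mirrors weights1,
    # so its length is ND1 -- wrong whenever ND1 != ND2
    return np.ones_like(np.atleast_2d(weights1))

def make_default_weights_fixed(weights1, nd1, nd2):
    # Fixed behavior: size the array to the second particle set
    num_weights = np.atleast_2d(weights1).shape[0]
    return np.ones((num_weights, nd2))
```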
Minimal failing example
Thanks to @boryanah for finding this and providing the failing example!