Closed nilgoyette closed 5 years ago
I think we should mimic the numpy behavior here and not assume the weights are pre-normalized. I think we should divide by the weight sum inside the `weighted_mean` method.
> - I don't really know where `weighted_mean` and friends should go. Is `summary_statistics` ok?

I'd say it makes sense for them to go there, given that it's where `mean` is.
> - I had to move `return_err_if_empty` and `return_err_unless_same_shape` because they are useful elsewhere.

It makes perfect sense.
> - There's a little code duplication in `weighted_mean_axis`, to avoid (2 conditions + 1 unwrap) x nb_lanes. Maybe I could create an inner function called `inner_weighted_mean` or something, then call it in both functions?

I don't feel too strongly in either direction - if you feel like doing it, all the better :+1:
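To make the trade-off concrete, here is a hypothetical sketch (plain Rust over slices, not the crate's actual code) of the refactoring being discussed: the emptiness and shape checks run once in each public function, while the arithmetic lives in a shared inner helper, so each lane of the `_axis` variant skips the repeated conditions and unwraps.

```rust
// Hypothetical sketch: not the crate's actual code. The names
// `inner_weighted_mean`, `weighted_mean`, and `weighted_mean_axis`
// mirror the discussion; error types are simplified to &'static str.
fn inner_weighted_mean(data: &[f64], weights: &[f64]) -> f64 {
    // Assumes the caller already checked: non-empty, same length.
    let weighted_sum: f64 = data.iter().zip(weights).map(|(x, w)| x * w).sum();
    let weight_sum: f64 = weights.iter().sum();
    weighted_sum / weight_sum
}

fn weighted_mean(data: &[f64], weights: &[f64]) -> Result<f64, &'static str> {
    if data.is_empty() {
        return Err("empty input");
    }
    if data.len() != weights.len() {
        return Err("shape mismatch");
    }
    Ok(inner_weighted_mean(data, weights))
}

fn weighted_mean_axis(rows: &[Vec<f64>], weights: &[f64]) -> Result<Vec<f64>, &'static str> {
    if rows.is_empty() {
        return Err("empty input");
    }
    if rows.iter().any(|r| r.len() != weights.len()) {
        return Err("shape mismatch");
    }
    // The checks above ran once; each lane only does the arithmetic.
    Ok(rows.iter().map(|r| inner_weighted_mean(r, weights)).collect())
}

fn main() {
    // (1*1 + 2*1 + 3*2) / (1 + 1 + 2) = 9 / 4 = 2.25
    println!("{:?}", weighted_mean(&[1.0, 2.0, 3.0], &[1.0, 1.0, 2.0]));
    println!(
        "{:?}",
        weighted_mean_axis(&[vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]], &[1.0, 1.0, 2.0])
    );
}
```

Both public functions pay the validation cost exactly once, which is the "(2 conditions + 1 unwrap) x nb_lanes" saving mentioned above.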
> Questions:
> - Why are the `summary_statistics` tests (and others) not in `/tests/*`? I thought that the public API was supposed to be tested outside the crate. Is this not a "standard"?

I'd say that it's standard for integration tests to go outside the crate, yes. On the other hand, it's sometimes nice to have tests next to the code they refer to - we haven't been very consistent across the crate. It would probably make sense to migrate them to the `tests` folder.
Overall, this looks good to me, thanks for working on it! I suggested one area of improvement around testing that would increase the robustness of our current check - let me know what you think about it @nilgoyette.
> I think we should mimic the numpy behavior here and not assume the weights are pre-normalized. I think we should divide by the weight sum inside the `weighted_mean` method.

I am not sure - I am concerned about introducing rounding errors by doing a normalization step that might or might not be necessary. I would lean on the side of being explicit and letting the user take care of normalization, if they need it.
> > I think we should mimic the numpy behavior here and not assume the weights are pre-normalized. I think we should divide by the weight sum inside the `weighted_mean` method.
>
> I am not sure - I am concerned about introducing rounding errors by doing a normalization step that might or might not be necessary. I would lean on the side of being explicit and letting the user take care of normalization, if they need it.

I see your point and I tend to agree.
However, my concern is that the `weighted_mean` function doesn't return an actual weighted mean, except when the weights are normalized. Given the definition in the documentation, I would tend to:
- Rename this function to `weighted_sum` and note in the documentation that if the weights are pre-normalized, the result is a weighted mean
- Write an additional function `weighted_mean` that actually computes the weighted mean given any input - i.e. divide by the sum of the weights.

Thoughts?
I think that makes perfect sense :+1:
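For illustration, a minimal sketch of the split being proposed - the names match the discussion, but the bodies are illustrative stand-ins, not the crate's code:

```rust
// Hypothetical sketch of the proposed `weighted_sum` / `weighted_mean`
// split, over plain slices.
fn weighted_sum(data: &[f64], weights: &[f64]) -> f64 {
    // If the weights are pre-normalized (they sum to 1), this value
    // already *is* the weighted mean.
    data.iter().zip(weights).map(|(x, w)| x * w).sum()
}

fn weighted_mean(data: &[f64], weights: &[f64]) -> f64 {
    // Correct for arbitrary (non-normalized) weights: divide by the
    // weight sum, which is what numpy's `average` does.
    weighted_sum(data, weights) / weights.iter().sum::<f64>()
}

fn main() {
    let data = [1.0, 2.0, 3.0];
    // Non-normalized weights: `weighted_sum` alone is not a mean...
    let w = [2.0, 2.0, 4.0];
    println!("{}", weighted_sum(&data, &w));  // 18
    println!("{}", weighted_mean(&data, &w)); // 18 / 8 = 2.25
    // ...but with normalized weights the two coincide.
    let wn = [0.25, 0.25, 0.5];
    println!("{}", weighted_sum(&data, &wn)); // 2.25
}
```

This keeps the explicit, no-hidden-normalization `weighted_sum` for users who pre-normalize, while `weighted_mean` stays correct for any input.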
I think it's better now. I had to remove "or if the length of the axis is zero and division by zero panics for type `A`" on `weighted_mean_axis` because the `MultiInputError::EmptyInput` error happens before any panic. Imo it's better that way.
Can you please check the first two `assert_eq!` of `weighted_sum_dimension_zero`? As I wrote, `[0, N]` arrays don't make sense to me, so I'm not sure what the right output is.
Here's a first version of `weighted_mean` and `weighted_mean_axis`.

Disclaimers:
- I don't really know where `weighted_mean` and friends should go. Is `summary_statistics` ok?
- I had to move `return_err_if_empty` and `return_err_unless_same_shape` because they are useful elsewhere.
- There's a little code duplication in `weighted_mean_axis`, to avoid (2 conditions + 1 unwrap) x nb_lanes. Maybe I could create an inner function called `inner_weighted_mean` or something, then call it in both functions?

Questions:
- Why are the `summary_statistics` tests (and others) not in `/tests/*`? I thought that the public API was supposed to be tested outside the crate. Is this not a "standard"?