vpnagraj closed this issue 9 months ago
Hi -- thanks for filing this issue! I agree that we should account for numerical/floating point precision issues in this check. Please expect an update early next week.
Thank you so much for your detailed bug report, @vpnagraj! This was indeed an oversight on our part and has been fixed in PR #53 (package version v0.0.0.9003).
I've included an additional test for the probabilities that were causing errors for you. If you could try validating your submission again using the latest package version and confirm it is fixed for you, that would be greatly appreciated!
Interestingly, the behaviour you described is somewhat system specific: the original approach also worked for me on my machine with your example. But this is such a well-known issue that the more robust approach across systems is of course required.
```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

tst <- tribble(
  ~value,
  "0.818",
  "0.180",
  "0.002"
) %>%
  mutate(value = as.numeric(value))

sum(tst$value) == 1L
#> [1] TRUE
all.equal(sum(tst$value), 1L)
#> [1] TRUE
```

Created on 2023-10-09 with reprex v2.0.2
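For context, a tolerance-based "sum to 1" check could look something like the sketch below. The function name `sums_to_one` and its tolerance default are illustrative assumptions, not the actual implementation merged in PR #53:

```r
# Hypothetical sketch of a tolerance-based "sum to 1" check; the real fix
# lives in check_tbl_value_col_sum1() in the hubValidations package.
sums_to_one <- function(values, tol = .Machine$double.eps^0.5) {
  # isTRUE() guards against all.equal() returning a character string
  # describing the mismatch (rather than FALSE) when values differ
  isTRUE(all.equal(sum(values), 1, tolerance = tol))
}

sums_to_one(c(0.818, 0.180, 0.002))
#> [1] TRUE
sums_to_one(c(0.3, 0.3, 0.3))
#> [1] FALSE
```

The key design choice is comparing within a tolerance instead of testing exact equality, which is fragile for binary doubles.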
Thank you again for reporting! As we are still in the testing phase of the framework, we would greatly appreciate any and all bug reports, as well as any suggestions on how functionality can be improved to meet your team's needs.
Thank you all for such a quick turnaround on this fix!
Confirming that when I test the validation with v0.0.0.9003 I am now seeing a successful check (i.e., no issues with any of the pmf forecasts summing to 1 due to precision).
My team and I are currently preparing forecasts for the 2023-24 FluSight season. Ahead of the first submission, we have tested the `hubValidations::validate_submissions()` function. Most of the validations behave as expected. However, we have encountered some unexpected behavior with the "sum to 1" check for the `"pmf"` output type. Our test submission file includes two cases where the values appear to sum to 1, but because of the handling of floating point numbers (https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal), the logic in `check_tbl_value_col_sum1()` (https://github.com/Infectious-Disease-Modeling-Hubs/hubValidations/blob/main/R/check_tbl_value_col_sum1.R#L52) flags these as not equal. I've included a reprex below.

We have tried rounding down to 3 digits (as shown in the reprex) and still see the flag raised for some locations/horizons. Short of rounding down to 2 digits, I'm not sure there is anything we can do on our end to resolve this problem. Would you all consider updating the logic in `check_tbl_value_col_sum1()` to use `all.equal()` or `dplyr::near()` or another approach? If not, do you have any suggestions for us to try? I would expect that if we are having this issue, other submitters might run into it as well.

Created on 2023-10-06 with reprex v2.0.2