Open gdfiler opened 2 years ago
Thanks for raising this issue, and excellent catch. We'll need to decide what the behaviour should be here. My preference would be to tolerate the NA, dropping the point from the x-axis (the real world is messy and does occasionally contain unavoidable holes with missing data)
Reproducing the case with no NAs:
Reproducing the case with an NA:
The point_type column is calculated in the code below, which looks at both the special_cause_flag, and relative_to_mean columns. These are both NA, so line 29 is executed. https://github.com/nhs-r-community/NHSRplotthedots/blob/19424d38b59184863f01a54280270542e97bc491/R/ptd_calculate_point_type.R#L25-L30
The values for special_cause_flag and relative_to_mean columns are set here, so this is likely where the code to tolerate NAs will need to be added: https://github.com/nhs-r-community/NHSRplotthedots/blob/19424d38b59184863f01a54280270542e97bc491/R/ptd_spc_standard.R#L91-L97
The next stage is probably for someone to write a failing test. It would be worth also checking for other bugs that might be created by the other NAs that appear in that row.
Just a note, that as a workaround, you can pre-filter the incoming dataframe to contain only dates with data. This gives a result with gaps in the x axis, which makes the missing data more obvious while not affecting the SPC logic.
data("ae_attendances")
stable_set <- ae_attendances %>%
filter(org_code == "RRK",
type ==1,
period < as.Date("2018-04-01"))
# remove some data
stable_set$breaches[stable_set$period==as.Date("2018-03-01")] <- NA
# removing more data in the middle of the plot, as an illustration
stable_set$breaches[stable_set$period==as.Date("2017-06-01")] <- NA
# filter to remove any dates with NAs
filtered_set <- stable_set %>% filter(!is.na(breaches))
ptd_spc(filtered_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
I am leaning towards not implementing any changes to tolerate or work around NAs. Perhaps we should throw a warning to prompt to user to look more closely at their data? The user is in control of the data being passed in.
Open to thoughts from others...
If I recall correctly, the documentation clearly states to not include NAs. I had read the documentation but still accidently included them so I like the idea of a warning when NAs are present to prompt a check of the data.
When NAs are included there is a risk that the users without experience of SPC may interpret the false special cause flags as real. If an error message will mitigate that risk then all good.
I think the best solution here would be to have a check along the lines of stopifnot(all(!is.na(x_values))
[pseudocode]. The error message should suggest to use tidyr::drop_na(x_values)
explicitly.
I don't like the idea of doing implicit dropping of values... the logic would be a mess (if there are no NA's, do nothing. If there are, drop na's but give a warning). I think the cleanness of raising an error but telling what to do to fix the issue is best.
I've just had a use case where I have encountered a similar behaviour, but with a different cause. For context, I'm using the function to produce the special cause flag for all ward areas, which is then being exported for use within a heatmap on our BI tool. I have multiple cases where 0 values exist for the whole period (e.g, number of harm falls could be zero for the whole period). These should not be counted as a special cause for concern However, where the entire period is 0, the function cannot calculate the lower or upper control limits, producing an NA for "special_cause_flag", and a subsequent "special_cause_concern"
I have put in place a workaround for my workflow with a simple REPLACE
exceptions$point_type <- replace(exceptions$point_type, is.na(exceptions$special_cause_flag),values=c("common_cause"))
I'm not sure if this needs a fix as such since this is probably a niche usage, but potentially needs a warning where the control limits are NaN.
Hi All I'm new to github and NHSRplotthedots, fairly new to NHSR Community and not that experienced in R either so please be forgiving if this is not appropriate or has been dealt with elsewhere - I came across an apparent issue today which I thought I would share.
I've been playing around with the NHSRplotthedots package and accidently left an NA value in my dataset which caused some special cause flags that should have been common cause.
Here is an example of the issue:
`library(NHSRplotthedots) library(NHSRdatasets) library(dplyr) library(ggplot2) library(scales)
data("ae_attendances")
stable_set <- ae_attendances %>% filter(org_code == "RRK", type ==1, period < as.Date("2018-04-01"))
ptd_spc(stable_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
Note last 6 data points show common cause variation`
`#Now set the last data point to NA and rerun SPC
stable_set$breaches[stable_set$period==as.Date("2018-03-01")] <- NA ptd_spc(stable_set, value_field = breaches, date_field = period, improvement_direction = "decrease")
Now last 5 points show special cause variation`
Thanks for all the great work you are doing and looking forward to future developments and more packages from the community! Gary