Closed ThomUK closed 3 years ago
This was put in more as a placeholder to figure out what the behaviour should be. The issue is in your example you would then have 10 points that are being used to calculate the limit, which is a) less than the fix after n points specified, and b) less than the recommended number of points to base spc calculations off of. Though, while we have a check to make sure fix after n is sufficiently large (min 12 iirc?) we don't do any checks to make sure rebase groups are sensible. I would say we should probably instead check and raise a warning that group sizes are too small and may not be interpretable
Yes, i nearly included that in my issue, but was getting late!
a) is no problem. fix_after_n_points is there to prevent excess points pushing limits wider and the tool losing sensitivity, which wouldn't happen here.
b) is a valid concern / risk
Although many would say even 12 points is too few for SPC (i agree more is ideal), it's not unusual to have a reason to rebase fewer than 12 points from the end of the series. If you know the reason for the process change, and understand the new process normal, it would be inconvenient to have to wait 12mths before rebasing became possible.
So, I agree a warning would be a good way to handle this. Along the lines of "After rebasing, one or more of the process series are fewer than x points long, which may make process limits less reliable. If this is deliberate, carry on. If not, consider waiting for more data before deciding to rebase". Or something.
Could we have an option to silence the warning for users using in rmarkdown loops? Silencing warnings for a whole chunk makes me nervous.
What do you think?
There already exists functionality for silencing warnings, either with suppressWarnings() or with rmarkdown’s chunk options (which can be set globally with Knitr options).
the discussion we had earlier (@Lextuga007 , @SimonW-M , @chrisreading01 ) was to perhaps even go a step further and change the plot by adding a warning to the plot showing when too few points are included in any group of points
Yes, just to add my particular view on this, warnings are great at the coding level but can be overlooked, suppressed or ignored and are also only available to the person coding. My feeling is we should complement this with a visual marker and a caption (as was suggested in our meeting) on the chart itself for those using the charts. We didn't want to necessarily stop such a chart being created but provide some sort caveat that this has been produced in the knowledge that this is not 'best practice'. In the first instance we thought about a caption and for later releases an exclamation mark logo or something similar. Just note that any use of colours like red should be accessible and I'd refer to https://gss.civilservice.gov.uk/policy-store/releasing-statistics-in-spreadsheets/ for hex details.
I think that makes good sense, and helps with transparency of process for consumers of the charts.
It looks like we have an agreed direction - I'll investigate how it might be implemented in code, unless anyone else fancies a look...
All yours! I've assigned you. I've marked for 0.2.0 as I think that this isn't a deal breaker for 0.1.0, so gives us some time for slippage if we want to get this out sooner rather than later.
Checking the warning on <12 points and the recommendation here https://www.england.nhs.uk/improvement-hub/wp-content/uploads/sites/44/2017/11/A-guide-to-creating-and-interpreting-run-and-control-charts.pdf is for < 10 points to use Run or X charts. Does this package include either of those charts and should the code be <10 or keep at <12?
ptd_spc_options() currently includes the section below, which prevents rebasing and fixing a baseline at the same time.
Ideally it would be possible to use both these arguments in the same plot.
The rebase argument would have priority, and would section the plot up into rebase groups. SPC calcs would then apply to each group, and if any group had more than fix_after_n_points data points, the SPC calcs would ignore the excess points.
Describing a hypothetical plot with 40 data points, a rebase at the middle date, and fix_after_n = 12, we would get: 12 points used to calculate initial limits, 8 more points plotted over the original limit lines, but not used for any limit calcs, 12 more points, which are used to calculate the rebased limits and mean, 8 final points plotted over the new limit lines, but again not used for any limit calculations
A hypothetical plot with 40 data points, a rebase after 30 points, and fix_after_n = 12, would result in: 12 points used to calculate initial limits, 28 more points plotted over the original limit lines, but not used for any limit calcs, 10 final points, all of which are used to calculate the rebased limits.