Signed-off-by: anchit-chandran anchit97123@gmail.com

Overview

A few things were going on here, and was a fun rabbit hole!

TL/DR : main things contributing to the random failing were:

Floating point precision
SQL median is weird
Accidentally using random values for Patient.diagnosis_date

All have been fixed. Run the test many times as it should now be fine (importantly, no random values are being used now, so one pass = deterministic!).

Fun details on above:

1. Floating point precision

We have our Visit.hba1c field defined as a DecimalField. The actual kpi calculation method querying this outputs a python Decimal( ) object. My test was just using float( ) objects. Good article on difference: https://www.laac.dev/blog/float-vs-decimal-python/ but generally:

floats are good for speed & convenience, use the classic IEEE 754 binary representation, can lead to rounding precision errors
Decimal, somehow uses base-10 representation`, but means it can accurately represent decimal numbers without the traditional floating point errors

This was partly leading to errors as tests are comparing different types of values. Fixed by converting expected values into Decimal. The output of the calculate_kpi_45_median_hba1c still returns a float though for convenience

2. SQL median is weird

There is no standard implementation of median in Django's ORM because it's more complicated to calculate. I created a custom SQL class for it but that ended up being harder to reason with. Instead, did a simpler approach of just using ORM to retrieve a list of hba1c values, and then calculate the median in Python. Much more straightforward.

3. Accidentally using random values for `Patient.diagnosis_date`

Rookie error! In the tests, just forgot to set PatientFactory.diagnosis_date. Expected behavior of the PatientFactory is to generate random valid values for any not given (to make it easier to create Patient objects).

Fixed by just defining.

Code changes

tests updated to use Decimal() values for hba1c instead of float
tests updated to specify diagnosis_date, thus not relying on the PatientFactory() generating a random diagnosis_date
tests updated to be a little more clear for the invalid case
calculate_median implementation moved into Python instead of SQL, simplifying and circumventing harder-to-reason precision errors
calculate_kpi_45_median_hba1c method more straightforward because of using a Python way of calculating median
also update type annotations of returns from calculate_kpi_ methods to KPIResult

Mentions

@mentions of the person or team responsible for reviewing proposed changes.

rcpch / national-paediatric-diabetes-audit

[KPIS] Fix test for KPI45 sporadically failing #352

Overview

Fun details on above:

1. Floating point precision

2. SQL median is weird

3. Accidentally using random values for `Patient.diagnosis_date`

Code changes

Related Issues

Mentions

rcpch / national-paediatric-diabetes-audit

[KPIS] Fix test for KPI45 sporadically failing #352

Overview

Fun details on above:

1. Floating point precision

2. SQL median is weird

3. Accidentally using random values for Patient.diagnosis_date

Code changes

Related Issues

Mentions

3. Accidentally using random values for `Patient.diagnosis_date`