@pacharanero I'm working on the tests for trisomy_21 and turner
@a-wei-0513 these are now in chart-data-persistence, as they got lumped in with some other improvements.

@dc2007git and I spent a little time looking at this issue today.
The main issue looks to be that the 'test files' have gotten out of date with the actual (correct) responses of the API. Over time it is going to be quite difficult to keep the JSON comparison identical, which means these tests are always going to be quite brittle.
However, these tests do have the benefit of enforcing a very tight 'contract' between us and the end users, in that any change from the expected responses is immediately flagged to us.
Options:
1. Abandon the idea of comparing the JSON response with a 'test file' JSON version. Instead we could focus on tests which check more atomic and simple parts of the response:
a) e.g. check all the endpoints with valid data for 200 OK, and check all the endpoints with invalid data for 422 Unprocessable Entity or another appropriate error.
b) e.g. check the JSON response for the major nodes of the response, but not down to the level of the leaf nodes or the entire JSON - for example you could check that a MeasurementObject was returned containing a "measurement_calculated_values" node (see the sketch after this list). These would be good enough to detect if we had inadvertently seriously broken the response, but wouldn't need to be constantly updated as with full JSON comparison.
2. Continue with full JSON comparisons. This would probably require us to make a set of scripts which make known-valid requests and copy the JSON responses into a file for the tests to use. However, doing this does somewhat undermine the power of the tests, because we could make some accidental breaking change in the API, bake that into our test JSON, and then run the tests (which almost by definition will all pass).
I personally favour 1. b. but would be interested in views from the team.
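For illustration, a minimal sketch of what an option 1. b. style test could look like using FastAPI's TestClient (the app import path, request body and response keys are assumptions for the sketch, not the actual test code):

```python
# Sketch only: app import, endpoint path and request body are assumed
from fastapi.testclient import TestClient
from main import app  # hypothetical import of the FastAPI app

client = TestClient(app)

VALID_BODY = {
    "birth_date": "2020-04-12",
    "observation_date": "2023-04-12",
    "observation_value": 96.2,
    "measurement_method": "height",
    "sex": "female",
    "gestation_weeks": 40,
    "gestation_days": 0,
}

def test_calculation_returns_200_with_major_nodes():
    response = client.post("/trisomy-21/calculation/", json=VALID_BODY)
    assert response.status_code == 200
    body = response.json()
    # check only the major nodes, not the leaf values
    assert "measurement_calculated_values" in body
    assert "plottable_data" in body

def test_calculation_rejects_invalid_body():
    # a deliberately incomplete body should be rejected with 422
    response = client.post("/trisomy-21/calculation/", json={"sex": "female"})
    assert response.status_code == 422
```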
Context

Adding some more context to the overall problem of the continually failing tests, after @pacharanero and I spoke earlier today.
Upon running the test suite on every new push, it seems as though a ValidationError is being thrown when setting response equal to client.post("<some-endpoint>", json=body). Removing this line allows the (now commented-out) direct comparison between the two JSON files to occur. The two files are the response that has just been generated by the TestClient, and the 'ground truth', which contains a saved response from the live API. Obviously, both of these JSON files have been generated with the exact same request parameters, therefore the outputs should be exactly the same - there is no reason for any differences to occur between the two files. But this is not the case: when running the tests, the testing output shows a diff between the two files (very helpful, and should be implemented for every test in order to troubleshoot problems like this).
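A minimal sketch of how that kind of readable diff could be produced in a test, reusing the client and VALID_BODY from the sketch above (the ground-truth file name comes from the discussion below; treating it as living in the working directory is an assumption):

```python
import difflib
import json
from pathlib import Path

def json_diff(generated: dict, ground_truth: dict) -> str:
    """Return a unified diff of two JSON-serialisable objects to make test failures readable."""
    left = json.dumps(generated, indent=2, sort_keys=True).splitlines()
    right = json.dumps(ground_truth, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(left, right, "generated", "ground_truth"))

def test_trisomy_21_calculation_matches_saved_response():
    response = client.post("/trisomy-21/calculation/", json=VALID_BODY)
    ground_truth = json.loads(Path("test_trisomy21_calculation_valid.json").read_text())
    # attach the diff to the assertion message so pytest prints it on failure
    assert response.json() == ground_truth, json_diff(response.json(), ground_truth)
```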
Debugging logs from Pytest

For the test test_trisomy_21_calculation_with_valid_request in test_trisomy21.py, that diff shows a number of differences, described below:
In measurement_calculated_values, both the chronological_centile and corrected_centile take on integer representations (67), whereas in the 'ground truth' file (test_trisomy21_calculation_valid.json) they take on a floating point representation (67.0). Pytest is unable to equate the two, and as such throws errors (see photo).
In plottable_data, numerous important values are omitted from the response generated by the TestClient. centile and sds values are completely omitted in this part of the JSON, even though they are present (albeit as floats) in measurement_calculated_values. In the nested sds_data object in plottable_data, these 2 values are omitted again, alongside observation_error, which is also expected.
I then traced where the chronological_centile and corrected_centile values were set. For the dev reading this issue, follow this stack to reach the definitions by working through the appropriate function calls (alternatively see the endpoint for these definitions): open test_trisomy21.py and look inside test_trisomy_21_calculation_with_valid_request(). You will see a POST request sent to /trisomy-21/calculation/. This endpoint can be found in trisomy21.py, where the decorator specifies an expected response_model equal to MeasurementObject:
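(The actual decorator lives in trisomy21.py; the sketch below only illustrates the shape being described - the router, import paths and request-model name are assumptions:)

```python
# Sketch of the pattern described above, not the real trisomy21.py code
from fastapi import APIRouter
from schemas import MeasurementObject, MeasurementRequest  # hypothetical imports

router = APIRouter()

@router.post("/trisomy-21/calculation/", response_model=MeasurementObject)
def trisomy_21_calculation(measurement_request: MeasurementRequest):
    # the real endpoint delegates to the rcpchgrowth Measurement class described below
    ...
```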
-STACK LEVEL1 An instance of the Measurement class is then created, which is where the calculations are performed. This class is imported from the rcpchgrowth-python repository, aka the rcpchgrowth Python library. Locating this class definition in measurement.py (now in the rcpchgrowth-python repository):
--STACK LEVEL2 Look to self.plottable_centile_data, which contains a dictionary that has 2 centile key:value pairs, one inside chronological_decimal_age_data and the other inside corrected_decimal_age_data. In chronological_decimal_age_data, centile is set equal to a call to self.calculated_measurements_object["measurement_calculated_values"]["chronological_centile"], and in corrected_decimal_age_data, centile is set equal to self.calculated_measurements_object["measurement_calculated_values"]["corrected_centile"]. By accessing the calculated_measurements_object, we can see that.....:
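Roughly, the shape being described at this level is something like the following (an illustrative sketch of the structure only, not the actual rcpchgrowth code; in the real class these are attributes on self):

```python
# Illustration of the structure described in STACK LEVEL2
calculated_measurements_object = {
    "measurement_calculated_values": {
        "chronological_centile": 67,
        "corrected_centile": 67,
    }
}

plottable_centile_data = {
    "chronological_decimal_age_data": {
        "centile": calculated_measurements_object["measurement_calculated_values"]["chronological_centile"],
    },
    "corrected_decimal_age_data": {
        "centile": calculated_measurements_object["measurement_calculated_values"]["corrected_centile"],
    },
}
```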
---STACK LEVEL3 It is then set equal to self.sds_and_centile_for_measurement_method(), which, inside of it....:
----STACK LEVEL4 Has the returned object set equal to another function, __create_measurement_object(), which returns the original measurement_calculated_values that is referenced in the second entry of our mock-up stack here. chronological_centile is set to chronological_centile_value, and corrected_centile is set to corrected_centile_value, both of which have the following definitions:
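(The definitions themselves were shown in a screenshot; the key detail is the int() conversion. As a standalone illustration of why an int() call there produces a mismatch against the saved file, assuming the comparison happens on the serialised JSON text rather than on parsed Python values:)

```python
import json

# int() truncates the calculated centile to an integer...
generated = json.dumps({"chronological_centile": int(67.34)})
# ...while the saved 'ground truth' file holds a floating point value
ground_truth = json.dumps({"chronological_centile": 67.0})

print(generated)                  # {"chronological_centile": 67}
print(ground_truth)               # {"chronological_centile": 67.0}
print(generated == ground_truth)  # False - the serialised texts differ
```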
As noted in the picture, it is these int() calls that I believe may be contributing to the integer-float discrepancy in the output. Another explanation is possible: in response_schema_classes.py, in the ChronologicalDecimalAgeData and CorrectedDecimalAgeData classes, the structure of the response is defined, and both centile and sds are defined as float variables:
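For reference, the kind of field declarations being described look roughly like this (a sketch; the real classes in response_schema_classes.py have more fields, and whether these fields are Optional is an assumption):

```python
# Sketch of the schema classes described above, not the real code
from typing import Optional
from pydantic import BaseModel

class ChronologicalDecimalAgeData(BaseModel):
    centile: Optional[float] = None
    sds: Optional[float] = None

class CorrectedDecimalAgeData(BaseModel):
    centile: Optional[float] = None
    sds: Optional[float] = None
```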
As for the omission of the centile and sds values (as well as observation_error) from the plottable_data area of the response, I have not been able to locate a reason for this bug. It is worth mentioning that in the response, even in the test response, the centile and sds values actually are returned in measurement_calculated_values, but not in plottable_data, as noted. This is strange behaviour, because the centile and sds values in plottable_data are set equal to their respective values in measurement_calculated_values.

After commits from @eatyourpeas, the Response schema has been updated to define the accepted type of bone_age_sds as float, and to provide None for optional parameters. This now resolves many tests, but the remainder of those failing can be categorised into the following groups with the following errors:
Those tests (in the ukwho, trisomy and turner test files) that call their respective chart_coordinates endpoints all fail, raising Validation Errors from FastAPI's serialize_response function (see the illustration after this list):
Those tests (in the ukwho, trisomy and turner test files) that call their respective fictional_child_data endpoints all fail, raising numerous Validation Errors, all for the same reason. It seems as though the generate_fictional_child_data() function from the rcpchgrowth-python repository is passing a date with a time (i.e. date + hours) to an object that is expecting solely a date (hence making Pydantic throw the error). Therefore, the error being thrown is related to this:
Finally, Pydantic is throwing some warnings that we have still not migrated to V2.
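As an illustration of the first group's failure mode - FastAPI raising a validation error while serialising a response that does not satisfy the declared response_model - here is a minimal, self-contained example (the endpoint and model here are hypothetical, not the real chart_coordinates code):

```python
# Hypothetical example of the response-serialisation failure described above
from typing import List

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel

app = FastAPI()

class Point(BaseModel):
    x: float
    y: float

@app.get("/example-coordinates", response_model=List[Point])
def example_coordinates():
    # "y" is missing, so the return value does not satisfy the response_model
    return [{"x": 1.0}]

client = TestClient(app)

def test_example_coordinates_fails_serialisation():
    # the TestClient re-raises the server-side error by default; depending on the
    # FastAPI version this surfaces as a pydantic ValidationError or a ResponseValidationError
    with pytest.raises(Exception):
        client.get("/example-coordinates")
```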
After https://github.com/rcpch/digital-growth-charts-server/pull/218 I think the only issue left from the list above is https://github.com/rcpch/digital-growth-charts-server/issues/217
The bulk of the existing 16,000 tests were part of the rcpchgrowth python package and have been moved out into that repo, leaving this repo looking a little short of tests. We ideally need to add some tests which will check for regressions in the actual behaviour of the API. (There's no need to test the correctness of the calculations here since they are being tested in the rcpchgrowth package.) Flask has a mechanism to instantiate a running API that you can throw tests at: https://flask.palletsprojects.com/en/1.1.x/testing/
Each endpoint should have a test to check
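A minimal sketch of the Flask testing pattern linked above, assuming the Flask app object is importable as app (the real module path and endpoints will differ):

```python
# Sketch of the Flask test-client pattern from the docs linked above
import pytest
from app import app  # hypothetical import of the Flask application

@pytest.fixture
def client():
    with app.test_client() as client:
        yield client

def test_root_endpoint_responds(client):
    response = client.get("/")
    assert response.status_code == 200
```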