@pacharanero I'm working on the tests for trisomy_21 and turner
@a-wei-0513 these are now in chart-data-persistence, as they got lumped in with some other improvements.

@dc2007git and I spent a little time looking at this issue today.
The main issue looks to be that the 'test files' have gotten out of date with the actual (correct) responses of the API. Over time it is going to be quite difficult to keep the JSON comparison identical, which means these tests are always going to be quite brittle.
However, these tests do have the benefit of enforcing a very tight 'contract' between us and the end users, in that any change from the expected responses is immediately flagged to us.
Options:
1. Abandon the idea of comparing the JSON response with a 'test file' JSON version. Instead we could focus on tests which check more atomic and simple parts of the response:
a) e.g. check all the endpoints with valid data for 200 OK, and check all the endpoints with invalid data for 422 Unprocessable Entity or another appropriate error.
b) e.g. check the JSON response for the major nodes of the response, but not down to the level of the leaf nodes or the entire JSON - for example you could check that a MeasurementObject was returned containing a "measurement_calculated_values" node (see the sketch after this list). These would be good enough to detect if we had inadvertently seriously broken the response, but wouldn't need to be constantly updated as with full JSON comparison.
2. Continue with full JSON comparisons. This would probably require us to make a set of scripts which make known-valid requests and copy the JSON responses into a file for the tests to use. However, doing this does somewhat undermine the power of the tests, because we could make some accidental breaking change in the API, bake that into our test JSON, and then run the tests (which almost by definition will all pass).
I personally favour 1. b. but would be interested in views from the team.
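For illustration, a minimal sketch of what an option 1. b. style test could look like using FastAPI's TestClient (the app import path, request body and response keys are assumptions for the sketch, not the actual test code):

```python
# Sketch only: app import, endpoint path and request body are assumed
from fastapi.testclient import TestClient
from main import app  # hypothetical import of the FastAPI app

client = TestClient(app)

VALID_BODY = {
    "birth_date": "2020-04-12",
    "observation_date": "2023-04-12",
    "observation_value": 96.2,
    "measurement_method": "height",
    "sex": "female",
    "gestation_weeks": 40,
    "gestation_days": 0,
}

def test_calculation_returns_200_with_major_nodes():
    response = client.post("/trisomy-21/calculation/", json=VALID_BODY)
    assert response.status_code == 200
    body = response.json()
    # check only the major nodes, not the leaf values
    assert "measurement_calculated_values" in body
    assert "plottable_data" in body

def test_calculation_rejects_invalid_body():
    # a deliberately incomplete body should be rejected with 422
    response = client.post("/trisomy-21/calculation/", json={"sex": "female"})
    assert response.status_code == 422
```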
Context

Adding some more context to the overall problem of the continually failing tests, after @pacharanero and I spoke earlier today.
Upon running the test suite on every new push, it seems as though a ValidationError is being thrown when setting response equal to client.post("<some-endpoint>", json=body). Removing this line allows the (now commented-out) direct comparison between the two JSON files to occur. The two files are the response that has just been generated by the TestClient, and the 'ground truth', which contains a saved response from the live API. Obviously, both of these JSON files have been generated with the exact same request parameters, therefore the outputs should be exactly the same - there is no reason for any differences to occur between the two files. But this is not the case: when running the tests, the testing output shows a diff between the two files (very helpful, and should be implemented for every test in order to troubleshoot problems like this).
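A minimal sketch of how that kind of readable diff could be produced in a test, reusing the client and VALID_BODY from the sketch above (the ground-truth file name comes from the discussion below; treating it as living in the working directory is an assumption):

```python
import difflib
import json
from pathlib import Path

def json_diff(generated: dict, ground_truth: dict) -> str:
    """Return a unified diff of two JSON-serialisable objects to make test failures readable."""
    left = json.dumps(generated, indent=2, sort_keys=True).splitlines()
    right = json.dumps(ground_truth, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(left, right, "generated", "ground_truth"))

def test_trisomy_21_calculation_matches_saved_response():
    response = client.post("/trisomy-21/calculation/", json=VALID_BODY)
    ground_truth = json.loads(Path("test_trisomy21_calculation_valid.json").read_text())
    # attach the diff to the assertion message so pytest prints it on failure
    assert response.json() == ground_truth, json_diff(response.json(), ground_truth)
```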
Debugging logs from Pytest

For the test test_trisomy_21_calculation_with_valid_request in test_trisomy21.py, that diff shows a number of differences, described below:
In measurement_calculated_values, both the chronological_centile and corrected_centile take on integer representations (67), whereas in the 'ground truth' file (test_trisomy21_calculation_valid.json) they take on a floating point representation (67.0). Pytest is unable to equate the two, and as such throws errors (see photo).
In plottable_data, numerous important values are omitted from the response generated by the TestClient. centile and sds values are completely omitted in this part of the JSON, even though they are present (albeit as floats) in measurement_calculated_values. In the nested sds_data object in plottable_data, these 2 values are omitted again, alongside observation_error, which is also expected.
I then traced where the chronological_centile and corrected_centile values were set. For the dev reading this issue, follow this stack to reach the definitions by working through the appropriate function calls (alternatively see the endpoint for these definitions): open test_trisomy21.py and look inside test_trisomy_21_calculation_with_valid_request(). You will see a POST request sent to /trisomy-21/calculation/. This endpoint can be found in trisomy21.py, where the decorator specifies an expected response_model equal to MeasurementObject:
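(The actual decorator lives in trisomy21.py; the sketch below only illustrates the shape being described - the router, import paths and request-model name are assumptions:)

```python
# Sketch of the pattern described above, not the real trisomy21.py code
from fastapi import APIRouter
from schemas import MeasurementObject, MeasurementRequest  # hypothetical imports

router = APIRouter()

@router.post("/trisomy-21/calculation/", response_model=MeasurementObject)
def trisomy_21_calculation(measurement_request: MeasurementRequest):
    # the real endpoint delegates to the rcpchgrowth Measurement class described below
    ...
```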
-STACK LEVEL1 An instance of the Measurement class is then created, which is where the calculations are performed. This class is imported from the rcpchgrowth-python repository, aka the rcpchgrowth Python library. Locating this class definition in measurement.py (now in the rcpchgrowth-python repository):
--STACK LEVEL2 Look to self.plottable_centile_data, which contains a dictionary that has 2 centile key:value pairs, one inside chronological_decimal_age_data and the other inside corrected_decimal_age_data. In chronological_decimal_age_data, centile is set equal to a call to self.calculated_measurements_object["measurement_calculated_values"]["chronological_centile"], and in corrected_decimal_age_data, centile is set equal to self.calculated_measurements_object["measurement_calculated_values"]["corrected_centile"]. By accessing the calculated_measurements_object, we can see that.....:
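Roughly, the shape being described at this level is something like the following (an illustrative sketch of the structure only, not the actual rcpchgrowth code; in the real class these are attributes on self):

```python
# Illustration of the structure described in STACK LEVEL2
calculated_measurements_object = {
    "measurement_calculated_values": {
        "chronological_centile": 67,
        "corrected_centile": 67,
    }
}

plottable_centile_data = {
    "chronological_decimal_age_data": {
        "centile": calculated_measurements_object["measurement_calculated_values"]["chronological_centile"],
    },
    "corrected_decimal_age_data": {
        "centile": calculated_measurements_object["measurement_calculated_values"]["corrected_centile"],
    },
}
```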
---STACK LEVEL3 It is then set equal to self.sds_and_centile_for_measurement_method(), which, inside of it....:
----STACK LEVEL4 Has the returned object set equal to another function, __create_measurement_object(), which returns the original measurement_calculated_values that is referenced in the second entry of our mock-up stack here. chronological_centile is set to chronological_centile_value, and corrected_centile is set to corrected_centile_value, both of which have the following definitions:
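(The definitions themselves were shown in a screenshot; the key detail is the int() conversion. As a standalone illustration of why an int() call there produces a mismatch against the saved file, assuming the comparison happens on the serialised JSON text rather than on parsed Python values:)

```python
import json

# int() truncates the calculated centile to an integer...
generated = json.dumps({"chronological_centile": int(67.34)})
# ...while the saved 'ground truth' file holds a floating point value
ground_truth = json.dumps({"chronological_centile": 67.0})

print(generated)                  # {"chronological_centile": 67}
print(ground_truth)               # {"chronological_centile": 67.0}
print(generated == ground_truth)  # False - the serialised texts differ
```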
As noted in the picture, it is these int() calls that I believe may be contributing to the integer-float discrepancy in the output. Another explanation is possible: in response_schema_classes.py, in the ChronologicalDecimalAgeData and CorrectedDecimalAgeData classes, the structure of the response is defined, and both centile and sds are defined as float variables:
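For reference, the kind of field declarations being described look roughly like this (a sketch; the real classes in response_schema_classes.py have more fields, and whether these fields are Optional is an assumption):

```python
# Sketch of the schema classes described above, not the real code
from typing import Optional
from pydantic import BaseModel

class ChronologicalDecimalAgeData(BaseModel):
    centile: Optional[float] = None
    sds: Optional[float] = None

class CorrectedDecimalAgeData(BaseModel):
    centile: Optional[float] = None
    sds: Optional[float] = None
```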
As for the omission of the centile and sds values (as well as observation_error) from the plottable_data area of the response, I have not been able to locate a reason for this bug. It is worth mentioning that in the response, even in the test response, the centile and sds values actually are returned in measurement_calculated_values, but not in plottable_data, as noted. This is strange behaviour, because the centile and sds values in plottable_data are set equal to their respective values in measurement_calculated_values.

After commits from @eatyourpeas, the Response schema has been updated to define the accepted type of bone_age_sds as float, and to provide None for optional parameters. This now resolves many tests, but the remainder of those failing can be categorised into the following groups with the following errors:
Those tests (in the ukwho, trisomy and turner test files) that call their respective chart_coordinates endpoints all fail, raising Validation Errors from FastAPI's serialize_response function (see the illustration after this list):
Those tests (in the ukwho, trisomy and turner test files) that call their respective fictional_child_data endpoints all fail, raising numerous Validation Errors, all for the same reason. It seems as though the generate_fictional_child_data() function from the rcpchgrowth-python repository is passing a date with a time (i.e. date + hours) to an object that is expecting solely a date (hence making Pydantic throw the error). Therefore, the error being thrown is related to this:
Finally, Pydantic is throwing some warnings that we have still not migrated to V2.
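As an illustration of the first group's failure mode - FastAPI raising a validation error while serialising a response that does not satisfy the declared response_model - here is a minimal, self-contained example (the endpoint and model here are hypothetical, not the real chart_coordinates code):

```python
# Hypothetical example of the response-serialisation failure described above
from typing import List

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel

app = FastAPI()

class Point(BaseModel):
    x: float
    y: float

@app.get("/example-coordinates", response_model=List[Point])
def example_coordinates():
    # "y" is missing, so the return value does not satisfy the response_model
    return [{"x": 1.0}]

client = TestClient(app)

def test_example_coordinates_fails_serialisation():
    # the TestClient re-raises the server-side error by default; depending on the
    # FastAPI version this surfaces as a pydantic ValidationError or a ResponseValidationError
    with pytest.raises(Exception):
        client.get("/example-coordinates")
```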
After https://github.com/rcpch/digital-growth-charts-server/pull/218 I think the only issue left from the list above is https://github.com/rcpch/digital-growth-charts-server/issues/217
The bulk of the existing 16,000 tests were part of the rcpchgrowth python package and have been moved out into that repo, leaving this repo looking a little short of tests. We ideally need to add some tests which will check for regressions in the actual behaviour of the API. (There's no need to test the correctness of the calculations here since they are being tested in the rcpchgrowth package.) Flask has a mechanism to instantiate a running API that you can throw tests at: https://flask.palletsprojects.com/en/1.1.x/testing/
Each endpoint should have a test to check
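A minimal sketch of the Flask testing pattern linked above, assuming the Flask app object is importable as app (the real module path and endpoints will differ):

```python
# Sketch of the Flask test-client pattern from the docs linked above
import pytest
from app import app  # hypothetical import of the Flask application

@pytest.fixture
def client():
    with app.test_client() as client:
        yield client

def test_root_endpoint_responds(client):
    response = client.get("/")
    assert response.status_code == 200
```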