ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

Fuzzy linkage tests improvements #105

martinburchell closed this 1 year ago

martinburchell commented 1 year ago

In my attempt to better understand the code, I've made what are hopefully improvements to the multiple comparison fuzzy linkage tests.

I've split the single test into separate tests and added checks on the returned matches. Without these checks, all the tests would still pass even with ", x._distance" removed from sort_asc_best_to_worst().
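To illustrate why checking the returned order matters, here is a minimal sketch. It is not CRATE's code: the Candidate class, its log_odds and distance fields, and sort_best_to_worst() are hypothetical stand-ins for a sort with a primary score and a distance tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical match candidate (not a CRATE class)."""
    name: str
    log_odds: float  # assumed primary score: higher is better
    distance: float  # assumed tie-breaker: lower is better


def sort_best_to_worst(cands, use_distance=True):
    """Sort best first; optionally drop the distance tie-breaker."""
    if use_distance:
        return sorted(cands, key=lambda c: (-c.log_odds, c.distance))
    return sorted(cands, key=lambda c: -c.log_odds)


candidates = [
    Candidate("far", 5.0, 2.0),
    Candidate("near", 5.0, 1.0),  # ties with "far" on log_odds
    Candidate("weak", 1.0, 0.0),
]

with_distance = sort_best_to_worst(candidates, use_distance=True)
without_distance = sort_best_to_worst(candidates, use_distance=False)

# A membership-only check cannot tell the two sorts apart...
assert set(with_distance) == set(without_distance)

# ...but a check on the order of the returned matches can.
names_with = [c.name for c in with_distance]
names_without = [c.name for c in without_distance]
assert names_with != names_without
```

A test that asserts only which candidates come back passes for both variants; asserting the order is what detects the missing tie-breaker.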

I'm hoping that, by predicting the order of the matches, I haven't introduced any flaky tests. Python's built-in sort is stable, so when two entries tie on the sort key, their original relative order is retained.
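As a quick illustration of that stability guarantee, here is a small standalone sketch. The records are made up for the example and have nothing to do with the linkage data.

```python
# Made-up (name, score) pairs; two pairs of entries tie on score.
records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]

# Sort by score only. Tied entries keep their original relative order.
ascending = sorted(records, key=lambda r: r[1])
ascending_names = [name for name, _ in ascending]

# reverse=True also preserves the original order of tied entries.
descending = sorted(records, key=lambda r: r[1], reverse=True)
descending_names = [name for name, _ in descending]

print(ascending_names)   # ['bob', 'dave', 'alice', 'carol']
print(descending_names)  # ['alice', 'carol', 'bob', 'dave']
```

Because the result depends only on the input order and the key, a test that predicts the order of tied matches is deterministic as long as the input order is.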

Running a code coverage report (pip install pytest-cov, then pytest --cov --cov-report html) shows that the continue statement taken when ci is None (line 2390) is not currently tested. Would that be easy to cover?

RudolfCardinal commented 1 year ago

Thanks -- that's terrific. There is an edge case where the "distance" check does make a difference, so I've added a check for that (test_order_correct_with_duplicate_names_2). I've also added a test to cover the untested ci is None branch.