imsweb / mph

Java implementation of the Multiple Primary and Histology Coding Rules.

Support lenient MPH matching #15

Closed bekeles closed 7 years ago

bekeles commented 7 years ago

Most of the cancer groups have a histology rule defined as "Do the tumors have ICD-O-3 histology codes that are different at the first (Xxxx), second (xXxx), or third (xxXx) number?". By default, this rule does not consider 8000 to be a match for 8010 - 9999.

The current implementation is appropriate for abstracts and path reports. In those cases the histology is coded by the abstractor, set by a CTR during path screening of a path report, or eventually set by NLP, and the NLP algorithms would code something more specific than 8000.

However, data that is converted from another coding scheme to ICD-O-3 will have a large number of NOS codes like 8000.

Therefore, we need two modes for the MP/H rules: strict and lenient. Strict = used for linking records in SEER*DMS; Lenient = used for matching claims and other high-volume data streams.

The only change that we need now is to handle this question differently. This question is M12 for breast. It has different IDs for other sites:

Do the tumors have ICD-O-3 histology codes that are different at the first (Xxxx), second (xXxx), or third (xxXx) number?

Strict = use the current implementation. 8000 would not be a match for 8500.
Lenient = consider 8000 to be a match for any 8nnn histology.
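To make the requested behavior concrete, here is a minimal sketch of how the two modes could answer the question. It is not the library's actual code: the class `LenientHistologySketch`, the `Mode` enum, and the method `differsAtFirstThreeDigits` are hypothetical names chosen for illustration, and the sketch assumes histologies are passed as 4-character strings.

```java
// Hypothetical sketch; none of these names come from the mph library.
final class LenientHistologySketch {

    // Assumed representation of the two modes proposed in this issue.
    enum Mode { STRICT, LENIENT }

    /**
     * Answers "Do the tumors have ICD-O-3 histology codes that are different
     * at the first, second, or third number?" for two 4-character histologies.
     */
    static boolean differsAtFirstThreeDigits(String hist1, String hist2, Mode mode) {
        if (hist1 == null || hist2 == null || hist1.length() != 4 || hist2.length() != 4)
            throw new IllegalArgumentException("expected two 4-character histology codes");

        // Lenient: 8000 is treated as a match for any 8nnn histology.
        if (mode == Mode.LENIENT && isLenientMatch(hist1, hist2))
            return false;

        // Strict (and the lenient fallback): compare the first three characters.
        return !hist1.substring(0, 3).equals(hist2.substring(0, 3));
    }

    // True when one code is 8000 and the other starts with 8.
    private static boolean isLenientMatch(String a, String b) {
        return ("8000".equals(a) && b.charAt(0) == '8')
            || ("8000".equals(b) && a.charAt(0) == '8');
    }
}
```

Under these assumptions, comparing 8000 with 8500 reports a difference in strict mode and no difference in lenient mode, while 8000 against 9590 reports a difference in both modes.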

bekeles commented 7 years ago

This has been fixed and unit tested.