imsweb / algorithms

Java implementation of cancer-related algorithms (NHIA, NAPIIA, Survival Time, etc.)

County at Diagnosis for Analysis #55

Closed howew closed 5 years ago

howew commented 5 years ago

I would like to request that the calculation for county at diagnosis (analysis) be added to the library. I've attached source code from NAACCR*Prep. It may need to be updated for 2019.

CountyAtDiagnosisAnalysis.txt

depryf commented 5 years ago

@garybeverungen can you please implement this new algorithm? You will need to add a new package/utility class, but also register a new "Algorithm" in the new framework (see Algorithms). Thanks.

garybeverungen commented 5 years ago

Hi @howew, I've taken a look at CountyAtDiagnosisAnalysis.txt and I just want to make sure I've understood the algorithm correctly before moving on. It seems like the general idea is that we have two sources of information for County at Diagnosis: the value manually entered in a record and the result from a geocoder. The algorithm assesses the quality of each piece of information and then decides which value (if either) to return. I've summarized the algorithm in pseudocode below. Let me know if I'm misunderstanding something.

if (we don't have state or year info)
    use 999 for derived county code
else if (the state is Canadian)
    use 998
else if (record county info is null)
    use 999
else if (geocoder county info is invalid (null, blank, 999, or for the wrong state))
    use record county info
else if (record and geocoder info agree)
    use record county info
else if (geocoder county info is valid and certainty is not blank/9, and record county info is blank/9)
    use record county info ***
else if (geocoder county info is valid but certainty is blank/9, and record county info is blank/9)
    use geocoder county info
else if (geocoder county info is valid but certainty is blank/9, and record county info is not blank/9)
    use record county info
else if (geocoder county info is valid, certainty is not blank/9, and record county info is not blank/9)
    use record county info if certainty is 2-5, use geocoder county info if certainty is 1 or 6

*** I think we should probably use the geocoder value here. Intuitively, a geocoder value with a non-blank/9 certainty score is preferable to a blank/999 record value, right? I looked at the SAS code in the comments and I think it confirms this. (It is confusing, though, because county_rec refers to the geocoder info and county refers to the record info.) The relevant lines:

else if certainty not in(' ','9') and county in('   ', '999') then do;
        flg = 3;
        cnty_drv = county_rec;
        end;
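
To make the question concrete, here is a minimal Java sketch of the decision chain above, with the *** branch returning the geocoder value as the SAS snippet suggests. This is not the NAACCR*Prep source or the library's final API: the class name, method name, parameter list, the `geoCountyInState` flag (standing in for the "wrong state" check), and the list of Canadian province codes are all assumptions made for illustration. Certainty values outside 1-6 are not covered by the pseudocode, so the sketch arbitrarily falls back to the record value for them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; names and signature are not from the library.
final class CountyAtDxSketch {

    // Assumption: standard Canadian postal abbreviations identify a Canadian "state".
    private static final Set<String> CANADIAN = new HashSet<>(Arrays.asList(
            "AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"));

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // geoCountyInState is an assumed input: true when the geocoder county
    // belongs to the state at diagnosis (how that is checked is not shown here).
    static String derive(String state, String dxYear, String recordCounty,
                         String geoCounty, boolean geoCountyInState, String certainty) {
        if (isBlank(state) || isBlank(dxYear))
            return "999";
        if (CANADIAN.contains(state))
            return "998";
        if (recordCounty == null)
            return "999";

        boolean geoValid = !isBlank(geoCounty) && !"999".equals(geoCounty) && geoCountyInState;
        if (!geoValid)
            return recordCounty;
        if (recordCounty.equals(geoCounty))
            return recordCounty;

        boolean recordUnknown = isBlank(recordCounty) || "999".equals(recordCounty);
        boolean certaintyUnknown = isBlank(certainty) || "9".equals(certainty);

        // Valid geocoder value vs. unknown record value: the geocoder wins,
        // whatever the certainty (this includes the *** branch).
        if (recordUnknown)
            return geoCounty;
        // Known record value, unknown certainty: keep the record value.
        if (certaintyUnknown)
            return recordCounty;
        // Both known: certainty 1 or 6 prefers the geocoder, 2-5 the record.
        // Any other certainty value also falls back to the record (assumption).
        return ("1".equals(certainty) || "6".equals(certainty)) ? geoCounty : recordCounty;
    }

    public static void main(String[] args) {
        // The *** case: record county unknown, geocoder valid with certainty 2.
        System.out.println(derive("MD", "2018", "999", "003", true, "2")); // prints 003
    }
}
```

Since both "record is blank/999" branches return the geocoder value once the *** branch is changed, they collapse into a single check in this sketch; that simplification only holds if the geocoder value really is the intended result there.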

depryf commented 5 years ago

The changes have been merged; they will be available in version 2.1.