imsweb / algorithms

Java implementation of cancer-related algorithms (NHIA, NAPIIA, Survival Time, etc.)

Implement EPHT SubCounty Lookup #120

Closed (howew closed 3 years ago)

howew commented 3 years ago

This is a new field being collected by CDC. It's a standard lookup with state, county at dx analysis, and census tract 2010.

The NAACCR XML info for the algorithms API implementation is as follows:

<ItemDef naaccrNum="9993" naaccrName="EPHT 2010 GEO ID 5K" naaccrId="epht2010GeoId5k" length="11" parentXmlElement="Tumor" recordTypes="A,M,C,I" />
<ItemDef naaccrNum="9994" naaccrName="EPHT 2010 GEO ID 20K" naaccrId="epht2010GeoId20k" length="11" parentXmlElement="Tumor" recordTypes="A,M,C,I" />

epht-sub-counties.zip

I've attached the resource file. Note: any values for epht2010GeoId5k and epht2010GeoId20k that are in the lookup that are less than 11 characters in length should be padded on the left with zeros until they reach 11 characters when they are retrieved from the lookup.
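The padding rule above could be sketched roughly as follows. This is only an illustration: the class name `EphtPadding` and the method `padGeoId` are hypothetical and not part of the library, and the sketch assumes null or empty values are passed through for the caller to handle.

```java
final class EphtPadding {

    // Length of both EPHT 2010 GEO ID fields, taken from length="11" in the ItemDef entries.
    static final int GEO_ID_LENGTH = 11;

    private EphtPadding() {
    }

    // Left-pads a raw lookup value with zeros until it is 11 characters long.
    // Assumption: null, empty, or already-long-enough values are returned unchanged.
    static String padGeoId(String raw) {
        if (raw == null || raw.isEmpty() || raw.length() >= GEO_ID_LENGTH)
            return raw;
        StringBuilder buf = new StringBuilder(GEO_ID_LENGTH);
        for (int i = raw.length(); i < GEO_ID_LENGTH; i++)
            buf.append('0');
        return buf.append(raw).toString();
    }
}
```

For instance, `EphtPadding.padGeoId("1234567")` would yield `00001234567`, while a value that is already 11 characters long comes back untouched.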

The algorithm should implement the "flavors of unknown" (A, B, C, D) that are used by other algorithms in the library. Additionally, the value 99999999999 should be considered as an unknown value by the algorithms API.

kirbykn commented 3 years ago

There is a section in the data file where the "State" value is "05". It looks like it is where the "AR" lines should be. Does that need to be fixed?

kirbykn commented 3 years ago

For the "flavors of unknown", I assume that they should follow the RuralUrbanUtils definitions.

  • A = State, county, or tract are invalid
  • B = State and tract are valid, but county was not reported
  • C = State + county + tract combination was not found
  • D = State, county, or tract are blank or unknown

I am not sure how that needs to work. When would the 9-filled value be used? Also, is it correct that A, B, C, D are all single digits going into the 11-digit fields? When the value is unknown, are both the 5k and 20k fields set to the same value?

kirbykn commented 3 years ago

Is the algorithm name "EPHT SubCounty"? Is the algorithm version going to be "1.0" to start out?

howew commented 3 years ago

Yes, the version should be 1.0.

The flavors of unknown can be copied from RuralUrbanUtils. The 9-filled value is only used after you check for A-D and actually pull from the table. If the state/county/tract combination has 99999999999 in the table, then you would use it.

If A-D is used for one field, it will be used for both, but there might be rows where 5k is 99999999999 and 20k is not.
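The order of checks described here might look something like the sketch below. It is not the library's implementation: the class `EphtSubCountySketch`, its methods, the in-memory map, and the specific validity and unknown rules (two-letter state, three-digit county, six-digit tract, `ZZ`, `999`, `999999`) are all assumptions made for illustration. The real rules would come from RuralUrbanUtils.

```java
import java.util.HashMap;
import java.util.Map;

final class EphtSubCountySketch {

    // The 9-filled value that should also be treated as unknown.
    static final String UNKNOWN_9_FILLED = "99999999999";

    // Holds the two output fields.
    static final class Result {
        final String geoId5k;
        final String geoId20k;

        Result(String geoId5k, String geoId20k) {
            this.geoId5k = geoId5k;
            this.geoId20k = geoId20k;
        }
    }

    // Hypothetical in-memory table: "state|county|tract" -> the two IDs (assumed already zero-padded).
    private final Map<String, Result> table = new HashMap<>();

    void addRow(String state, String county, String tract, String id5k, String id20k) {
        table.put(state + "|" + county + "|" + tract, new Result(id5k, id20k));
    }

    Result compute(String state, String county, String tract) {
        // D: an input is blank or carries an (assumed) unknown code.
        if (isBlank(state) || isBlank(county) || isBlank(tract) || "ZZ".equals(state) || "999999".equals(tract))
            return sameForBoth("D");
        // A: an input fails the (assumed) format check.
        if (!state.matches("[A-Z]{2}") || !county.matches("\\d{3}") || !tract.matches("\\d{6}"))
            return sameForBoth("A");
        // B: state and tract are valid, but the county was not reported (assumed code 999).
        if ("999".equals(county))
            return sameForBoth("B");
        // Only now is the table consulted. C: the combination is not in the table.
        Result row = table.get(state + "|" + county + "|" + tract);
        if (row == null)
            return sameForBoth("C");
        // Table values are returned as stored, so one field may be 9-filled while the other is not.
        return row;
    }

    // True for the flavors A-D and for the 9-filled value.
    static boolean isUnknown(String value) {
        return "A".equals(value) || "B".equals(value) || "C".equals(value) || "D".equals(value)
                || UNKNOWN_9_FILLED.equals(value);
    }

    // A flavor of unknown always applies to both fields.
    private static Result sameForBoth(String flavor) {
        return new Result(flavor, flavor);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

The point of the ordering is that A-D never depend on the table contents except for C, and the 9-filled value can only ever come out of a table row.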

howew commented 3 years ago

Yes, please change 05 to AR.

depryf commented 3 years ago

This new algorithm will be available with version 3.6.