orchid-initiative / synthetic-database-project


Diagnosis/Procedure Code Formatting #58

Open NickKramer87 opened 1 year ago

NickKramer87 commented 1 year ago

As a data analyst, I want to see diagnosis and procedure codes that are in the same format as real CA hospital data so that they are interchangeable.

Proposed Subtasks:

  1. Alter the output of the Synthea database generator so that all codes can be displayed in ICD-10 format (not SNOMED).

Acceptance Criteria:

  1. Diagnosis and procedure codes are the same for real data and synthetic data.

TravisHaussler commented 1 year ago

As of the close of Phase 2 we have:

There are some vague leads on how to achieve this task properly, but they involve finding a way to access the JSON maps used in this project (we hit a wall trying to access the AWS repo for their project on our last attempt):

Some discussion on why the procedure map is so difficult:

rileeki commented 1 year ago

Quick update: Synthea won't provide the JSON maps for us because of licensing issues (per their answer to my question on the discussion board).

However, they provided some guidance for how we can create those files ourselves.

The mapping files are structured like this:

{
  "70536003": [ // synthea SNOMED code
    {
      "code": "XXXXX", // first mapped CPT code
      "description": "description of code",
      "weight": "10" // all weights are summed and then individual weights are used to drive the distribution of codes so this one is twice as likely to be used as the next one
    },
    {
      "code": "YYYYY", // second mapped CPT code
      "description": "description of code",
      "weight": "5"
    },
    ...
  ],
  ...
}
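
To make the weighting concrete, here is a minimal Python sketch of how one mapped code could be drawn for a SNOMED code, assuming the structure above. The function name `pick_code` and the sample map are hypothetical, and this is not Synthea's actual implementation (Synthea itself is written in Java). It only illustrates how the weights could drive the distribution.

```python
import random

# Hypothetical sample following the structure above; "XXXXX" and "YYYYY"
# are placeholders, not real CPT codes.
sample_map = {
    "70536003": [
        {"code": "XXXXX", "description": "description of code", "weight": "10"},
        {"code": "YYYYY", "description": "description of code", "weight": "5"},
    ]
}


def pick_code(mapping, snomed_code, rng=random):
    """Return one mapped code for snomed_code, or None if it is unmapped."""
    candidates = mapping.get(snomed_code)
    if not candidates:
        return None
    # Weights are stored as strings in the file, so convert before sampling.
    weights = [float(c["weight"]) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["code"]


rng = random.Random(0)  # seeded so repeated runs draw the same sequence
draws = [pick_code(sample_map, "70536003", rng) for _ in range(3000)]
share = draws.count("XXXXX") / len(draws)
print(f"XXXXX share: {share:.2f}")  # should land near 10 / (10 + 5)
```

With weights of 10 and 5, roughly two thirds of the draws should come back as the first code, which matches the "twice as likely" comment in the structure.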

And, to just get the code running, we can create empty map files:

If you create a set of empty mapping files (just "{}") and then run the exporter, it will generate a missing_codes.csv file that will give you a list of all of the codes that need to be mapped in each of the files.
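
A small sketch of that first step in Python, in case it saves someone a few minutes. The file names in `MAP_FILES` are made up for illustration; the real names have to come from whatever the Synthea exporter configuration expects.

```python
import json
from pathlib import Path

# Hypothetical file names: replace these with the mapping file names that
# the Synthea exporter configuration actually expects.
MAP_FILES = ["condition_code_map.json", "procedure_code_map.json"]


def write_empty_maps(target_dir, names=MAP_FILES):
    """Create each mapping file containing only an empty JSON object."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        path = target / name
        path.write_text(json.dumps({}), encoding="utf-8")  # writes "{}"
        created.append(path)
    return created


if __name__ == "__main__":
    for path in write_empty_maps("empty_maps"):
        print(path)
```

After running the exporter against these files, the missing_codes.csv it generates should tell us which codes each file still needs.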

I think the next step is probably to create the empty map files and run Synthea with the bfd export option on and see what happens.

rileeki commented 1 year ago

@masonium This is the issue we talked about on Wednesday.

Summary

Introduction to Billing Code Systems

CPT® (Current Procedural Terminology)

The CPT coding system describes how to report procedures or services. The CPT system is maintained and copyrighted by the American Medical Association. Each CPT code is five characters long, and most are fully numeric. The AMA CPT Editorial Panel reviews and responds to requests for additions to or revisions of the CPT.

HCPCS (Healthcare Common Procedure Coding System)

HCPCS codes are used to report supplies, equipment, and devices provided to patients. A limited number of procedures not otherwise contained in the CPT system are also found here. HCPCS is alphanumeric and is administered by the Centers for Medicare and Medicaid Services (CMS) in cooperation with other third party payers.

CMS includes two levels in its Healthcare Common Procedure Coding System: HCPCS Level I is the CPT coding system; HCPCS Level II is usually referred to as HCPCS codes, described above.

ICD-10-CM (International Classification of Diseases, 10th revision, Clinical Modification)

Healthcare professionals use these codes to report diagnoses and disorders. The ICD-10-CM is maintained by the National Center for Health Statistics (NCHS). The ICD-10-CM replaced the 9th revision (ICD-9-CM) on October 1, 2015.

UMLS (Unified Medical Language System®)

The UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.

You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries and language translators.

I don't know much about this, except that Travis and I both got free UMLS licenses so we could access a mapping from SNOMED CT to ICD-10-CM. We haven't been able to find a similar mapping for ICD-10-PCS codes.
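
As a rough sketch of how a downloaded SNOMED CT to ICD-10-CM map could be turned into the mapping-file structure Synthea described, here is some Python. Everything specific is an assumption: the column names (`referencedComponentId`, `mapTarget`, `mapTargetName`) need to be checked against the actual file, the sample rows are placeholders rather than real codes, and the equal weights are a stand-in because the map carries no frequency information.

```python
import csv
import io
import json
from collections import defaultdict

# Placeholder rows standing in for a downloaded map file; the identifiers
# are made up and the column names are assumptions.
SAMPLE_TSV = (
    "referencedComponentId\tmapTarget\tmapTargetName\n"
    "SNOMED_A\tICD_1\tfirst placeholder diagnosis\n"
    "SNOMED_A\tICD_2\tsecond placeholder diagnosis\n"
    "SNOMED_B\t\t\n"
)


def build_map(tsv_file, source_col="referencedComponentId",
              target_col="mapTarget", name_col="mapTargetName"):
    """Group map rows by source code into the mapping-file structure."""
    mapping = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        target = (row.get(target_col) or "").strip()
        if not target:
            continue  # skip rows with no mapped target
        entries = mapping[row[source_col]]
        if any(e["code"] == target for e in entries):
            continue  # avoid duplicate targets for one source code
        entries.append({
            "code": target,
            "description": row.get(name_col) or "",
            "weight": "1",  # equal weights: no frequency data assumed
        })
    return dict(mapping)


result = build_map(io.StringIO(SAMPLE_TSV))
print(json.dumps(result, indent=2))
```

Source codes with no target (like the `SNOMED_B` placeholder row) are left out of the result, so they would still show up as unmapped on the Synthea side.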

SNOMED

ICD10-PCS

ICD10PCS, often spelled “ICD-10-PCS”, is a vocabulary in the USA for coding hospital-based medical procedures. It is currently implemented as a standard vocabulary, with OMOP-generated hierarchical relations to Procedure concepts from SNOMED CT vocabulary and other relationships mimicking SNOMED CT's internal model.

I'm unfamiliar with these codes. Usually I've encountered procedure codes as CPT/HCPCS. But the California data format has procedure codes in ICD10-PCS, so that's what we're looking for!

More links that might be useful