thepanacealab / OHDSIananke

OHDSI Ananke - A Tool for Mapping Between OHDSI Concept Identifiers to Unified Medical Language System (UMLS) identifiers
Apache License 2.0
14 stars 2 forks source link

CUI --> CONCEPT Mapping #1

Open ericaVoss opened 4 years ago

ericaVoss commented 4 years ago

Would you mind posting the mapping files? :)

bitmoji

jmbanda commented 4 years ago

AnankeV2.zip

Find them attached here. I will update the repo later.

ericaVoss commented 4 years ago

Thank you, took a look at the file.

I'm going to bounce it off the OMOP Vocabulary and make a proper SOURCE_TO_CONCEPT_MAP file. I'll share the SQL with you after if you are interest.

ericaVoss commented 4 years ago

Basically I created a STCM file from your file. Taking where your file mapped to standard CONCEPT_IDs.

318275 CUIs (only 30%) do not map to a standard concept. We are going to run this in a system that uses CUIs and see if any of them are common.

WITH CTE_CUI AS (
    SELECT DISTINCT CUI
    FROM SCRATCH.dbo.EAV_ANANKE
), 
CTE_CUI_TO_STANDARD AS (
    SELECT a.CUI, c.CONCEPT_ID AS TARGET_CONCEPT_ID, c.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c.VOCABULARY_ID AS TARGET_VOCABULARY_ID
    FROM SCRATCH.dbo.EAV_ANANKE a
        JOIN CONCEPT c
            ON c.CONCEPT_ID = a.CONCEPT_ID
            AND c.STANDARD_CONCEPT = 'S'
)
SELECT 
    c.CUI AS SOURCE_CODE, 
    0 AS SOURCE_CONCEPT_ID, 
    'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID, 
    NULL AS SOURCE_CODE_DESCRIPTION, 
    CASE WHEN cs.TARGET_CONCEPT_ID IS NULL THEN 0 ELSE cs.TARGET_CONCEPT_ID END AS TARGET_CONCEPT_ID, 
    cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID, 
    '1970-01-01' AS VALID_START_DATE,
    '2099-12-31' AS VALID_END_DATE,
    NULL AS INVALID_REASON
FROM CTE_CUI c
    LEFT OUTER JOIN CTE_CUI_TO_STANDARD cs
        ON cs.CUI = c.CUI
ericaVoss commented 3 years ago

We realized that sometimes we can make a double hop to get to a standard code.

WITH CTE_CUI AS (
    SELECT DISTINCT CUI
    FROM SCRATCH.dbo.EAV_ANANKE
), 
CTE_CUI_TO_STANDARD AS (
    SELECT a.CUI, c.CONCEPT_ID AS TARGET_CONCEPT_ID, c.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c.VOCABULARY_ID AS TARGET_VOCABULARY_ID
    FROM SCRATCH.dbo.EAV_ANANKE a
        JOIN CONCEPT c
            ON c.CONCEPT_ID = a.CONCEPT_ID
            AND c.STANDARD_CONCEPT = 'S'
),
CTE_CUI_HOP_TO_STANDARD AS (
    --Sometimes the non-standard codes mapped can get us to a standard code
    SELECT a.CUI, c2.CONCEPT_ID AS TARGET_CONCEPT_ID, c2.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c2.VOCABULARY_ID AS TARGET_VOCABULARY_ID
    FROM SCRATCH.dbo.EAV_ANANKE a
        JOIN CONCEPT c
            ON c.CONCEPT_ID = a.CONCEPT_ID
        JOIN CONCEPT_RELATIONSHIP cr
            ON cr.CONCEPT_ID_1 = c.CONCEPT_ID
            AND cr.RELATIONSHIP_ID = 'Maps To'
                        AND cr.INVALID_REASON IS NULL
        JOIN CONCEPT c2
            ON c2.CONCEPT_ID = cr.CONCEPT_ID_2
), 
CTE_MAPPED AS (
    --Take standard first
    SELECT 
        c.CUI, 
        cs.TARGET_CONCEPT_ID, cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID
    FROM CTE_CUI c
        JOIN CTE_CUI_TO_STANDARD cs
            ON cs.CUI = c.CUI
    UNION ALL
    --For other codes try to hop to non-standard
    SELECT 
        c.CUI, 
        cs.TARGET_CONCEPT_ID, cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID
    FROM CTE_CUI c
        JOIN CTE_CUI_HOP_TO_STANDARD cs
            ON cs.CUI = c.CUI
            AND c.CUI NOT IN (SELECT CUI FROM CTE_CUI_TO_STANDARD)
)
SELECT c.CUI AS SOURCE_CODE, 
        0 AS SOURCE_CONCEPT_ID, 
        'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID, 
        NULL AS SOURCE_CODE_DESCRIPTION, 
        c.TARGET_CONCEPT_ID, c.TARGET_CONCEPT_NAME, c.TARGET_VOCABULARY_ID, 
        '1970-01-01' AS VALID_START_DATE,
        '2099-12-31' AS VALID_END_DATE,
        NULL AS INVALID_REASON
FROM CTE_MAPPED c
UNION ALL
SELECT c.CUI AS SOURCE_CODE, 
        0 AS SOURCE_CONCEPT_ID, 
        'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID, 
        NULL AS SOURCE_CODE_DESCRIPTION, 
        0 AS TARGET_CONCEPT_ID, 
        NULL AS TARGET_CONCEPT_NAME, 
        NULL AS TARGET_VOCABULARY_ID, 
        '1970-01-01' AS VALID_START_DATE,
        '2099-12-31' AS VALID_END_DATE,
        NULL AS INVALID_REASON
FROM CTE_CUI c
WHERE c.CUI NOT IN (SELECT DISTINCT CUI FROM CTE_MAPPED)
jmbanda commented 3 years ago

Thanks! Had in my to-do list to update this repo this week! So thank you for the update!

ericaVoss commented 3 years ago

There are are 51 codes that have 6+ more CONCEPT_ID maps.

For example: C3652516-pneumococcus, purified polysaccharides antigen

Maps to: CONCEPT_ID CONCEPT_ID
40163661 Streptococcus pneumoniae type 1 capsular polysaccharide antigen
40163668 Streptococcus pneumoniae type 10A capsular polysaccharide antigen
40163670 Streptococcus pneumoniae type 11A capsular polysaccharide antigen
40163672 Streptococcus pneumoniae type 12F capsular polysaccharide antigen
40163674 Streptococcus pneumoniae type 14 capsular polysaccharide antigen
40163676 Streptococcus pneumoniae type 15B capsular polysaccharide antigen
40163678 Streptococcus pneumoniae type 17F capsular polysaccharide antigen
40163680 Streptococcus pneumoniae type 18C capsular polysaccharide antigen
40163682 Streptococcus pneumoniae type 19A capsular polysaccharide antigen
40163684 Streptococcus pneumoniae type 19F capsular polysaccharide antigen
40163686 Streptococcus pneumoniae type 2 capsular polysaccharide antigen
40163688 Streptococcus pneumoniae type 20 capsular polysaccharide antigen
40163690 Streptococcus pneumoniae type 22F capsular polysaccharide antigen
40163692 Streptococcus pneumoniae type 23F capsular polysaccharide antigen
40163694 Streptococcus pneumoniae type 3 capsular polysaccharide antigen
40163696 Streptococcus pneumoniae type 33F capsular polysaccharide antigen
40163698 Streptococcus pneumoniae type 4 capsular polysaccharide antigen
40163700 Streptococcus pneumoniae type 5 capsular polysaccharide antigen
40163702 Streptococcus pneumoniae type 6B capsular polysaccharide antigen
40163704 Streptococcus pneumoniae type 7F capsular polysaccharide antigen
40163706 Streptococcus pneumoniae type 8 capsular polysaccharide antigen
40163708 Streptococcus pneumoniae type 9N capsular polysaccharide antigen
40163710 Streptococcus pneumoniae type 9V capsular polysaccharide antigen

We are thinking about just ignoring the maps that go to 6+ CONCEPT_IDs. Just an FYI . . .

sdebruyn commented 3 years ago

Shouldn't CTE_CUI_HOP_TO_STANDARD filter on valid relationships?

ericaVoss commented 3 years ago

@sdebruyn - probably proper form, but at least currently all 'Maps To' relationships are valid. I updated the above SQL.