Open ericaVoss opened 4 years ago
Find them attached here. I will update the repo later.
Thank you, took a look at the file.
I'm going to bounce it off the OMOP Vocabulary and make a proper SOURCE_TO_CONCEPT_MAP file. I'll share the SQL with you after if you are interest.
Basically I created a STCM file from your file. Taking where your file mapped to standard CONCEPT_IDs.
318275 CUIs (only 30%) do not map to a standard concept. We are going to run this in a system that uses CUIs and see if any of them are common.
WITH CTE_CUI AS (
SELECT DISTINCT CUI
FROM SCRATCH.dbo.EAV_ANANKE
),
CTE_CUI_TO_STANDARD AS (
SELECT a.CUI, c.CONCEPT_ID AS TARGET_CONCEPT_ID, c.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c.VOCABULARY_ID AS TARGET_VOCABULARY_ID
FROM SCRATCH.dbo.EAV_ANANKE a
JOIN CONCEPT c
ON c.CONCEPT_ID = a.CONCEPT_ID
AND c.STANDARD_CONCEPT = 'S'
)
SELECT
c.CUI AS SOURCE_CODE,
0 AS SOURCE_CONCEPT_ID,
'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID,
NULL AS SOURCE_CODE_DESCRIPTION,
CASE WHEN cs.TARGET_CONCEPT_ID IS NULL THEN 0 ELSE cs.TARGET_CONCEPT_ID END AS TARGET_CONCEPT_ID,
cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID,
'1970-01-01' AS VALID_START_DATE,
'2099-12-31' AS VALID_END_DATE,
NULL AS INVALID_REASON
FROM CTE_CUI c
LEFT OUTER JOIN CTE_CUI_TO_STANDARD cs
ON cs.CUI = c.CUI
We realized that sometimes we can make a double hop to get to a standard code.
WITH CTE_CUI AS (
SELECT DISTINCT CUI
FROM SCRATCH.dbo.EAV_ANANKE
),
CTE_CUI_TO_STANDARD AS (
SELECT a.CUI, c.CONCEPT_ID AS TARGET_CONCEPT_ID, c.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c.VOCABULARY_ID AS TARGET_VOCABULARY_ID
FROM SCRATCH.dbo.EAV_ANANKE a
JOIN CONCEPT c
ON c.CONCEPT_ID = a.CONCEPT_ID
AND c.STANDARD_CONCEPT = 'S'
),
CTE_CUI_HOP_TO_STANDARD AS (
--Sometimes the non-standard codes mapped can get us to a standard code
SELECT a.CUI, c2.CONCEPT_ID AS TARGET_CONCEPT_ID, c2.CONCEPT_NAME AS TARGET_CONCEPT_NAME, c2.VOCABULARY_ID AS TARGET_VOCABULARY_ID
FROM SCRATCH.dbo.EAV_ANANKE a
JOIN CONCEPT c
ON c.CONCEPT_ID = a.CONCEPT_ID
JOIN CONCEPT_RELATIONSHIP cr
ON cr.CONCEPT_ID_1 = c.CONCEPT_ID
AND cr.RELATIONSHIP_ID = 'Maps To'
AND cr.INVALID_REASON IS NULL
JOIN CONCEPT c2
ON c2.CONCEPT_ID = cr.CONCEPT_ID_2
),
CTE_MAPPED AS (
--Take standard first
SELECT
c.CUI,
cs.TARGET_CONCEPT_ID, cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID
FROM CTE_CUI c
JOIN CTE_CUI_TO_STANDARD cs
ON cs.CUI = c.CUI
UNION ALL
--For other codes try to hop to non-standard
SELECT
c.CUI,
cs.TARGET_CONCEPT_ID, cs.TARGET_CONCEPT_NAME, cs.TARGET_VOCABULARY_ID
FROM CTE_CUI c
JOIN CTE_CUI_HOP_TO_STANDARD cs
ON cs.CUI = c.CUI
AND c.CUI NOT IN (SELECT CUI FROM CTE_CUI_TO_STANDARD)
)
SELECT c.CUI AS SOURCE_CODE,
0 AS SOURCE_CONCEPT_ID,
'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID,
NULL AS SOURCE_CODE_DESCRIPTION,
c.TARGET_CONCEPT_ID, c.TARGET_CONCEPT_NAME, c.TARGET_VOCABULARY_ID,
'1970-01-01' AS VALID_START_DATE,
'2099-12-31' AS VALID_END_DATE,
NULL AS INVALID_REASON
FROM CTE_MAPPED c
UNION ALL
SELECT c.CUI AS SOURCE_CODE,
0 AS SOURCE_CONCEPT_ID,
'LYNXCARE_CUI_TO_STANDARD' AS SOURCE_VOCABULARY_ID,
NULL AS SOURCE_CODE_DESCRIPTION,
0 AS TARGET_CONCEPT_ID,
NULL AS TARGET_CONCEPT_NAME,
NULL AS TARGET_VOCABULARY_ID,
'1970-01-01' AS VALID_START_DATE,
'2099-12-31' AS VALID_END_DATE,
NULL AS INVALID_REASON
FROM CTE_CUI c
WHERE c.CUI NOT IN (SELECT DISTINCT CUI FROM CTE_MAPPED)
Thanks! Had in my to-do list to update this repo this week! So thank you for the update!
There are are 51 codes that have 6+ more CONCEPT_ID maps.
For example:
C3652516-pneumococcus, purified polysaccharides antigen
Maps to: CONCEPT_ID | CONCEPT_ID |
---|---|
40163661 | Streptococcus pneumoniae type 1 capsular polysaccharide antigen |
40163668 | Streptococcus pneumoniae type 10A capsular polysaccharide antigen |
40163670 | Streptococcus pneumoniae type 11A capsular polysaccharide antigen |
40163672 | Streptococcus pneumoniae type 12F capsular polysaccharide antigen |
40163674 | Streptococcus pneumoniae type 14 capsular polysaccharide antigen |
40163676 | Streptococcus pneumoniae type 15B capsular polysaccharide antigen |
40163678 | Streptococcus pneumoniae type 17F capsular polysaccharide antigen |
40163680 | Streptococcus pneumoniae type 18C capsular polysaccharide antigen |
40163682 | Streptococcus pneumoniae type 19A capsular polysaccharide antigen |
40163684 | Streptococcus pneumoniae type 19F capsular polysaccharide antigen |
40163686 | Streptococcus pneumoniae type 2 capsular polysaccharide antigen |
40163688 | Streptococcus pneumoniae type 20 capsular polysaccharide antigen |
40163690 | Streptococcus pneumoniae type 22F capsular polysaccharide antigen |
40163692 | Streptococcus pneumoniae type 23F capsular polysaccharide antigen |
40163694 | Streptococcus pneumoniae type 3 capsular polysaccharide antigen |
40163696 | Streptococcus pneumoniae type 33F capsular polysaccharide antigen |
40163698 | Streptococcus pneumoniae type 4 capsular polysaccharide antigen |
40163700 | Streptococcus pneumoniae type 5 capsular polysaccharide antigen |
40163702 | Streptococcus pneumoniae type 6B capsular polysaccharide antigen |
40163704 | Streptococcus pneumoniae type 7F capsular polysaccharide antigen |
40163706 | Streptococcus pneumoniae type 8 capsular polysaccharide antigen |
40163708 | Streptococcus pneumoniae type 9N capsular polysaccharide antigen |
40163710 | Streptococcus pneumoniae type 9V capsular polysaccharide antigen |
We are thinking about just ignoring the maps that go to 6+ CONCEPT_IDs. Just an FYI . . .
Shouldn't CTE_CUI_HOP_TO_STANDARD
filter on valid relationships?
@sdebruyn - probably proper form, but at least currently all 'Maps To' relationships are valid. I updated the above SQL.
Would you mind posting the mapping files? :)