uf-hobi-informatics-lab / UTH-omoponfhir-main-v54-r4

OMOP v5.4 on FHIR R4 with SQL Render - GUI Server
Apache License 2.0
0 stars 0 forks source link

database vocabulary and clinical data csv issues update #1

Open YeechingTiger opened 1 month ago

YeechingTiger commented 1 month ago

Issue: when loading data into database table using the provided sql file, it throws some errors as follow.

Image

Discuss: Firstly, it was considered as the the var length problem. After reviewing with Xing, it turns out it's the problem of the table name error and the the delimiter is the comma format.

Solutions:

  1. correct the original sql file vocabulary input table name.
  2. correct the the clinical data table name referring by /data2/datasets/hexing/postgres/scripts/data_import_PostGRES.sql
  3. using the the corrected clinical data under the /data2/datasets/hexing/postgres/csv_files/corrected, which contains 4 files: LDS_drug_exposure.csv LDS_location.csv LDS_measurement.csv LDS_observation.csv

updates:

Image

  1. CONCEPT_CPT4.csv has null
  2. SOURCE_TO_CONCEPT_MAP.csv probably it's the second row containing null throws the error.
  3. LDS_drug_exposure.csv has extra data comparing with the table
  4. LDS_device_exposure.csv contains more columns
  5. LDS_location.csv has extra data
  6. /LDS_measurement.csv "modifier_of_event_id" of relation "measurement" does not exist
  7. LDS_observation.csv has extra data comparing with the table
YeechingTiger commented 1 month ago
  1. CONCEPT_CPT4.csv has null Check the original data and the target table

  2. SOURCE_TO_CONCEPT_MAP.csv probably it's the second row containing null throws the error. Check the data and remove it if necessary

  3. LDS_drug_exposure.csv has extra data comparing with the table Add additional column in the table; take a look at the https://ohdsi.github.io/CommonDataModel/cdm53.html IDR data is using a mixed version of 5.3 + 5.4; additional columns should be from 5.3.

  4. LDS_device_exposure.csv contains more columns Same to 3

  5. LDS_location.csv has extra data Same to 3

  6. LDS_measurement.csv "modifier_of_event_id" of relation "measurement" does not exist Same to 3

  7. LDS_observation.csv has extra data comparing with the table Same to 3

huyuaaaray commented 1 month ago

I processed previous suggestions. And I found these below:

  1. SOURCE_TO_CONCEPT_MAP.csv's invalid column is 'NULL' instead of empty and thus the sql will read it as string 'NULL instead of sql null value. Then I change the code as below and successfully import this table: copy source_to_concept_map FROM '/hexing/SOURCE_TO_CONCEPT_MAP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE '"' NULL 'NULL';

  2. LDS_drug_exposure.csv: there are some comma in the cell and thus the sql will detect it as seperated value, which will cause exceeding columns. Then I replace all the comma as empty and successfully import this table:

    before: Image after: Image

  3. LDS_location.csv: same as 3 and successfully import this table.

  4. LDS_observation.csv: same as 3 and successfully import this table.

However, there are still 3 tables cannot be imported:

  1. CONCEPT_CPT4.csv's concept name column is all NULL. (didn't not copy to table and waiting for new data)
  2. LDS_visit_occurrence.csv: its columns name are not consist with the schema, and I change the csv file's column name as the schema: csv file: ['visit_occurrence_id', 'person_id', 'visit_concept_id', 'visit_start_date', 'visit_start_datetime', 'visit_end_date', 'visit_end_datetime', 'visit_type_concept_id', 'provider_id', 'care_site_id', 'visit_source_value', 'visit_source_concept_id', 'admitting_source_concept_id', 'admitting_source_value', 'discharge_to_concept_id', 'discharge_to_source_value', 'preceding_visit_occurrence_id'] schema: ['visit_occurrence_id', 'person_id', 'visit_concept_id', 'visit_start_date', 'visit_start_datetime', 'visit_end_date', 'visit_end_datetime', 'visit_type_concept_id', 'provider_id', 'care_site_id', 'visit_source_value', 'visit_source_concept_id', 'admitted_from_concept_id', 'admitted_from_source_value', 'discharged_to_concept_id', 'discharged_to_source_value', 'preceding_visit_occurrence_id'] issue: psql:/omop_ddl/add_sql.sql:92: ERROR: column "admitting_source_concept_id" of relation "visit_occurrence" does not exist
  3. LDS_measurement.csv: there are lack of 2 columns in the schema and add alter the table to add these 2 columns: ALTER TABLE measurement ADD COLUMN modifier_of_event_id BIGINT, ADD COLUMN modifier_of_field_concept_id BIGINT; ALTER TABLE measurement ALTER COLUMN measurement_id TYPE BIGINT; issue: psql:/omop_ddl/add_sql.sql:82: ERROR: value too long for type character varying(50) CONTEXT: COPY measurement, line 852, column value_source_value: ""Digital Differential not available. Refer to Autom""
YeechingTiger commented 1 month ago

LDS_drug_exposure.csv LDS_location.csv LDS_observation.csv Please use the "corrected version" I provided.

re: LDS_visit_occurrence.csv: its columns name are not consist with the schema, and I change the csv file's column name as the schema: Add the missing column.

re: LDS_measurement.csv: there are lack of 2 columns in the schema and add alter the table to add these 2 columns: Change the varying length for value_source_value to 255

huyuaaaray commented 1 month ago
  1. I do use the corrected files in the gpu server, but it still contain the comma. For example, for the LDS_location.csv row 47 as below, address 1 column has a comma, which will cause a error when import them to the table. Therefore, I just replace all the comma to space: Image 2.their columns are consist with each other, but it still throws this error: schema: Image data:

Image

  1. same as 2: schema:

Image

data:

Image

Then change it as 255:

Image Then it still shows the data exceed the limit columns on row 33330, so I take a look of that: 2379192847,202300351,40757378,2023-03-15,2023-03-15 11:32:00,1132,32817,,,0,0,,,202315141,221764305,305783052,54247-2,40757378,,,"See Oxycodone Notes, LDT Notes",,,, I noticed it still have a comma on the value_source_value, which cause the error.

YeechingTiger commented 1 month ago

@huyuaaaray In your import SQL, please change

QUOTE E'\b'

to

QUOTE '"'
huyuaaaray commented 1 month ago

@YeechingTiger , still not work psql:/omop_ddl/add_sql.sql:82: ERROR: missing data for column "unit_source_value" CONTEXT: COPY measurement, line 48780053: "3046209172,202304915,35918565,2020-08-12,2020-08-12 00:00:00,,32534,,0,,,,,,,,Cs_Mets_at_Dx_Liver,35..." psql:/omop_ddl/add_sql.sql:92: ERROR: column "admitting_source_concept_id" of relation "visit_occurrence" does not exist