Open YeechingTiger opened 1 month ago
CONCEPT_CPT4.csv has null Check the original data and the target table
SOURCE_TO_CONCEPT_MAP.csv probably it's the second row containing null throws the error. Check the data and remove it if necessary
LDS_drug_exposure.csv has extra data comparing with the table Add additional column in the table; take a look at the https://ohdsi.github.io/CommonDataModel/cdm53.html IDR data is using a mixed version of 5.3 + 5.4; additional columns should be from 5.3.
LDS_device_exposure.csv contains more columns Same to 3
LDS_location.csv has extra data Same to 3
LDS_measurement.csv "modifier_of_event_id" of relation "measurement" does not exist Same to 3
LDS_observation.csv has extra data comparing with the table Same to 3
I processed previous suggestions. And I found these below:
SOURCE_TO_CONCEPT_MAP.csv's invalid column is 'NULL' instead of empty and thus the sql will read it as string 'NULL instead of sql null value. Then I change the code as below and successfully import this table:
copy source_to_concept_map FROM '/hexing/SOURCE_TO_CONCEPT_MAP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE '"' NULL 'NULL';
LDS_drug_exposure.csv: there are some comma in the cell and thus the sql will detect it as seperated value, which will cause exceeding columns. Then I replace all the comma as empty and successfully import this table:
before: after:
LDS_location.csv: same as 3 and successfully import this table.
LDS_observation.csv: same as 3 and successfully import this table.
However, there are still 3 tables cannot be imported:
['visit_occurrence_id', 'person_id', 'visit_concept_id', 'visit_start_date', 'visit_start_datetime', 'visit_end_date', 'visit_end_datetime', 'visit_type_concept_id', 'provider_id', 'care_site_id', 'visit_source_value', 'visit_source_concept_id', 'admitting_source_concept_id', 'admitting_source_value', 'discharge_to_concept_id', 'discharge_to_source_value', 'preceding_visit_occurrence_id']
schema:
['visit_occurrence_id', 'person_id', 'visit_concept_id', 'visit_start_date', 'visit_start_datetime', 'visit_end_date', 'visit_end_datetime', 'visit_type_concept_id', 'provider_id', 'care_site_id', 'visit_source_value', 'visit_source_concept_id', 'admitted_from_concept_id', 'admitted_from_source_value', 'discharged_to_concept_id', 'discharged_to_source_value', 'preceding_visit_occurrence_id']
issue:
psql:/omop_ddl/add_sql.sql:92: ERROR: column "admitting_source_concept_id" of relation "visit_occurrence" does not exist
ALTER TABLE measurement ADD COLUMN modifier_of_event_id BIGINT, ADD COLUMN modifier_of_field_concept_id BIGINT;
ALTER TABLE measurement ALTER COLUMN measurement_id TYPE BIGINT;
issue:
psql:/omop_ddl/add_sql.sql:82: ERROR: value too long for type character varying(50) CONTEXT: COPY measurement, line 852, column value_source_value: ""Digital Differential not available. Refer to Autom""
LDS_drug_exposure.csv LDS_location.csv LDS_observation.csv Please use the "corrected version" I provided.
re: LDS_visit_occurrence.csv: its columns name are not consist with the schema, and I change the csv file's column name as the schema: Add the missing column.
re: LDS_measurement.csv: there are lack of 2 columns in the schema and add alter the table to add these 2 columns: Change the varying length for value_source_value to 255
data:
Then change it as 255:
Then it still shows the data exceed the limit columns on row 33330, so I take a look of that:
2379192847,202300351,40757378,2023-03-15,2023-03-15 11:32:00,1132,32817,,,0,0,,,202315141,221764305,305783052,54247-2,40757378,,,"See Oxycodone Notes, LDT Notes",,,,
I noticed it still have a comma on the value_source_value, which cause the error.
@huyuaaaray In your import SQL, please change
QUOTE E'\b'
to
QUOTE '"'
@YeechingTiger , still not work
psql:/omop_ddl/add_sql.sql:82: ERROR: missing data for column "unit_source_value" CONTEXT: COPY measurement, line 48780053: "3046209172,202304915,35918565,2020-08-12,2020-08-12 00:00:00,,32534,,0,,,,,,,,Cs_Mets_at_Dx_Liver,35..." psql:/omop_ddl/add_sql.sql:92: ERROR: column "admitting_source_concept_id" of relation "visit_occurrence" does not exist
Issue: when loading data into database table using the provided sql file, it throws some errors as follow.
Discuss: Firstly, it was considered as the the var length problem. After reviewing with Xing, it turns out it's the problem of the table name error and the the delimiter is the comma format.
Solutions:
updates: