monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Replace empty strings with null in sqlite artifact #404

Open kevinschaper opened 1 year ago

kevinschaper commented 1 year ago

For consistency with Solr, so that monarch-py output matches across different sources, and as general good database practice, we need to replace the empty strings in the sqlite artifact with null.

We need to do it for all of the columns in all of the tables. Ideally, there should be a way to do this in bulk.
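
A minimal sketch of one way to do this in bulk, using only Python's standard `sqlite3` module. The function names are hypothetical and nothing here is taken from the monarch-ingest codebase. It assumes every column allows NULL (a NOT NULL constraint would make the UPDATE fail) and that looping over each ordinary table listed in `sqlite_master` is acceptable for the artifact's size.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote a table or column name so it can be interpolated into SQL."""
    return '"' + name.replace('"', '""') + '"'


def replace_empty_strings_with_null(conn: sqlite3.Connection) -> int:
    """Set every '' value to NULL in all user tables; return the number of cells changed."""
    changed = 0
    # Collect user tables first, skipping SQLite's internal sqlite_* tables.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        quoted_table = quote_identifier(table)
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [
            row[1] for row in conn.execute(f"PRAGMA table_info({quoted_table})")
        ]
        for column in columns:
            quoted_column = quote_identifier(column)
            cursor = conn.execute(
                f"UPDATE {quoted_table} SET {quoted_column} = NULL "
                f"WHERE {quoted_column} = ''"
            )
            changed += cursor.rowcount
    conn.commit()
    return changed
```

Calling `replace_empty_strings_with_null(sqlite3.connect(path))` on a copy of the artifact first would show how many cells are affected before changing the real build. One UPDATE per column is simple but scans each table once per column; a single UPDATE per table using `NULLIF(column, '')` for every column would be an alternative if that turns out to be too slow.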

kevinschaper commented 1 year ago

Here is an example of the output we currently get from monarch-py. What's particularly unfortunate is that we end up wrapping an empty string in a list (see knowledge_source and qualifiers below). The nulls below actually come from columns that are defined in the data model but don't exist in the database, which seems like a different (but also important) problem.

{
    "aggregator_knowledge_source": [
        "infores:monarchinitiative"
    ],
    "id": "uuid:6c5acfe3-9a46-11ed-bf1e-791522c88a3d",
    "subject": "MONDO:0012933",
    "original_subject": "OMIM:612555",
    "subject_namespace": null,
    "subject_category": [],
    "subject_closure": [],
    "subject_label": null,
    "subject_closure_label": [],
    "predicate": "biolink:has_phenotype",
    "object": "HP:0100615",
    "original_object": null,
    "object_namespace": null,
    "object_category": [],
    "object_closure": [],
    "object_label": null,
    "object_closure_label": [],
    "knowledge_source": [
        ""
    ],
    "primary_knowledge_source": [
        "infores:hpoa"
    ],
    "category": [
        "biolink:DiseaseToPhenotypicFeatureAssociation"
    ],
    "negated": null,
    "provided_by": "hpoa_disease_phenotype_edges",
    "publications": [
        "OMIM:612555"
    ],
    "qualifiers": [
        ""
    ],
    "frequency_qualifier": null,
    "has_evidence": "ECO:0000501",
    "onset_qualifier": null,
    "sex_qualifier": null,
    "source": null,
    "stage_qualifier": null,
    "pathway": null,
    "relation": null
}
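
The list-wrapped empty strings above could arise in more than one way, and this thread does not say how monarch-py builds these lists. As a hedged illustration only, assume multivalued columns are stored as delimiter-joined text: splitting an empty string then yields a list containing one empty string. The helper name and the `|` delimiter below are assumptions, not monarch-py's actual API.

```python
from typing import List, Optional


def split_multivalued(value: Optional[str], delimiter: str = "|") -> List[str]:
    """Split a delimiter-joined column value, dropping empty items.

    Hypothetical helper: None and '' both become an empty list instead of [''].
    """
    if value is None:
        return []
    return [item for item in value.split(delimiter) if item != ""]


# A naive split is what produces the unwanted shape: "".split("|") == [""]
```

With a guard like this on the read side, a null or empty column yields an empty list, which matches how fields such as subject_category already appear in the output above.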