monarch-initiative / medgen

MedGen ingest.

Bug: Release: `medgen-disease-extract.owl` too large #10

Open joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

@matentzn This is low priority: we are not using `medgen-disease-extract.owl`, and we may be switching our ingest process over to the MedGen team. The problem, I think, is that this file has grown from 700MB to over 2GB (the size limit for GitHub release files) because we are now adding duplicate classes and xrefs, e.g. in the `elsif` branch below, where each CUI is emitted under both the `UMLS` and `MEDGENCUI` prefixes:

    if ($id =~ /^CN\d+/) {
        add_triples('MEDGENCUI', $id);
    # If a CUI (starts with 'C'), will be created twice: one for MEDGENCUI, one for UMLS
    } elsif ($id =~ /^C\d+/) {
        add_triples('UMLS', $id);
        add_triples('MEDGENCUI', $id);
    # UID
    } else {
        add_triples('MEDGEN', $id);
    }
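
To make the duplication concrete, here is a minimal sketch of the same branching. It is not the actual ingest script: `prefixes_for` is a hypothetical helper that only reports which prefixes `add_triples` would be called with, and the sample IDs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the branch logic above: returns the
# prefixes under which add_triples() would be called for a given ID.
sub prefixes_for {
    my ($id) = @_;
    return ('MEDGENCUI')         if $id =~ /^CN\d+/;  # CN-prefixed ID
    return ('UMLS', 'MEDGENCUI') if $id =~ /^C\d+/;   # CUI: emitted twice
    return ('MEDGEN');                                # UID
}

# Made-up sample IDs, for illustration only.
my @ids = ('CN000001', 'C0000001', 'C0000002', '12345');

my $calls = 0;
for my $id (@ids) {
    my @prefixes = prefixes_for($id);
    $calls += scalar @prefixes;
    printf "%-10s -> %s\n", $id, join(', ', @prefixes);
}

# 4 IDs lead to 6 add_triples calls, because both CUIs are written twice.
printf "%d IDs, %d add_triples calls\n", scalar @ids, $calls;
```

In this sketch every ID matching `/^C\d+/` doubles its share of the output, which is consistent with the growth described above if most IDs in the file are CUIs (an assumption, not something measured here).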

matentzn commented 1 year ago

No worries! If we need to run alignments, we will set up a Jenkins job instead of GHA! Thanks for making the issue.