monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Validate kgx files against monarch-app schema #478

Open kevinschaper opened 1 year ago

kevinschaper commented 1 year ago

The iri column is coming in from kg-phenio, through monarch-ingest. It's not yet defined in the schema, so Solr represents it as a multivalued column, which isn't what we want.

For the moment, #474 is going out of its way to trim the iri field out of Solr documents to avoid problems when creating pydantic instances, and this issue is so that we don't lose track of that hack.
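As a rough illustration of the kind of trimming described above (not the actual code in #474), a minimal sketch might drop any key that the schema does not define before the document is turned into a model instance. The `SCHEMA_FIELDS` set and the `strip_undefined_fields` helper are hypothetical names, and the document values are placeholders.

```python
from typing import Any

# Hypothetical set of schema-defined fields; the real list would come from
# the monarch-app schema, not from a hard-coded set like this.
SCHEMA_FIELDS = {"id", "category", "name", "description"}


def strip_undefined_fields(
    doc: dict[str, Any], allowed: set[str] = SCHEMA_FIELDS
) -> dict[str, Any]:
    """Return a copy of a Solr document keeping only schema-defined fields."""
    return {key: value for key, value in doc.items() if key in allowed}


# Placeholder document: Solr returns the undefined iri field as a list.
doc = {
    "id": "EXAMPLE:0000001",
    "name": "example entity",
    "iri": ["http://example.org/EXAMPLE_0000001"],
}
cleaned = strip_undefined_fields(doc)
```

With the extra key gone, `cleaned` could be passed to a model constructor without tripping over an unexpected multivalued field.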

On the monarch-ingest / linkml-solr side, we probably want to avoid passing extra fields from the tsv file to Solr. It would probably have been better to get an index-time error.
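To make the "index-time error" idea concrete, here is a minimal sketch of a strict TSV reader that refuses to load a file with columns the schema doesn't know about. It uses only the standard library; `SCHEMA_FIELDS`, `UndefinedColumnError`, and `read_rows_strict` are hypothetical names, not part of monarch-ingest or linkml-solr.

```python
import csv
import io
from typing import TextIO

# Hypothetical set of schema-defined columns, for illustration only.
SCHEMA_FIELDS = {"id", "category", "name"}


class UndefinedColumnError(ValueError):
    """Raised when a TSV file has columns that the schema does not define."""


def read_rows_strict(handle: TextIO, allowed: set[str]) -> list[dict[str, str]]:
    """Read a TSV file, failing fast if any header is not in `allowed`."""
    reader = csv.DictReader(handle, delimiter="\t")
    extra = [col for col in (reader.fieldnames or []) if col not in allowed]
    if extra:
        raise UndefinedColumnError(
            f"columns not defined in schema: {', '.join(extra)}"
        )
    return list(reader)


# A well-formed file loads normally; one with an iri column would raise.
good = io.StringIO("id\tname\nEXAMPLE:1\tfoo\n")
rows = read_rows_strict(good, SCHEMA_FIELDS)
```

The alternative design is to silently drop the extra columns, but raising makes the mismatch visible at load time instead of showing up later in the app.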

As for iri itself, right now we handle that expansion from curies in the app, so if we include it, it would only be populated for phenio. We could choose to populate it for other entities as well, or we could leave it out of our kg-phenio ingest and stick with handling curie expansion only at the code level.
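For reference, code-level curie expansion can be as small as a prefix lookup. This is a minimal sketch under the assumption of a simple prefix map, not the app's actual implementation; `PREFIX_MAP` and `expand_curie` are hypothetical names, and the two entries follow the usual OBO PURL pattern.

```python
from typing import Optional

# Hypothetical prefix map; a real one would be loaded from a shared
# prefix registry rather than hard-coded.
PREFIX_MAP = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(
    curie: str, prefix_map: dict[str, str] = PREFIX_MAP
) -> Optional[str]:
    """Expand a curie like 'HP:0000001' to an IRI, or return None if unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


iri = expand_curie("HP:0000001")
```

Doing it this way keeps the iri out of the index entirely, at the cost of every consumer needing access to the same prefix map.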

kevinschaper commented 7 months ago

I want to add a note that I tried this out and found a lot of spurious failures where linkml-validate complained about types: nodes whose name is a number would fail for not being a string, and single values in multivalued fields would fail for not being lists. We probably want to run it as a module rather than from the CLI, so that we can swallow some categories of errors, or we want to validate against a more type-defined file.
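One way to get a "more type-defined" input, sketched here under assumptions, is to coerce each row before validating it: stringify fields the schema declares as strings and wrap lone values for multivalued fields in a list. The field sets and `coerce_row` are hypothetical; in practice they would be derived from the schema, and this does not use any linkml API.

```python
from typing import Any

# Hypothetical field sets, for illustration; in practice these would be
# derived from the schema's slot definitions.
STRING_FIELDS = {"name"}
MULTIVALUED_FIELDS = {"synonym", "xref"}


def coerce_row(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `row` with values coerced to the expected shapes."""
    coerced = dict(row)
    for field in STRING_FIELDS:
        value = coerced.get(field)
        # A name that was parsed as a number becomes its string form.
        if value is not None and not isinstance(value, str):
            coerced[field] = str(value)
    for field in MULTIVALUED_FIELDS:
        value = coerced.get(field)
        # A single value in a multivalued field becomes a one-item list.
        if value is not None and not isinstance(value, list):
            coerced[field] = [value]
    return coerced


fixed = coerce_row({"id": "EXAMPLE:1", "name": 42, "synonym": "foo"})
```

This addresses the two failure categories mentioned above without having to filter validator output, though it also risks hiding genuine type problems in the source files.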