Closed PeopleMakeCulture closed 3 months ago
Make sure nmdc-schema v10.5.5 is used; that will resolve some of the type errors for DataObject was_generated_by values that were discussed at the infrastructure meeting on Thursday.
What's the best way to introspect the schema version I'm importing?
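One way to check this, as a minimal sketch: ask the installed package metadata through the standard library's importlib.metadata. This assumes the distribution name is nmdc-schema (taken from the pin mentioned in this thread); installed_version is a hypothetical helper, not something nmdc-schema or nmdc-runtime provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# "nmdc-schema" is the distribution name assumed from the requirements pin.
schema_version = installed_version("nmdc-schema")
print(f"nmdc-schema installed version: {schema_version}")
```

Running this inside the notebook kernel reports what that environment actually has installed, which can differ from what the requirements file pins. From a shell, pip show nmdc-schema gives the same information.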
I re-ran the Bulk Validation and Ref Integrity notebook after updating to nmdc-schema==10.5.5 in nmdc-runtime/requirements/main.in. Here are the new results:
len(errors["not_found"]), len(errors["invalid_type"])
# results prior to re-id-ing: (4857, 23503)
# results prior to v10.5.5: (33, 20488)
# results with v10.5.5: (33, 6900)
The number of schema validation errors (e.g., "invalid_type") dropped from ~20,000 to ~7,000 (20,488 to 6,900), while the "not_found" count stayed at 33.
Here are some samples of the type of error that still remains (error messages are formatted as f"{name} doc {doc['id']}: field {field} referenced doc {v} not of type {slot_range}").
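For readers unfamiliar with how such messages end up in the two buckets, here is a rough sketch of a referential integrity check that fills errors["not_found"] and errors["invalid_type"]. It is not the notebook's actual code. The check_reference helper, the sample documents, the type labels, and the wording of the "not found" message are all hypothetical, and the type test is a plain equality check, whereas a real check would likely also accept subclasses of the slot range.

```python
from collections import defaultdict


def check_reference(name, doc, field, slot_range, docs_by_id, errors):
    """Record an error for each value of doc[field] that is missing or mistyped."""
    values = doc.get(field, [])
    if not isinstance(values, list):
        values = [values]
    for v in values:
        target = docs_by_id.get(v)
        if target is None:
            # Hypothetical message wording; only the invalid_type format
            # is quoted in the thread.
            errors["not_found"].append(
                f"{name} doc {doc['id']}: field {field} referenced doc {v} not found"
            )
        elif target["type"] != slot_range:
            # Simplification: exact match only, no subclass handling.
            errors["invalid_type"].append(
                f"{name} doc {doc['id']}: field {field} "
                f"referenced doc {v} not of type {slot_range}"
            )


# Hypothetical documents, keyed by id, with placeholder type labels.
docs_by_id = {
    "ex:act-1": {"id": "ex:act-1", "type": "Activity"},
    "ex:samp-1": {"id": "ex:samp-1", "type": "Sample"},
}
data_objects = [
    {"id": "ex:dobj-1", "was_generated_by": "ex:act-1"},   # valid reference
    {"id": "ex:dobj-2", "was_generated_by": "ex:samp-1"},  # wrong type
    {"id": "ex:dobj-3", "was_generated_by": "ex:gone-1"},  # missing target
]

errors = defaultdict(list)
for doc in data_objects:
    check_reference(
        "data_object_set", doc, "was_generated_by", "Activity", docs_by_id, errors
    )

print(len(errors["not_found"]), len(errors["invalid_type"]))
```

Under this reading, a schema upgrade that widens or corrects a slot's range removes "invalid_type" entries without touching "not_found", which matches the pattern in the counts above.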
See #570 for deprecating the schema's acceptance of legacy IDs.