microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/
Other
4 stars 3 forks source link

Runtime references `nmdc_schema_accepting_legacy_ids`, which is absent from `nmdc-schema` version `10.5.4` #570

Open eecavanna opened 1 week ago

eecavanna commented 1 week ago

Tasks

Related: https://github.com/microbiomedata/nmdc-runtime/issues/554 and https://github.com/microbiomedata/nmdc-runtime/issues/555

CC: @PeopleMakeCulture @dwinston

eecavanna commented 1 week ago

I'll spend a few minutes on task 1 and then report back.

eecavanna commented 1 week ago

The only place the nmdc_schema_accepting_legacy_ids schema is referenced is here:

https://github.com/microbiomedata/nmdc-runtime/blob/5c96e125a303846c0416b02dbb8d746039009db3/nmdc_runtime/util.py#L19

eecavanna commented 1 week ago

The repo does contain references to the "normal" schema, but I don't know whether the importing code is used in practice. For example:

https://github.com/microbiomedata/nmdc-runtime/blob/5c96e125a303846c0416b02dbb8d746039009db3/components/nmdc_runtime/workflow_execution_activity/core.py#L7

eecavanna commented 1 week ago

In commit https://github.com/microbiomedata/nmdc-runtime/pull/560/commits/73234a8813426a095ad83f7370902343554813d9 (part of PR https://github.com/microbiomedata/nmdc-runtime/pull/560), the Runtime is both (a) using nmdc-schema version 10.5.4 and (b) no longer using the nmdc_schema_accepting_legacy_ids.

I still expect the GHA workflow to fail due to an unrelated issue in nmdc-schema, but we can still use the portion that doesn't fail as data related to making the change described in this issue.

PeopleMakeCulture commented 1 week ago

@eecavanna I updated comments in #555. The util uses the old schema for validate JSON. It should be fine to update to nmdc_schema.nmdc. The only remaining step to deprecate it is to test validate_json with the schema change.

PeopleMakeCulture commented 1 week ago

The only place the nmdc_schema_accepting_legacy_ids schema is referenced is here:

https://github.com/microbiomedata/nmdc-runtime/blob/5c96e125a303846c0416b02dbb8d746039009db3/nmdc_runtime/util.py#L19

@eecavanna Please note that older versions of the bulk referential integrity check notebook also depend on nmdc_schema.nmdc_schema_accepting_legacy_ids. This code is updated in #550. Pls use the new notebook moving forward

PeopleMakeCulture commented 1 week ago
  1. Determine whether the Runtime is using that variant of the schema because it depends upon something that is specified to it, or only because it used to depend upon something specific to it and then people neglected to update the import when that dependence wasn't present anymore. In other words, determine whether the Runtime would work OK if it were using the normal schema instead.

@eecavanna nmdc_schema_accepting_legacy_ids was used for both the validate_json() util and ref integrity notebook prior to re-id-ing. We have confirmed that post-reID-ing we no longer need to accept legacy ids and can deprecate it in:

# from nmdc_schema.nmdc_schema_accepting_legacy_ids import Database as NMDCDatabase
from nmdc_schema.nmdc import Database as NMDCDatabase

CC: @dwinston

PeopleMakeCulture commented 2 days ago

Fixed in: https://github.com/microbiomedata/nmdc-runtime/pull/550

ssarrafan commented 2 days ago

Appears to be active (or maybe it can be closed?). Moving to the next sprint.