To reproduce this issue, install nmdc_schema>=5.0.3 into an environment and run the following in a Python shell:
>>> from nmdc_schema.nmdc_data import get_nmdc_file_type_enums
>>> get_nmdc_file_type_enums()
I came across this issue while trying to run an ingest for the NMDC data portal after upgrading our nmdc_schema package version from v3.2 to v7.0.
During ingest of data_objects, we call get_nmdc_file_type_enums, which raised the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/nmdc_schema/nmdc_data.py", line 143, in get_nmdc_file_type_enums
    schema = get_nmdc_schema_definition()
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/nmdc_schema/nmdc_data.py", line 126, in get_nmdc_schema_definition
    return load_raw_schema(nmdc_yaml)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml/utils/rawloader.py", line 77, in load_raw_schema
    schema = yaml_loader.load(
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/loader_root.py", line 85, in load
    results = self.load_any(*args, **kwargs)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/yaml_loader.py", line 25, in load_any
    return self.load_source(source, loader, target_class, accept_header="text/yaml, application/yaml;q=0.9",
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/loader_root.py", line 58, in load_source
    data = hbread(source, metadata, metadata.base_path, accept_header)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/hbreader/__init__.py", line 260, in hbread
    with hbopen(source, open_info, base_path, accept_header, is_actual_data, read_codec) as f:
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/hbreader/__init__.py", line 227, in hbopen
    raise AssertionError("Programming error in file type detection logic")
AssertionError: Programming error in file type detection logic
By installing nmdc_schema from version 3.2 onwards and running the code above as a test, I was able to determine that the update from v5.0.2 to v5.0.3 contained the breaking change.
After locally editing .../site-packages/nmdc_schema/nmdc_data.py, I found that the function get_nmdc_yaml_bytesIO appears to be broken: its call to pkgutil.get_data can't find the file, so the schema isn't properly loaded.
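For anyone who wants to check the lookup in their own environment, here is a minimal sketch of a probe around pkgutil.get_data. The names probe_resource, run_demo, demo_pkg, data_mod and demo.yaml are hypothetical and are not part of nmdc_schema. The demo builds a throwaway package in a temporary directory so the script runs without nmdc_schema installed.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def probe_resource(package, resource):
    """Return how many bytes pkgutil.get_data finds, or None if it finds nothing."""
    try:
        data = pkgutil.get_data(package, resource)
    except (ImportError, OSError):
        # ImportError: a parent package could not be imported.
        # OSError: the package was located but the file is not there.
        return None
    return None if data is None else len(data)


def run_demo():
    """Build a throwaway package (hypothetical name: demo_pkg) and probe it."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / "demo_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_bytes(b"")
        (pkg_dir / "data_mod.py").write_bytes(b"")
        (pkg_dir / "demo.yaml").write_bytes(b"name: demo\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            return {
                "package name": probe_resource("demo_pkg", "demo.yaml"),
                "module name": probe_resource("demo_pkg.data_mod", "demo.yaml"),
                "missing file": probe_resource("demo_pkg", "missing.yaml"),
                "missing package": probe_resource("no_such_pkg_xyz", "demo.yaml"),
            }
        finally:
            # Undo the import side effects so the demo can be rerun.
            sys.path.remove(root)
            for name in ("demo_pkg.data_mod", "demo_pkg"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    for label, size in run_demo().items():
        print(f"{label}: {size}")
```

With nmdc_schema installed, the same probe could be called as probe_resource("nmdc_schema", "nmdc.yaml") and probe_resource("nmdc_schema.nmdc_data", "nmdc.yaml") to see which argument locates the file. I have not listed any output for those calls, since the result depends on the installed version and how it was packaged.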
The following change appears to fix the issue:
return io.BytesIO(pkgutil.get_data("nmdc_schema", "nmdc.yaml"))
-> return io.BytesIO(pkgutil.get_data(__name__, "nmdc.yaml"))
Further investigation shows that __name__ evaluates to "nmdc_schema.nmdc_data".
Let me know if further details are needed.