microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

`get_nmdc_yaml_bytesIO` returns no data #548

Closed: naglepuff closed this issue 1 year ago

naglepuff commented 1 year ago

To reproduce this issue, install nmdc_schema>=5.0.3 into any environment and run the following in a Python shell:

>>> from nmdc_schema.nmdc_data import get_nmdc_file_type_enums
>>> get_nmdc_file_type_enums()

I came across this issue while trying to run an ingest for the NMDC data portal after upgrading our nmdc_schema package version from v3.2 to v7.0.

During ingest of data_objects, we call get_nmdc_file_type_enums, which raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/nmdc_schema/nmdc_data.py", line 143, in get_nmdc_file_type_enums
    schema = get_nmdc_schema_definition()
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/nmdc_schema/nmdc_data.py", line 126, in get_nmdc_schema_definition
    return load_raw_schema(nmdc_yaml)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml/utils/rawloader.py", line 77, in load_raw_schema
    schema = yaml_loader.load(
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/loader_root.py", line 85, in load
    results = self.load_any(*args, **kwargs)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/yaml_loader.py", line 25, in load_any
    return self.load_source(source, loader, target_class, accept_header="text/yaml, application/yaml;q=0.9",
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/linkml_runtime/loaders/loader_root.py", line 58, in load_source
    data = hbread(source, metadata, metadata.base_path, accept_header)
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/hbreader/__init__.py", line 260, in hbread
    with hbopen(source, open_info, base_path, accept_header, is_actual_data, read_codec) as f:
  File ".pyenv/versions/nmdc39/lib/python3.9/site-packages/hbreader/__init__.py", line 227, in hbopen
    raise AssertionError("Programming error in file type detection logic")
AssertionError: Programming error in file type detection logic

By installing successive nmdc_schema versions from v3.2 onward and running the code above as a test, I determined that the breaking change was introduced in the update from v5.0.2 to v5.0.3.

By locally editing .../site-packages/nmdc_schema/nmdc_data.py, I found that get_nmdc_yaml_bytesIO is currently broken: the call to pkgutil.get_data can't find the file, so the schema isn't loaded properly.
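Why this surfaces as an obscure hbreader assertion rather than a "file not found" error: per the Python docs, pkgutil.get_data returns None (it does not raise) when the package cannot be located or its loader does not support get_data, and io.BytesIO(None) quietly produces an empty stream. The sketch below shows that behavior and a guarded alternative. The helper name load_packaged_bytes and the package name no_such_pkg_for_demo are hypothetical, chosen only for illustration; this is not the nmdc_schema code.

```python
import io
import pkgutil


def load_packaged_bytes(package: str, resource: str) -> io.BytesIO:
    """Return `resource` from `package` as a BytesIO, failing loudly if missing.

    Hypothetical helper: pkgutil.get_data returns None (instead of raising)
    when the package cannot be located or its loader lacks get_data.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(
            f"pkgutil.get_data could not load {resource!r} from {package!r}"
        )
    return io.BytesIO(data)


# Unguarded pattern: a None result is silently wrapped as an empty stream.
# "no_such_pkg_for_demo" is a hypothetical, uninstalled package name.
silent = io.BytesIO(pkgutil.get_data("no_such_pkg_for_demo", "nmdc.yaml"))
print(silent.read())  # b''

# Guarded helper: the problem is reported where it happens.
try:
    load_packaged_bytes("no_such_pkg_for_demo", "nmdc.yaml")
except FileNotFoundError as exc:
    print(exc)
```

Raising at the point of lookup means a packaging mistake shows up as a clear message naming the package and resource, instead of as a failure deep inside the YAML loader.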

The following change appears to fix the issue. Replace

return io.BytesIO(pkgutil.get_data("nmdc_schema", "nmdc.yaml"))

with

return io.BytesIO(pkgutil.get_data(__name__, "nmdc.yaml"))

Further investigation shows that, inside nmdc_data.py, __name__ evaluates to "nmdc_schema.nmdc_data".
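When get_data is given a module name such as __name__, it loads that module and looks for the resource next to the module's __file__. The sketch below builds a throwaway package on disk to show this; demo_pkg and demo_data are hypothetical stand-ins for nmdc_schema and nmdc_schema.nmdc_data. With a regular package (one that has an __init__.py), the package name and the module name resolve to the same directory, so both calls return the same bytes. The difference in nmdc_schema 5.0.3 therefore presumably lies in how the top-level package itself was located or loaded, which this thread does not pin down.

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk (hypothetical names for illustration).
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_bytes(b"")
(pkg_dir / "demo_data.py").write_bytes(b"")
# write_bytes avoids newline translation on Windows.
(pkg_dir / "nmdc.yaml").write_bytes(b"id: demo\n")

# Make the new directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Module name: resolved relative to the directory of demo_pkg/demo_data.py.
via_module = pkgutil.get_data("demo_pkg.demo_data", "nmdc.yaml")
# Package name: resolved relative to the directory of demo_pkg/__init__.py.
via_package = pkgutil.get_data("demo_pkg", "nmdc.yaml")

print(via_module)                 # b'id: demo\n'
print(via_module == via_package)  # True
```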

Let me know if further details are needed.

turbomam commented 1 year ago

Thanks, good research. I will update the code with your suggestion:

return io.BytesIO(pkgutil.get_data(__name__, "nmdc.yaml"))

turbomam commented 1 year ago

@naglepuff I think this is fixed, using your formula, in the PyPI release.