yukihiko-shinoda / yaml-dataclass-config

This project helps you to import config file writen by YAML to Python dataclass.
MIT License
36 stars 5 forks source link

Loading empty YAML file fails due to broken contract with Marshmallow #21

Open robin92 opened 2 years ago

robin92 commented 2 years ago

Example

# pytest-compatible file
import pathlib
from dataclasses import dataclass

from yamldataclassconfig import YamlDataClassConfig

def test_loads_empty_file(tmp_path: pathlib.Path):
    @dataclass
    class Config(YamlDataClassConfig):
        pass

    cfg_path = tmp_path / 'config.yml'
    cfg_path.touch()

    Config().load(cfg_path)

Expected result Test finishes without error.

Actual result The following error is reported.

../../.cache/pypoetry/virtualenvs/isa-factory-cloud-agent-jX_-wP6c-py3.8/lib/python3.8/site-packages/yamldataclassconfig/config.py:35: in load
    self.__dict__.update(self.__class__.schema().load(dictionary_config).__dict__)
../../.cache/pypoetry/virtualenvs/isa-factory-cloud-agent-jX_-wP6c-py3.8/lib/python3.8/site-packages/marshmallow/schema.py:722: in load
    return self._do_load(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ConfigSchema(many=False)>, data = None

    def _do_load(
        self,
        data: (
            typing.Mapping[str, typing.Any]
            | typing.Iterable[typing.Mapping[str, typing.Any]]
        ),
        *,
        many: bool | None = None,
        partial: bool | types.StrSequenceOrSet | None = None,
        unknown: str | None = None,
        postprocess: bool = True,
    ):
        """Deserialize `data`, returning the deserialized result.
        This method is private API.

        :param data: The data to deserialize.
        :param many: Whether to deserialize `data` as a collection. If `None`, the
            value for `self.many` is used.
        :param partial: Whether to validate required fields. If its
            value is an iterable, only fields listed in that iterable will be
            ignored will be allowed missing. If `True`, all fields will be allowed missing.
            If `None`, the value for `self.partial` is used.
        :param unknown: Whether to exclude, include, or raise an error for unknown
            fields in the data. Use `EXCLUDE`, `INCLUDE` or `RAISE`.
            If `None`, the value for `self.unknown` is used.
        :param postprocess: Whether to run post_load methods..
        :return: Deserialized data
        """
        error_store = ErrorStore()
        errors = {}  # type: dict[str, list[str]]
        many = self.many if many is None else bool(many)
        unknown = (
            self.unknown
            if unknown is None
            else validate_unknown_parameter_value(unknown)
        )
        if partial is None:
            partial = self.partial
        # Run preprocessors
        if self._has_processors(PRE_LOAD):
            try:
                processed_data = self._invoke_load_processors(
                    PRE_LOAD, data, many=many, original_data=data, partial=partial
                )
            except ValidationError as err:
                errors = err.normalized_messages()
                result = None  # type: list | dict | None
        else:
            processed_data = data
        if not errors:
            # Deserialize data
            result = self._deserialize(
                processed_data,
                error_store=error_store,
                many=many,
                partial=partial,
                unknown=unknown,
            )
            # Run field-level validation
            self._invoke_field_validators(
                error_store=error_store, data=result, many=many
            )
            # Run schema-level validation
            if self._has_processors(VALIDATES_SCHEMA):
                field_errors = bool(error_store.errors)
                self._invoke_schema_validators(
                    error_store=error_store,
                    pass_many=True,
                    data=result,
                    original_data=data,
                    many=many,
                    partial=partial,
                    field_errors=field_errors,
                )
                self._invoke_schema_validators(
                    error_store=error_store,
                    pass_many=False,
                    data=result,
                    original_data=data,
                    many=many,
                    partial=partial,
                    field_errors=field_errors,
                )
            errors = error_store.errors
            # Run post processors
            if not errors and postprocess and self._has_processors(POST_LOAD):
                try:
                    result = self._invoke_load_processors(
                        POST_LOAD,
                        result,
                        many=many,
                        original_data=data,
                        partial=partial,
                    )
                except ValidationError as err:
                    errors = err.normalized_messages()
        if errors:
            exc = ValidationError(errors, data=data, valid_data=result)
            self.handle_error(exc, data, many=many, partial=partial)
>           raise exc
E           marshmallow.exceptions.ValidationError: {'_schema': ['Invalid input type.']}

Stack:

dataclasses-json==0.5.7
marshmallow==3.16.0
marshmallow-enum==1.5.1
pyyaml==6.0
yamldataclassconfig==1.5.0

It looks like Marchmallow's Schema.load expects data parameter to always be a dictionary 1. Loading empty file yields None which is passed further and results in what can be seen above.

If you want I may prepare a fix for this, just confirm that Expected result is really expected and it makes sense to correct that, please.

yukihiko-shinoda commented 2 years ago

Sorry to late reply, I checked your test code. There are no specification in my idea for this case because I didn't suppose.

IMO, I prefer to stop execution if anything unexpected occurs. Although, it may be scope of this package if this behavior causes inconvenient for definition of configuration file.

Note that I confirmed that following test code could pass:

# pytest-compatible file
import pathlib
from dataclasses import dataclass

from yamldataclassconfig import YamlDataClassConfig

def test_loads_empty_file(tmp_path: pathlib.Path):
    @dataclass
    class Config(YamlDataClassConfig):
        pass

    cfg_path = tmp_path / 'config.yml'
    cfg_path.touch()
+   cfg_path.write_text("{}")

    Config().load(cfg_path)
robin92 commented 2 years ago

Hmm, is {} a valid YAML? I supposed that an empty file (touched only) is but dunno about {}.

yukihiko-shinoda commented 2 years ago

This is Flow Style which is useful to define empty dictionary or sequence: YAML Ain’t Markup Language (YAML™) revision 1.2.2