konradhalas / dacite

Simple creation of data classes from dictionaries.
MIT License

Improve performance of get type hints #107

Closed: uri-p closed this issue 1 year ago

uri-p commented 4 years ago

Calling get_type_hints consumes a lot of CPU time. When from_dict is called multiple times with the same class, caching the type hints can save around 50% of the runtime.

Suggestion for the change:

```python
from functools import lru_cache  # new import; the other names are already available in dacite


@lru_cache(maxsize=100)
def _get_type_hint_with_cache(_data_class):
    # Resolve the type hints once per data class and reuse them on later calls.
    return get_type_hints(_data_class, None)


def from_dict(data_class: Type[T], data: Data, config: Optional[Config] = None) -> T:
    init_values: Data = {}
    post_init_values: Data = {}
    config = config or Config()
    try:
        if config.forward_references:
            # A custom globalns makes the result depend on the config, so skip the cache here.
            data_class_hints = get_type_hints(data_class, globalns=config.forward_references)
        else:
            data_class_hints = _get_type_hint_with_cache(data_class)
    except NameError as error:
        raise ForwardReferenceError(str(error))
    data_class_fields = get_fields(data_class)
    if config.strict:
        extra_fields = set(data.keys()) - {f.name for f in data_class_fields}
        if extra_fields:
            raise UnexpectedDataError(keys=extra_fields)
    for field in data_class_fields:
        field = copy.copy(field)
        field.type = data_class_hints[field.name]
        try:
            try:
                field_data = data[field.name]
                transformed_value = transform_value(
                    type_hooks=config.type_hooks, cast=config.cast, target_type=field.type, value=field_data
                )
                value = _build_value(type_=field.type, data=transformed_value, config=config)
            except DaciteFieldError as error:
                error.update_path(field.name)
                raise
            if config.check_types and not is_instance(value, field.type):
                raise WrongTypeError(field_path=field.name, field_type=field.type, value=value)
        except KeyError:
            try:
                value = get_default_value_for_field(field)
            except DefaultValueNotFoundError:
                if not field.init:
                    continue
                raise MissingValueError(field.name)
        if field.init:
            init_values[field.name] = value
        else:
            post_init_values[field.name] = value

    return create_instance(data_class=data_class, init_values=init_values, post_init_values=post_init_values)
```
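To see the caching effect in isolation, here is a minimal standalone sketch that uses only the standard library. The helper name `cached_type_hints` and the `Point` dataclass are made up for illustration and are not part of dacite. It only shows that repeated lookups for the same class resolve the hints once.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Any, Dict, List, Optional, get_type_hints


@lru_cache(maxsize=100)
def cached_type_hints(data_class: type) -> Dict[str, Any]:
    # Hypothetical helper (not dacite API): resolve annotations once per class.
    # Every caller receives the same dict object, so treat it as read-only.
    return get_type_hints(data_class)


@dataclass
class Point:
    x: int
    y: int
    tags: Optional[List[str]] = None


# Repeated lookups for the same class hit the cache after the first call.
for _ in range(1000):
    hints = cached_type_hints(Point)

# cache_info() reports how many calls were served from the cache.
info = cached_type_hints.cache_info()
```

Note that `lru_cache` keys on the class object only, which is presumably why the suggestion above bypasses the cache when `config.forward_references` is set: the resolved hints would then depend on more than the class.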

konradhalas commented 4 years ago

Hi @uri-p - thank you for reporting this issue.

Please check this branch - https://github.com/konradhalas/dacite/tree/feature/performance-improvements - it has many performance improvements (e.g. a cache for get_type_hints and many other calls - check this commit - https://github.com/konradhalas/dacite/commit/7e1a9227c7bd0084aa198d925dfafca04b10efe0). I hope that I will merge it soon, but it still needs some adjustments.

uri-p commented 4 years ago

Thank you for the quick answer. I checked your branch, and it looks exactly like some of the improvements I wanted to add. I use dacite heavily, and those improvements are super important.

Hope you will push it soon :-)

konradhalas commented 4 years ago

Cool :) Please note that you can share a Cache object between multiple from_dict calls, so it's a nice performance improvement if you want to build many objects of a single dataclass.
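The idea of sharing one cache object across many calls can be sketched roughly as follows. Everything here is an assumption for illustration: `HintCache` and `build` are hypothetical names, not the branch's actual `Cache` class or dacite's `from_dict`, and `build` only handles flat dataclasses.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Type, TypeVar, get_type_hints

T = TypeVar("T")


class HintCache:
    """Hypothetical cache object; not part of dacite's API."""

    def __init__(self) -> None:
        self._hints: Dict[type, Dict[str, Any]] = {}
        self.misses = 0  # counts how often hints had to be resolved

    def type_hints(self, data_class: type) -> Dict[str, Any]:
        # Resolve the hints only the first time a class is seen.
        if data_class not in self._hints:
            self.misses += 1
            self._hints[data_class] = get_type_hints(data_class)
        return self._hints[data_class]


def build(data_class: Type[T], data: Dict[str, Any], cache: Optional[HintCache] = None) -> T:
    # Simplified stand-in for from_dict: flat dataclasses only, with no
    # nested types, default handling, or type checks.
    if cache is None:
        cache = HintCache()  # a fresh cache per call gives no reuse
    hints = cache.type_hints(data_class)
    kwargs = {f.name: data[f.name] for f in fields(data_class) if f.name in hints and f.name in data}
    return data_class(**kwargs)


@dataclass
class User:
    name: str
    age: int


# One cache shared by every call, so User's hints are resolved a single time.
shared = HintCache()
rows = [{"name": "a", "age": 1}, {"name": "b", "age": 2}, {"name": "c", "age": 3}]
users = [build(User, row, cache=shared) for row in rows]
```

Passing the cache explicitly, rather than relying on a module-level one, lets the caller decide how long the cached hints live.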

uri-p commented 4 years ago

Very useful!

marcoacierno commented 3 years ago

Is there any news regarding this? :)

harrylojames commented 1 year ago

Any news? "share Cache object between multiple from_dict calls, so it's nice performance improvement if you want to build many objects of a single dataclass." This would be great!

konradhalas commented 1 year ago

@mciszczon introduced the cache feature in https://github.com/konradhalas/dacite/commit/3c0e180a1e5295ca4b2eb023b492796939f18bf5