lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
384 stars 46 forks source link

How to get rid of a warning for each used `dataclass` #219

Closed stas00 closed 1 year ago

stas00 commented 1 year ago

Describe the bug

For each dataclass I use I get a warning like:

.../python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:249: UserWarning: Unable to find a decoding function for the annotation <class 'main.Hparams'> (of type <class 'type'>). Will try to use the type as a constructor. Consider registering a decoding function using register_decoding_fn, or posting an issue on GitHub. warnings.warn(

To Reproduce

I reduced the problem to just this:

from dataclasses import dataclass
import yaml
from simple_parsing import ArgumentParser, Serializable

@dataclass
class Hparams:
    seed: int = 13

@dataclass
class Parameters(Serializable):
    hparams: Hparams = Hparams()

if __name__ == "__main__":
    with open("conf.yaml", "w") as f: f.write("hparams:")
    file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)

Run:

$ python config.py
/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:249: 
UserWarning: Unable to find a decoding function for the annotation <class '__main__.Hparams'> (of type <class 'type'>). Will try to 
use the type as a constructor. Consider registering a decoding function using `register_decoding_fn`, or posting an issue on GitHub.
  warnings.warn(

It'd be awesome to receive an advice on how to resolve this warning as we have a whole slew of those.

I think it has to be one of:

register_decoding_fn(Hparams, type)
register_decoding_fn(Hparams, Hparams)

may be dataclasses.asdict?

Additionally a few issues that you might kindly consider:

  1. I think this library surely should know how to decode a dataclass type of class, since it's used everywhere in the documentation and there is not a single register_decoding_fn used for such classes. i.e. is this a documentation issue, or the authors don't mind the warnings?

    I dug a bit into dataclasses and the check is:

    dataclasses.is_dataclass(Hparams)

    so auto-recognizing dataclass objects should be very helpful, no?

  2. I'd also like to highlight that this warnings isn't very actionable as the user doesn't know what <class 'type'> means or if they do how to act on it. Surely, it comes from type(Hparams) - but how does it help to the user to fix the problem? i.e. would it be possible to guide the user from the problem to the solution w/o needing to file an issue? Perhaps writing a small section with examples? I diligently spent an hour reading your docs and a bit the code but I am still not sure if I'm doing it right.

  3. The real complex program (not the repro) works despite those warnings. Is it possible that these warnings are misplaced and shouldn't be emitted in the first place?

    If it's a problem please assert, if it is not a problem then why emit a warning? Perhaps it'd help to understand the situation better with an example where type as a fallback doesn't work. But please feel free to ignore this curiosity question and let's focus on the actual solving of the problem first if this doesn't resonate.


now let's go to a non-dataclass classes:

The real program also emitted this warning about pathlib (not in the repro example), and this code removed the warning

register_decoding_fn(Path, str)

But I'm not at all sure if I did it correctly, does it want a constructor? str would be more of a encoding_fn - does it want:

register_decoding_fn(Path, Path)

I think it'd also help the user to have the context, what are you decoding from - is it a pickled string to object situation? or object to string?

Again, having a section with examples of common and custom classes would be very helpful and the warning could send the user to that document.

Thank you!

Desktop (please complete the following information):

lebrice commented 1 year ago

Hello @stas00 , thank you very much for posting this!

There was recently an imporvement to the decoding of dataclasses in #217. These warnings shouldn't be an issue anymore, when the type is a dataclass.

The goal of this warning is to tell the user when a field is of a type that we don't know how to decode. For example, say you have a class with a field a: Foo, then the warning should be raised when deserializing the containing class from a file or a dictionary, as we're not sure how to convert a serialized value (e.g. a string) into a Foo object.

In the case of dataclasses, previously, the "fix" was to mark all the dataclasses as Serializable, which is sub-optimal. Now with #217 , (which I'll include soon in a new release), these warnings shouldn't happen for dataclass types.

As for more examples and better documentation you're absolutely right. There is a need for at least one example showing how to register custom encoding and decoding functions for special field types.

Thanks again @stas00 , and let me know what you think.

stas00 commented 1 year ago

Thank you for the follow up, Fabrice!

... we don't know how to decode ...

we users don't know what that means. To help users interpret the warnings/errors you need to define somewhere what do decode and encode mean as it could be anything depending on the context.

In this case I think this can be defined as:

so basically encode is a sort of repr suitable for serialization, which should have a counterpart decode method/function that can restore the serialized string back into the object. Perhaps it can be depicted as obj -> encode -> string -> decode -> obj or: obj == decode(encode(obj)) may be?

So in the case of pathlib.Path

Am I correct?

so that:

p = Path("/tmp") 
encoded = str(p)
decoded = Path(encoded)
p == decoded # True
# or
p == Path(str(p)) # True

If this resonates this could be a start of the section on the encode/decode functions and just adding a few more examples.

So for many objects, the decode function is the normal constructor of the object.

lebrice commented 1 year ago

Yes @stas00 , that's right, sorry for the confusion!

One small note: it's actually more like this:

obj -> encode -> "raw" values (dicts, strings, ints, etc) -> json/yaml/etc.dumps -> string
string -> json/yaml/etc.load -> "raw" python values (dicts, strings, ints, etc) -> decode -> obj

The idea is that encode and decode are meant to serialize and deserialize a value of a dataclass field into a "raw python value" that another library (eg json / pyyaml) can then use.

Would you mind trying again with the newest release (v0.1.0) and letting me know if the amount of warnings has indeed decreased on your end?

lebrice commented 1 year ago

Also, unrelated to your question, but you may want to checkout https://github.com/yukinarit/pyserde, I might possibly offload all the serialization / deserialization stuff over to that library at some point.

stas00 commented 1 year ago

That's perfect, could your comment be saved as a doc? I think this is loud and clear now to help the user understand what is meant by encode/decode in this context.

stas00 commented 1 year ago

Would you mind trying again with the newest release (v0.1.0) and letting me know if the amount of warnings has indeed decreased on your end?

I did. All the warnings are gone, but the code breaks now. Was there some backward compatibility change?

Enum member fails to get decoded.

Here is the repro:

from dataclasses import dataclass
from pathlib import Path
from simple_parsing import ArgumentParser, Serializable
from simple_parsing.helpers import dict_field, list_field
from simple_parsing.helpers.serialization import register_decoding_fn
from typing import Any, Dict, List, Optional
import yaml
from enum import Enum

class LoggingTypes(Enum):
    JSONL = "jsonl"
    PRINT = "print"

@dataclass
class Hparams:
    seed: int = 13
    xyz: List[LoggingTypes] = list_field()

@dataclass
class Parameters(Serializable):
    hparams: Hparams = Hparams()
    p: Optional[Path] = None

if __name__ == "__main__":
    with open("conf.yaml", "w") as f: f.write("""
p: /tmp
hparams:
    xyz:
    - jsonl
""")
    file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)

with simple-parsing==0.0.20 I get:

python config.py
/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:248: UserWarning: Unable to find a decoding function for the annotation <class '__main__.Hparams'> (of type <class 'type'>). Will try to use the type as a constructor. Consider registering a decoding function using `register_decoding_fn`, or posting an issue on GitHub.
  warnings.warn(
/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:248: UserWarning: Unable to find a decoding function for the annotation <class 'pathlib.Path'> (of type <class 'type'>). Will try to use the type as a constructor. Consider registering a decoding function using `register_decoding_fn`, or posting an issue on GitHub.
  warnings.warn(

with simple_parsing==0.1.0 both warnings are gone, yay! But the code crashes now:

Traceback (most recent call last):
  File "config.py", line 35, in <module>
    file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 186, in load
    return load(cls, path=path, drop_extra_fields=drop_extra_fields, load_fn=load_fn, **kwargs)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 405, in load
    return from_dict(cls, d, drop_extra_fields=drop_extra_fields)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 739, in from_dict
    field_value = decode_field(field, raw_value, containing_dataclass=cls)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 87, in decode_field
    return get_decoding_fn(field_type)(raw_value)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 739, in from_dict
    field_value = decode_field(field, raw_value, containing_dataclass=cls)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 87, in decode_field
    return get_decoding_fn(field_type)(raw_value)
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 280, in _decode_list
    return [decode_item(v) for v in val]
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 280, in <listcomp>
    return [decode_item(v) for v in val]
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 380, in _decode_enum
    return item_type[val]
  File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/enum.py", line 387, in __getitem__
    return cls._member_map_[name]
KeyError: 'jsonl'

Actually we had this exact problem with simple-parsing==0.0.20 , when I made Hparams class inherit from Serializable trying to overcome the warning. So you can see the connection here.

Thank you, Fabrice.

lebrice commented 1 year ago

Hmmm interesting. Thanks for pointing it out, I'll take a look at this tomorrow.

lebrice commented 1 year ago

Ah, I found the issue @stas00. We save enums by name. Enum types are encoded here: https://github.com/lebrice/SimpleParsing/blob/10495b71f3c974ce76eb01fd4bc18deeb99e2c4f/simple_parsing/helpers/serialization/encoding.py#L139-L141

And decoded here:

https://github.com/lebrice/SimpleParsing/blob/10495b71f3c974ce76eb01fd4bc18deeb99e2c4f/simple_parsing/helpers/serialization/decoding.py#L368-L382

The yaml file you're using here has the enum saved by value (jsonl) which, since it's a string enum here, can be a bit confusing. If you actually want to save that enum by value instead of by name (JSONL), then you have two options:

  1. Registering functions to use for encoding / decoding your enum type:
from simple_parsing.helpers.serialization.encoding import encode
from simple_parsing.helpers.serialization.decoding import register_decoding_fn

register_decoding_fn(LoggingTypes, LoggingTypes)
encode.register(LoggingTypes, lambda x: x.value)
  1. Passing a encoding_fn and decoding_fn to simple_parsing.helpers.field:
@dataclass
class HparamsWithField:
    seed: int = 13
    xyz: List[LoggingTypes] = field(
        encoding_fn=lambda x: [e.value for e in x],
        decoding_fn=lambda str_list: [LoggingTypes(e) for e in str_list],
        default_factory=list,
    )
@dataclass
class ParametersWithField(Serializable):
    hparams: HparamsWithField = field(default_factory=HparamsWithField)
    p: Optional[Path] = None

If you're using this Enum in multiple places, I'd go with option 1.

Hope this helps! Thanks again for posting! :)

apsdehal commented 1 year ago

Adding on to the available options on top of the ones suggested by @lebrice. In case your enum is pretty simple and both name and values are same, you can get away with using typings.Literal. Note that this requires Python 3.8+ with typing installed. Here is the fix to the repro:

from dataclasses import dataclass
from pathlib import Path
from simple_parsing import ArgumentParser, Serializable
from simple_parsing.helpers import dict_field, list_field
from simple_parsing.helpers.serialization import register_decoding_fn
from typing import Any, Dict, List, Literal, Optional
import yaml

LoggingTypes = Literal["jsonl", "print"]

@dataclass
class Hparams:
    seed: int = 13
    xyz: List[LoggingTypes] = list_field()

@dataclass
class Parameters(Serializable):
    hparams: Hparams = Hparams()
    p: Optional[Path] = None

if __name__ == "__main__":
    with open("conf.yaml", "w") as f: f.write("""
p: /tmp
hparams:
    xyz:
    - jsonl
""")
    file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
stas00 commented 1 year ago

Thank you very much for figuring out the cause and the proposed solutions, @lebrice

As our case is simple enums, we switched from Enum's to Aman's Literal approach from above.

and we validated that there are no warnings with v0.1 using his approach

Thanks again for resolving these needs, Fabrice.