Closed stas00 closed 1 year ago
Hello @stas00 , thank you very much for posting this!
There was recently an imporvement to the decoding of dataclasses in #217. These warnings shouldn't be an issue anymore, when the type is a dataclass.
The goal of this warning is to tell the user when a field is of a type that we don't know how to decode. For example, say you have a class with a field a: Foo
, then the warning should be raised when deserializing the containing class from a file or a dictionary, as we're not sure how to convert a serialized value (e.g. a string) into a Foo
object.
In the case of dataclasses, previously, the "fix" was to mark all the dataclasses as Serializable, which is sub-optimal. Now with #217 , (which I'll include soon in a new release), these warnings shouldn't happen for dataclass types.
As for more examples and better documentation you're absolutely right. There is a need for at least one example showing how to register custom encoding and decoding functions for special field types.
Thanks again @stas00 , and let me know what you think.
Thank you for the follow up, Fabrice!
... we don't know how to decode ...
we users don't know what that means. To help users interpret the warnings/errors you need to define somewhere what do decode
and encode
mean as it could be anything depending on the context.
In this case I think this can be defined as:
so basically encode is a sort of repr
suitable for serialization, which should have a counterpart decode method/function that can restore the serialized string back into the object. Perhaps it can be depicted as obj -> encode -> string -> decode -> obj
or:
obj == decode(encode(obj))
may be?
So in the case of pathlib.Path
str
Path
Am I correct?
so that:
p = Path("/tmp")
encoded = str(p)
decoded = Path(encoded)
p == decoded # True
# or
p == Path(str(p)) # True
If this resonates this could be a start of the section on the encode/decode functions and just adding a few more examples.
So for many objects, the decode function is the normal constructor of the object.
Yes @stas00 , that's right, sorry for the confusion!
One small note: it's actually more like this:
obj -> encode -> "raw" values (dicts, strings, ints, etc) -> json/yaml/etc.dumps -> string
string -> json/yaml/etc.load -> "raw" python values (dicts, strings, ints, etc) -> decode -> obj
The idea is that encode
and decode
are meant to serialize and deserialize a value of a dataclass field into a "raw python value" that another library (eg json
/ pyyaml
) can then use.
Would you mind trying again with the newest release (v0.1.0) and letting me know if the amount of warnings has indeed decreased on your end?
Also, unrelated to your question, but you may want to checkout https://github.com/yukinarit/pyserde, I might possibly offload all the serialization / deserialization stuff over to that library at some point.
That's perfect, could your comment be saved as a doc? I think this is loud and clear now to help the user understand what is meant by encode/decode in this context.
Would you mind trying again with the newest release (v0.1.0) and letting me know if the amount of warnings has indeed decreased on your end?
I did. All the warnings are gone, but the code breaks now. Was there some backward compatibility change?
Enum member fails to get decoded.
Here is the repro:
from dataclasses import dataclass
from pathlib import Path
from simple_parsing import ArgumentParser, Serializable
from simple_parsing.helpers import dict_field, list_field
from simple_parsing.helpers.serialization import register_decoding_fn
from typing import Any, Dict, List, Optional
import yaml
from enum import Enum
class LoggingTypes(Enum):
JSONL = "jsonl"
PRINT = "print"
@dataclass
class Hparams:
seed: int = 13
xyz: List[LoggingTypes] = list_field()
@dataclass
class Parameters(Serializable):
hparams: Hparams = Hparams()
p: Optional[Path] = None
if __name__ == "__main__":
with open("conf.yaml", "w") as f: f.write("""
p: /tmp
hparams:
xyz:
- jsonl
""")
file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
with simple-parsing==0.0.20
I get:
python config.py
/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:248: UserWarning: Unable to find a decoding function for the annotation <class '__main__.Hparams'> (of type <class 'type'>). Will try to use the type as a constructor. Consider registering a decoding function using `register_decoding_fn`, or posting an issue on GitHub.
warnings.warn(
/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py:248: UserWarning: Unable to find a decoding function for the annotation <class 'pathlib.Path'> (of type <class 'type'>). Will try to use the type as a constructor. Consider registering a decoding function using `register_decoding_fn`, or posting an issue on GitHub.
warnings.warn(
with simple_parsing==0.1.0
both warnings are gone, yay! But the code crashes now:
Traceback (most recent call last):
File "config.py", line 35, in <module>
file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 186, in load
return load(cls, path=path, drop_extra_fields=drop_extra_fields, load_fn=load_fn, **kwargs)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 405, in load
return from_dict(cls, d, drop_extra_fields=drop_extra_fields)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 739, in from_dict
field_value = decode_field(field, raw_value, containing_dataclass=cls)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 87, in decode_field
return get_decoding_fn(field_type)(raw_value)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/serializable.py", line 739, in from_dict
field_value = decode_field(field, raw_value, containing_dataclass=cls)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 87, in decode_field
return get_decoding_fn(field_type)(raw_value)
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 280, in _decode_list
return [decode_item(v) for v in val]
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 280, in <listcomp>
return [decode_item(v) for v in val]
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/site-packages/simple_parsing/helpers/serialization/decoding.py", line 380, in _decode_enum
return item_type[val]
File "/home/stas/anaconda3/envs/py38-pt113/lib/python3.8/enum.py", line 387, in __getitem__
return cls._member_map_[name]
KeyError: 'jsonl'
Actually we had this exact problem with simple-parsing==0.0.20
, when I made Hparams
class inherit from Serializable
trying to overcome the warning. So you can see the connection here.
Thank you, Fabrice.
Hmmm interesting. Thanks for pointing it out, I'll take a look at this tomorrow.
Ah, I found the issue @stas00. We save enums by name. Enum types are encoded here: https://github.com/lebrice/SimpleParsing/blob/10495b71f3c974ce76eb01fd4bc18deeb99e2c4f/simple_parsing/helpers/serialization/encoding.py#L139-L141
And decoded here:
The yaml file you're using here has the enum saved by value (jsonl
) which, since it's a string enum here, can be a bit confusing.
If you actually want to save that enum by value instead of by name (JSONL
), then you have two options:
from simple_parsing.helpers.serialization.encoding import encode
from simple_parsing.helpers.serialization.decoding import register_decoding_fn
register_decoding_fn(LoggingTypes, LoggingTypes)
encode.register(LoggingTypes, lambda x: x.value)
encoding_fn
and decoding_fn
to simple_parsing.helpers.field
:@dataclass
class HparamsWithField:
seed: int = 13
xyz: List[LoggingTypes] = field(
encoding_fn=lambda x: [e.value for e in x],
decoding_fn=lambda str_list: [LoggingTypes(e) for e in str_list],
default_factory=list,
)
@dataclass
class ParametersWithField(Serializable):
hparams: HparamsWithField = field(default_factory=HparamsWithField)
p: Optional[Path] = None
If you're using this Enum in multiple places, I'd go with option 1.
Hope this helps! Thanks again for posting! :)
Adding on to the available options on top of the ones suggested by @lebrice.
In case your enum is pretty simple and both name and values are same, you can get away with using typings.Literal
. Note that this requires Python 3.8+ with typing installed.
Here is the fix to the repro:
from dataclasses import dataclass
from pathlib import Path
from simple_parsing import ArgumentParser, Serializable
from simple_parsing.helpers import dict_field, list_field
from simple_parsing.helpers.serialization import register_decoding_fn
from typing import Any, Dict, List, Literal, Optional
import yaml
LoggingTypes = Literal["jsonl", "print"]
@dataclass
class Hparams:
seed: int = 13
xyz: List[LoggingTypes] = list_field()
@dataclass
class Parameters(Serializable):
hparams: Hparams = Hparams()
p: Optional[Path] = None
if __name__ == "__main__":
with open("conf.yaml", "w") as f: f.write("""
p: /tmp
hparams:
xyz:
- jsonl
""")
file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
Thank you very much for figuring out the cause and the proposed solutions, @lebrice
As our case is simple enums, we switched from Enum
's to Aman's Literal
approach from above.
and we validated that there are no warnings with v0.1 using his approach
Thanks again for resolving these needs, Fabrice.
Describe the bug
For each dataclass I use I get a warning like:
To Reproduce
I reduced the problem to just this:
Run:
It'd be awesome to receive an advice on how to resolve this warning as we have a whole slew of those.
I think it has to be one of:
may be
dataclasses.asdict
?Additionally a few issues that you might kindly consider:
I think this library surely should know how to decode a
dataclass
type of class, since it's used everywhere in the documentation and there is not a singleregister_decoding_fn
used for such classes. i.e. is this a documentation issue, or the authors don't mind the warnings?I dug a bit into
dataclasses
and the check is:so auto-recognizing
dataclass
objects should be very helpful, no?I'd also like to highlight that this warnings isn't very actionable as the user doesn't know what
<class 'type'>
means or if they do how to act on it. Surely, it comes fromtype(Hparams)
- but how does it help to the user to fix the problem? i.e. would it be possible to guide the user from the problem to the solution w/o needing to file an issue? Perhaps writing a small section with examples? I diligently spent an hour reading your docs and a bit the code but I am still not sure if I'm doing it right.The real complex program (not the repro) works despite those warnings. Is it possible that these warnings are misplaced and shouldn't be emitted in the first place?
If it's a problem please assert, if it is not a problem then why emit a warning? Perhaps it'd help to understand the situation better with an example where
type
as a fallback doesn't work. But please feel free to ignore this curiosity question and let's focus on the actual solving of the problem first if this doesn't resonate.now let's go to a non-dataclass classes:
The real program also emitted this warning about
pathlib
(not in the repro example), and this code removed the warningBut I'm not at all sure if I did it correctly, does it want a constructor?
str
would be more of a encoding_fn - does it want:I think it'd also help the user to have the context, what are you decoding from - is it a pickled string to object situation? or object to string?
Again, having a section with examples of common and custom classes would be very helpful and the warning could send the user to that document.
Thank you!
Desktop (please complete the following information):