lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
384 stars 46 forks source link

Simple parsing silently converts types #227

Closed apsdehal closed 1 year ago

apsdehal commented 1 year ago

Simple parsing doesn't really validate some of the dataclass types and silently converts them to the type mentioned in the config. This becomes an issue in case user passes float where the type is an int. Simple parsing should validate the config types and throw an error if the type is not same instead of silently converting it. Alternatively, throw a warning that it will type convert a particular config key.

To Reproduce

from dataclasses import dataclass
from simple_parsing import Serializable
from typing import List, Optional
import yaml

@dataclass
class Hparams:
    use_log: int = 1
    severity: int = 2
    probs: Optional[List[int]] = None

@dataclass
class Parameters(Serializable):
    hparams: Hparams = Hparams()

if __name__ == "__main__":
    with open("conf.yaml", "w") as f: f.write("""
hparams:
    use_log: 1
    severity: 0.1
    probs: [0.1, 0.2]
""")
    file_config = Parameters.load("conf.yaml", load_fn=yaml.safe_load)
    print(file_config.hparams.severity, file_config.hparams.probs)

Expected behavior This prints, 0 [0, 0] which is unexpected as it just type converted the float values and can have drastic effect.

Actual behavior Should throw an error or a warning.

Desktop (please complete the following information):

stas00 commented 1 year ago

Should throw an error or a warning.

assert please!

lebrice commented 1 year ago

Thanks for posting @apsdehal !

Yeah I'd agree with the need for a warning in this case. I dont' think it's a good idea to raise an error, since changing the field type from float to int might be a good idea in some cases during development on a project, and we'd like the user to be able to keep using the same yaml files from their previous runs.

lebrice commented 1 year ago

Hey @apsdehal , I've been thinking about this for a while, I'm not convinced anymore that this requires an explicit change in SimpleParsing. IMO, the type of the field annotations not aligning with the serialized data is an issue with user code that we shouldn't try to account for in SP.

With that said, it should still be possible for the user to customize how fields are decoded. I made #232 to address this. For example, here's how you can get what you want:

(note, you'd import the following):

from simple_parsing.helpers.serialization.decoding import (
    get_decoding_fn,
    register_decoding_fn,
)

https://github.com/lebrice/SimpleParsing/blob/51b38de0c803d56b1fb8f0b4fc3e1e0272ed79dc/test/test_decoding.py#L148-L161

If you want to save dataclasses without losing any information (e.g. subgroup choice, or field types, etc), then I'd recommend serializing with pyyaml directly, rather than using the save and load functions from simple-parsing.

You might also want to check out something like https://github.com/yukinarit/pyserde or https://docs.pydantic.dev/ if you want more "serious" field validation.

Hope this helps, let me know what you think! :)

stas00 commented 1 year ago

@apsdehal, fwiw, deepspeed recently integrated pydantic (but haven't fully activated it yet) (if you want a vote of confidence)

apsdehal commented 1 year ago

This makes a lot of sense @lebrice. Thanks for looking into it.