logix-project / logix

AI Logging for Interpretability and Explainability🔬
Apache License 2.0
74 stars 6 forks source link

Config Improvements #76

Closed sangkeun00 closed 7 months ago

sangkeun00 commented 8 months ago

I believe there are some (minor) issues that may be addressed regarding Config.

  1. Scope: There are some variables that can be configured, but are not configured currently in Config. Some examples include
    • project: I don't see any reasons, for which project should be configured separately in the main python script instead of in Config.
    • scheduler: I am less sure about this, but we can also use Config to specify scheduler arguments.
  2. Data structure: Currently, Config is mostly Python dictionary. However, we may instead use Python dataclass to have a cleaner structure. Also, it may be more natural and easier to do config.key than config["key"] for many users. The overall design may look like:
@dataclass
class Config:
    project: str = "project"
    root_dir: str = "./analog"
    logging: LoggingConfig
    lora: LoraConfig

@dataclass
class LoggingConfig:
    flush_threshold: int = -1
    num_workers: int = 1

@dataclass
class LoraConfig:
    init: str = "random"
    rank: int = 64
...

@hwijeen @pomonam @hage1005 @YoungseogChung Let me know what you think!

levmckinney commented 8 months ago

If you're planning on switching over to data classes for your configuration I'm personally a big fan of https://github.com/lebrice/SimpleParsing. It's pretty great for exposing and composing configs. Plus it just uses dataclasses.

sangkeun00 commented 8 months ago

@levmckinney Thanks for the suggestion! Would you like to work on this feature? If not, I can take a deeper look into it when I have more time (after my PhD proposal...haha).

sangkeun00 commented 8 months ago

@levmckinney I took a brief look at SimpleParsing. My impression is that it handles the configuration mostly through the interface of argparse instead of yaml, which is what we are using now. Do you have any thoughts on this?

levmckinney commented 8 months ago

@sangkeun00 Yes it focuses on argparse though it does have some facilities for serializing and de-serializing from yaml see here. We definitely don't have to use it though extra dependencies like this are always a risk if they get abandoned, which happens surprisingly often.

Thanks for the suggestion! Would you like to work on this feature?

Sure I can take a crack at this this weekend. Seems like a good way to learn the code base.

sangkeun00 commented 8 months ago

Sure! If there are any parts of the code that you don't understand, let me know anytime. I will be responsive on Discord.

We definitely don't have to use it though extra dependencies like this are always a risk if they get abandoned, which happens surprisingly often.

I fully agree with this. I think these libraries may be effective for the personal projects, but using it as a part of the library may not be the best strategy, mostly due to the dependency issues you mentioned above. Unless this makes our development significantly easier, I prefer to implement our Config from scratch.

sangkeun00 commented 7 months ago

Dataclass-based Config is implemented in PR #85 .