Clean up display of help message

pfrwilson commented 1 year ago

Is your feature request related to a problem? Please describe.

I find the default help message to be a little cluttered. See, for example, here:

 $ python train.py --model_name debug_gpt_rel_pos_v1 --lr 1e-3 --batch_size 64  --chunk_length 800 --nowandb --dataset bible -h
usage: train.py [-h] [--lr float] [--batch_size int] [--n_epochs int] [--model {premade}]
                [--name str] [--model_path [str]] [--vocab_path str] [--wandb bool]
                [--debug bool] [--dataset {openwebtext,bible}]
                [--id {lstm_small,lstm_med,lstm_large,debug_gpt,debug_gpt_rel_pos_v0,debug_gpt_rel_pos_v1,debug_gpt_v1,gpt_v0,gpt_v1}]
                [--chunk_length int]

options:
  -h, --help            show this help message and exit

Config ['config']:
  Config(lr: float = 0.001, batch_size: int = 256, n_epochs: int = 100, model: torchzero.utils.registry.Registry.BaseConfig = <factory>, name: str = <factory>, model_path: str | None = None, vocab_path: str = 'data/vocab_1024.json', wandb: bool = True, debug: bool = False, dataset: torchzero.utils.registry.Registry.BaseConfig = <factory>)

  --lr float            (default: 0.001)
  --batch_size int      (default: 256)
  --n_epochs int        (default: 100)
  --model {premade}     (default: premade)
  --name str            (default: 2023-08-11_gay-chihuahua)
  --model_path [str]    optional path to a model to load (default: None)
  --vocab_path str      optional path to a vocab to load (default: data/vocab_1024.json)
  --wandb bool, --nowandb bool
                        (default: True)
  --debug bool, --nodebug bool
                        (default: False)
  --dataset {openwebtext,bible}
                        (default: openwebtext)

Registry.Premade.<locals>.Premade ['config.model']:
  Premade(id: Literal['lstm_small', 'lstm_med', 'lstm_large', 'debug_gpt', 'debug_gpt_rel_pos_v0', 'debug_gpt_rel_pos_v1', 'debug_gpt_v1', 'gpt_v0', 'gpt_v1'] = 'lstm_small')

  --id {lstm_small,lstm_med,lstm_large,debug_gpt,debug_gpt_rel_pos_v0,debug_gpt_rel_pos_v1,debug_gpt_v1,gpt_v0,gpt_v1}
                        (default: lstm_small)

Bible.Config ['config.dataset']:
  Config(chunk_length: int = 500)

  --chunk_length int    (default: 500)

In particular, displaying the repr of the config class and nested config subgroups significantly clutters the display and, to me, doesn't provide much extra help when understanding the arguments. It's especially true when there are many argument subgroups, and even nested argument subgroups...

Describe the solution you'd like

I would like the option to display it more like this:

usage: train.py [-h] [--lr float] [--batch_size int] [--n_epochs int] [--model {premade}]
                [--name str] [--model_path [str]] [--vocab_path str] [--wandb bool]
                [--debug bool] [--dataset {openwebtext,bible}]
                [--id {lstm_small,lstm_med,lstm_large,debug_gpt,debug_gpt_rel_pos_v0,debug_gpt_rel_pos_v1,debug_gpt_v1,gpt_v0,gpt_v1}]
                [--chunk_length int]

options:
  -h, --help            show this help message and exit

Config: 

  --lr float            (default: 0.001)
  --batch_size int      (default: 256)
  --n_epochs int        (default: 100)
  --model {premade}     (default: premade)
  --name str            (default: 2023-08-11_gay-chihuahua)
  --model_path [str]    optional path to a model to load (default: None)
  --vocab_path str      optional path to a vocab to load (default: data/vocab_1024.json)
  --wandb bool, --nowandb bool
                        (default: True)
  --debug bool, --nodebug bool
                        (default: False)
  --dataset {openwebtext,bible}
                        (default: openwebtext)

Model (config.model)
  --id {lstm_small,lstm_med,lstm_large,debug_gpt,debug_gpt_rel_pos_v0,debug_gpt_rel_pos_v1,debug_gpt_v1,gpt_v0,gpt_v1}
                        (default: lstm_small)

Dataset (config.dataset)
  --chunk_length int    (default: 500)

Having the option to provide a "Name" for the subgroup (just like argparse.ArgumentParser.add_argument_group(...)) would potentially allow for a cleaner-looking display.
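For comparison, here is a minimal sketch of how named groups look in plain argparse. It only uses the standard library and is not how SimpleParsing builds its groups; the group titles and the shortened list of model ids are chosen for illustration to mirror the desired output above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help output is organised into titled groups."""
    parser = argparse.ArgumentParser(prog="train.py")

    # Each add_argument_group call takes a title that becomes the section header.
    config = parser.add_argument_group("Config")
    config.add_argument("--lr", type=float, default=1e-3,
                        help="(default: %(default)s)")
    config.add_argument("--batch_size", type=int, default=256,
                        help="(default: %(default)s)")

    # Illustrative subset of the model ids from the issue.
    model = parser.add_argument_group("Model (config.model)")
    model.add_argument("--id", choices=["lstm_small", "debug_gpt", "gpt_v0"],
                       default="lstm_small", help="(default: %(default)s)")

    dataset = parser.add_argument_group("Dataset (config.dataset)")
    dataset.add_argument("--chunk_length", type=int, default=500,
                         help="(default: %(default)s)")
    return parser


# format_help() returns the same text that -h would print.
help_text = build_parser().format_help()
print(help_text)

args = build_parser().parse_args(["--lr", "0.01", "--id", "gpt_v0"])
```

The section headers in the printed help are just the titles passed to add_argument_group, with no class repr in between, which is the kind of display being requested here.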

Thanks so much for considering this request! If there is already some documentation covering this, I apologize in advance.

P.S. this project is absolutely brilliant and has made my life at least 250% better overall.

lebrice commented 1 year ago

Hello there @pfrwilson ! Thanks for posting!

The __doc__ of the dataclass is used to generate the group help text, and what you're seeing is the auto-generated one from dataclasses.dataclass. Adding a docstring is a workaround. However, you're right: there should probably be a help argument to parser.add_arguments that would override this. Good idea! :)
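A small standard-library sketch of why the docstring workaround changes the text: dataclasses.dataclass fills in __doc__ with the class signature only when the class has no docstring of its own. The class names below are hypothetical, and the sketch does not call SimpleParsing itself.

```python
from dataclasses import dataclass


@dataclass
class Undocumented:
    lr: float = 1e-3
    batch_size: int = 256


@dataclass
class Documented:
    """Training options."""

    lr: float = 1e-3
    batch_size: int = 256


# Without a docstring, the decorator generates one from the signature,
# which is the cluttered repr-like line shown in the help message.
auto_doc = Undocumented.__doc__

# With a docstring, the decorator leaves __doc__ untouched.
custom_doc = Documented.__doc__

print(auto_doc)
print(custom_doc)
```

Since the group help text is taken from __doc__, giving each config dataclass a short docstring replaces the signature line with that docstring.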

> P.S. this project is absolutely brilliant and has made my life at least 250% better overall.

Haha happy you like it! Would you be interested in contributing a PR for this? I would gladly give you some pointers if you're interested, but no pressure! :)

pfrwilson commented 10 months ago

Hi @lebrice , sorry for the late reply. After a few months of using your suggestion (dataclass docstrings to make more user-friendly config messages), I think it's perfectly satisfactory. It would be nice to see some more advanced formatting options (imagine if we had rich-formatted help messages!), but sadly I don't have the bandwidth to help with this right now. I'll definitely keep it in mind and may reach out again about it when I have more bandwidth.

Thanks again!

lebrice / SimpleParsing

Clean up display of help message #286