microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] how to parse the parameters of the respective section using hydra and argparse? #5357

Open qwerfdsadad opened 7 months ago

qwerfdsadad commented 7 months ago

I'm training a U-Net with DeepSpeed.

The basic training code parses its parameters through Hydra. The simplified code is:

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base="1.2", config_path="config", config_name="config_rdb")
def main(cfg: DictConfig):
    print(OmegaConf.to_yaml(cfg))
    run_training_Unet(
        if_training=cfg.args.if_training,
        epochs=cfg.args.epochs,
    )

if __name__ == "__main__":
    main()

I launch it with:

python train.py +args=config.yaml ++args.modename='unet'

I changed the basic code to a DeepSpeed version:

@hydra.main(version_base="1.2", config_path="config", config_name="config_rdb")
def main(cfg: DictConfig, args):
    print(OmegaConf.to_yaml(cfg))
    run_training_Unet(
        if_training=cfg.args.if_training,
        epochs=cfg.args.epochs,
        args=args,
    )

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='deepspeed')
    parser.add_argument('--deepspeed_config', type=str, default=None,
                        help='Path')
    parser.add_argument('--local_rank', type=int, default=None,
                        help='local rank for deepspeed')
    parser.add_argument('--global_rank', type=int, default=None,
                        help='global rank')
    args = parser.parse_args()
    main(args)

I launch it with:

deepspeed train.py +args=config.yaml ++args.modename='unet' --deepspeed --deepspeed_config ds_config.json

But I got an error:

[2024-03-30 18:41:09,000] [ERROR] [launch.py:184:sigkill_handler] ['/public/home/y/bin/python', '-u', 'train.py', '--local_rank=0', '+args=config.yaml', '++args.model_name=Unet', '--deepspeed', '--deepspeed_config', 'ds_config.json'] exits with return code = 2
usage: train.py [--help] [--hydra-help] [--version]
                                         [--cfg {job,hydra,all}] [--resolve]
                                         [--package PACKAGE] [--run]
                                         [--multirun] [--shell-completion]
                                         [--config-path CONFIG_PATH]
                                         [--config-name CONFIG_NAME]
                                         [--config-dir CONFIG_DIR]
                                         [--experimental-rerun EXPERIMENTAL_RERUN]
                                         [--info [{all,config,defaults,defaults-tree,plugins,searchpath}]]
                                         [overrides ...]
train_models_forward_deepspeed.py: error: unrecognized arguments: --local_rank=0

I think Hydra and argparse conflict when they both parse the command-line parameters. Is there any tutorial code for this?
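
One possible way to keep the two parsers apart, offered as a sketch under assumptions rather than as a documented DeepSpeed or Hydra recipe: let argparse consume only the launcher flags via `parse_known_args`, and leave every other token for Hydra. The helper name `split_launcher_args` is hypothetical; the flag names are the ones that appear in the failing command.

```python
import argparse
from typing import List, Tuple


def split_launcher_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Separate DeepSpeed launcher flags from the rest (hypothetical helper)."""
    # add_help=False so that --help is left in the remainder for Hydra.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    parser.add_argument("--deepspeed", action="store_true")
    parser.add_argument("--deepspeed_config", type=str, default=None)
    # parse_known_args returns unrecognized tokens instead of exiting with an error.
    return parser.parse_known_args(argv)
```

In `train.py` this would presumably run before the Hydra entry point: call `split_launcher_args(sys.argv[1:])`, set `sys.argv = [sys.argv[0]] + remaining`, and then call `main()` with no arguments so that Hydra only sees its own overrides. Since the Hydra-decorated `main` receives only `cfg`, the parsed namespace would have to be kept somewhere else, for example in a module-level variable.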

hertz-pj commented 5 months ago

Have you been able to resolve this issue? If so, could you please share your solution or any insights you've gained? Thanks!

qwerfdsadad commented 4 months ago

@hertz-pj I added a parameter to get the path from the YAML, then read the JSON config through that path parameter.
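
The workaround described above might look roughly like the following. This is a guess at its shape, not the poster's actual code: the key name `deepspeed_config` and the helper `load_ds_config` are hypothetical, and a plain mapping stands in for the Hydra `cfg.args` node.

```python
import json
from typing import Any, Dict, Mapping


def load_ds_config(args_cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Read the DeepSpeed JSON file whose path is stored in the config.

    `deepspeed_config` is an assumed key name, e.g. supplied with a Hydra
    override such as ++args.deepspeed_config=ds_config.json.
    """
    with open(args_cfg["deepspeed_config"], "r", encoding="utf-8") as f:
        return json.load(f)
```

The returned dict could then be passed as the `config` argument of `deepspeed.initialize`, which removes the need for a `--deepspeed_config` flag on the command line. Judging by the error log, the launcher still appends `--local_rank=0`, so that flag would still have to be filtered out before Hydra reads `sys.argv`.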