openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
679 stars 114 forks

How to set configs in hydra_train.py #126

Open JinmingChe opened 2 years ago

JinmingChe commented 2 years ago

❓ Questions & Help

Hello, I am learning how to use openspeech, and I want to set configs in the Python file so that I can debug easily. The recommended method, however, is to pass parameters on the command line.

Details

I tried using configs.dataset = 'librispeech' in hydra_train.py instead of running python ./hydra_train.py dataset=librispeech, but it gives me the following error:

omegaconf.errors.ConfigAttributeError: Key 'dataset' is not in struct
full_key: dataset
object_type=dict

It would be very kind of you to give me some advice on this usage.

OleguerCanal commented 2 years ago

I think the problem is that configs.dataset should be a dictionary and not a string. If you wanna change it in the Python script, I believe you should set configs.dataset.dataset="librispeech".

If you wanna do it from the command line you can do:

python hydra_train.py dataset.dataset="librispeech"

sooftware commented 2 years ago

Can you show us the exact command you ran? It could be a command syntax error.

JinmingChe commented 2 years ago

The following is my code:

```python
@hydra.main(config_path=os.path.join("..", "openspeech", "configs"), config_name="train")
def hydra_main(configs: DictConfig) -> None:
    rank_zero_info(OmegaConf.to_yaml(configs))
    pl.seed_everything(configs.trainer.seed)

    # way 1
    configs['dataset'] = 'librispeech'
    # way 2
    configs.dataset = 'librispeech'
```

I tried two ways of setting configs.dataset, but both give me the same error:

Exception has occurred: ConfigKeyError
Key 'dataset' is not in struct
full_key: dataset
object_type=dict

The above exception was the direct cause of the following exception:

File "/home/chenjinming/github/openspeech/openspeech_cli/hydra_train.py", line 44, in hydra_main
configs['dataset'] = 'librispeech'

I also printed the configs structure:

{'augment': {'apply_spec_augment': False, 'apply_noise_augment': False, 'apply_joining_augment': False, 'apply_time_stretch_augment': False, 'freq_mask_para': 27, 'freq_mask_num': 2, 'time_mask_num': 4, 'noise_dataset_dir': 'None', 'noise_level': 0.7, 'time_stretch_min_rate': 0.7, 'time_stretch_max_rate': 1.4}, 'trainer': {'seed': 1, 'accelerator': 'dp', 'accumulate_grad_batches': 1, 'num_workers': 4, 'batch_size': 32, 'check_val_every_n_epoch': 1, 'gradient_clip_val': 5.0, 'logger': 'wandb', 'max_epochs': 20, 'save_checkpoint_n_steps': 10000, 'auto_scale_batch_size': 'binsearch', 'sampler': 'smart', 'name': 'gpu', 'device': 'gpu', 'use_cuda': True, 'auto_select_gpus': True}}

It seems that the configs have no 'dataset' key. My purpose is to add a new key or change the default config settings from the Python file instead of using the command line.

resurgo97 commented 2 years ago

Did you try configs['dataset']['dataset'] = 'librispeech'?

configs['dataset'] has to be a dictionary whose keys are 'dataset', 'dataset_path', 'dataset_download', and 'manifest_file_path'.

Assigning configs['dataset'] = 'librispeech' throws an error because the value is a string, not a dict. The key is therefore removed from the configs, which is why you don't see it when you print them.