Hi,
Thanks a lot for releasing this great project. I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step like this:

```python
configs = [utils.sample_configs(utils.get_all_choices(args),
                                reset_rand_seed=True,
                                rand_seed=trainer.get_num_updates(),
                                super_decoder_num_layer=args.decoder_layers)]
```

In a multi-GPU scenario, does each GPU get the same random SubTransformer, or does each one sample a different random SubNetwork? Does `reset_rand_seed` force all GPUs to sample the same random SubTransformer from the SuperNet? And is `trainer.get_num_updates()` the same on all GPUs at each train step?
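To make my question concrete, here is a minimal sketch of how I currently understand the seeding. The `sample_config` helper and the `choices` dictionary below are hypothetical stand-ins for `utils.sample_configs` / `utils.get_all_choices(args)`, not the project's actual code:

```python
import random

# Hypothetical search space, standing in for utils.get_all_choices(args).
choices = {
    "encoder_embed_dim": [512, 640],
    "encoder_layer_num": [4, 5, 6],
    "decoder_layer_num": [3, 4, 5, 6],
}

def sample_config(choices, rand_seed):
    # Re-seeding with the current update count before sampling should make
    # every process draw the same sequence of random values.
    random.seed(rand_seed)
    return {name: random.choice(options) for name, options in choices.items()}

# If trainer.get_num_updates() returns the same value (e.g. 100) on every GPU,
# each process would sample an identical SubTransformer config.
config_gpu0 = sample_config(choices, rand_seed=100)
config_gpu1 = sample_config(choices, rand_seed=100)
assert config_gpu0 == config_gpu1
```

If that understanding is right, then seeding with the update count keeps all workers in sync only as long as `trainer.get_num_updates()` agrees across GPUs, which is the part I would like to confirm.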
Thanks a lot for your help.