Supernets with varying block sizes

tl;dr: Support supernets with varying block sizes (intermediate_size and n_head can be lists).

Is your feature request related to a problem? Please describe.
When converting HW-GPT-Bench subnets to whittle, you first need to initialize the supernet and then call set_sub_network to set the correct intermediate sizes and numbers of heads.

This means that if we only use the subnetwork, we still allocate many weights that are never used. Ideally, the subnet should be initialized "as a supernet" in its own right, e.g. by setting n_embd to the smaller embedding size.
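To make the overhead concrete, here is a small worked count of MLP weights only. It is a sketch under stated assumptions: a plain two-matrix MLP (fc: n_embd to intermediate, proj: intermediate to n_embd) with biases ignored, and hypothetical sizes that are not taken from HW-GPT-Bench. The helper names are illustrative, not part of whittle.

```python
# Hypothetical sizes chosen only to illustrate the overhead; not taken from HW-GPT-Bench.
from typing import List


def mlp_weight_count(n_embd: int, intermediate_size: int) -> int:
    """Weights in a two-matrix MLP (fc: n_embd -> hidden, proj: hidden -> n_embd), biases ignored."""
    return 2 * n_embd * intermediate_size


def total_mlp_weights(n_embd: int, intermediate_sizes: List[int]) -> int:
    """Sum the MLP weights over all blocks, one intermediate size per block."""
    return sum(mlp_weight_count(n_embd, size) for size in intermediate_sizes)


# Supernet: every block is allocated at the maximum size.
supernet_weights = total_mlp_weights(1024, [4096, 4096])
# Subnet that set_sub_network would select: smaller embedding, per-block intermediate sizes.
subnet_weights = total_mlp_weights(768, [2048, 3072])

unused = supernet_weights - subnet_weights
print(f"supernet={supernet_weights:,} subnet={subnet_weights:,} unused={unused / supernet_weights:.1%}")
```

With these made-up numbers, roughly half of the allocated MLP weights are never touched by the subnet, and that is before counting attention and embedding weights.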

While we can already set the embedding size this way, intermediate_size cannot currently be a list because of how config.intermediate_size is used inside the MLP blocks: https://github.com/whittle-org/whittle/blob/3b18ba58a60ed0266438b67b0cfc272a291c7cb9/whittle/models/gpt/blocks/mlp.py#L14
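One possible direction, sketched here as an assumption rather than the actual whittle code, is to resolve the MLP width per block index instead of reading a single value. ToyConfig, intermediate_size_for_block, and mlp_weight_shapes are hypothetical names, and the sketch assumes each block knows its own index when it is constructed. Shapes follow the torch.nn.Linear weight layout of (out_features, in_features).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the model config; only the fields relevant here.
    n_layer: int
    n_embd: int
    intermediate_size: Union[int, List[int]]


def intermediate_size_for_block(config: ToyConfig, block_idx: int) -> int:
    """Return the MLP width for one block, accepting either a shared int or a per-block list."""
    size = config.intermediate_size
    if isinstance(size, int):
        return size
    return size[block_idx]


def mlp_weight_shapes(config: ToyConfig, block_idx: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """(out_features, in_features) of the fc and proj weights for one block."""
    hidden = intermediate_size_for_block(config, block_idx)
    fc = (hidden, config.n_embd)
    proj = (config.n_embd, hidden)
    return fc, proj


config = ToyConfig(n_layer=3, n_embd=768, intermediate_size=[2048, 3072, 1536])
for i in range(config.n_layer):
    print(i, mlp_weight_shapes(config, i))
```

Accepting both an int and a list keeps existing single-size configs working unchanged.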

The same applies to n_head (currently, it must be the same for all blocks).
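For n_head, a per-block value changes the width of the attention projections in that block. The sketch below is illustrative only and rests on two assumptions: standard multi-head attention (one key/value head per query head, fused into a single QKV matrix) and a head_size shared by all blocks. heads_for_block and attention_weight_shapes are hypothetical helpers.

```python
from typing import List, Tuple, Union


def heads_for_block(n_head: Union[int, List[int]], block_idx: int) -> int:
    # Hypothetical helper: a shared int applies to every block, a list gives one value per block.
    return n_head if isinstance(n_head, int) else n_head[block_idx]


def attention_weight_shapes(
    n_embd: int, head_size: int, n_head: Union[int, List[int]], block_idx: int
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """(out_features, in_features) of the fused QKV projection and the output projection,
    assuming standard multi-head attention and a head_size shared across blocks."""
    heads = heads_for_block(n_head, block_idx)
    attn_dim = heads * head_size
    qkv = (3 * attn_dim, n_embd)  # queries, keys, and values stacked along the output dim
    proj = (n_embd, attn_dim)     # maps concatenated heads back to the embedding size
    return qkv, proj


for block_idx in range(3):
    print(block_idx, attention_weight_shapes(n_embd=768, head_size=64, n_head=[12, 8, 4], block_idx=block_idx))
```

Keeping head_size fixed (rather than deriving it as n_embd // n_head per block) means every head has the same dimension, so only the number of heads varies between blocks.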

Describe the solution you'd like
Enable lists of ints (one entry per block) for config.intermediate_size and config.n_head.
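A minimal sketch of how such a config could accept either form, assuming the values are normalized once at construction time. BlockSizes and expand_per_block are hypothetical names; this is not the existing whittle or litgpt Config class.

```python
from dataclasses import dataclass
from typing import List, Union

IntOrList = Union[int, List[int]]


def expand_per_block(value: IntOrList, n_layer: int, name: str) -> List[int]:
    """Broadcast an int to every block, or validate a per-block list."""
    if isinstance(value, int):
        return [value] * n_layer
    values = list(value)
    if len(values) != n_layer:
        raise ValueError(f"{name} has {len(values)} entries, expected n_layer={n_layer}")
    if any(not isinstance(v, int) or v <= 0 for v in values):
        raise ValueError(f"{name} must contain positive ints, got {values}")
    return values


@dataclass
class BlockSizes:
    # Hypothetical container for the per-block sizes discussed in this issue.
    n_layer: int
    intermediate_size: IntOrList
    n_head: IntOrList

    def __post_init__(self) -> None:
        # After this, both fields are always lists of length n_layer.
        self.intermediate_size = expand_per_block(self.intermediate_size, self.n_layer, "intermediate_size")
        self.n_head = expand_per_block(self.n_head, self.n_layer, "n_head")


sizes = BlockSizes(n_layer=3, intermediate_size=[2048, 3072, 1536], n_head=12)
print(sizes.intermediate_size, sizes.n_head)
```

Normalizing in one place means block-construction code can always index by block, while configs that still pass a single int keep their current behavior.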

whittle-org/whittle, issue #137