tl;dr: support supernets with varying block sizes (`intermediate_size` and `n_heads` can be lists)
**Is your feature request related to a problem? Please describe.**
When converting HW-GPT-Bench subnets to whittle, you currently need to first initialize the full supernet and then call `set_sub_network` to set the correct intermediate sizes and `n_heads`.
This means that if we use only the subnetwork, we carry a lot of unused weights. Ideally, the subnet should be initializable "as a supernet", i.e. setting `n_embd` to the smaller embedding size.
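To make the overhead concrete, here is a rough parameter-count sketch (illustrative only, not whittle code; the sizes are made up, and biases/attention weights are ignored) showing how much of an MLP block goes unused when a subnet is allocated at supernet width:

```python
# Illustrative accounting: an MLP block holds fc (n_embd x intermediate_size)
# and proj (intermediate_size x n_embd) weight matrices, so allocating at the
# supernet's intermediate_size wastes everything beyond the subnet's width.
def mlp_params(n_embd: int, intermediate_size: int) -> int:
    return 2 * n_embd * intermediate_size  # fc + proj weights, biases omitted

super_size, sub_size, n_embd = 4096, 1024, 1024  # hypothetical sizes
allocated = mlp_params(n_embd, super_size)
used = mlp_params(n_embd, sub_size)
print(f"unused fraction: {1 - used / allocated:.2f}")  # -> unused fraction: 0.75
```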
While we can set the embedding size, we currently cannot work with `intermediate_size` as a list due to how `config.intermediate_size` is used inside the MLPs: https://github.com/whittle-org/whittle/blob/3b18ba58a60ed0266438b67b0cfc272a291c7cb9/whittle/models/gpt/blocks/mlp.py#L14

The same applies to `n_head` (currently, it is the same for all blocks).
**Describe the solution you'd like**

Enable lists of ints for `config.intermediate_size` and `config.n_head`.