whittle-org / whittle

https://whittle-org.github.io/whittle/latest/
Apache License 2.0

Supernets with varying block sizes #137

Open gabikadlecova opened 1 hour ago

gabikadlecova commented 1 hour ago

tl;dr: support supernets with varying block sizes (intermediate_size and n_head can be lists)

Is your feature request related to a problem? Please describe.

When converting HW-GPT-Bench subnets to whittle, you first need to initialize the supernet and then call set_sub_network to set the correct intermediate sizes and numbers of heads.

This means that if we only use the subnetwork, the model still carries a lot of unused weights. Ideally, the subnet should be initialized "as a supernet", i.e. with n_embd set to the subnet's smaller embedding size.
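To make the overhead concrete, here is a rough back-of-the-envelope sketch. All sizes are made up for illustration (they are not taken from HW-GPT-Bench or whittle), and `mlp_params` is a hypothetical helper that counts only the two MLP weight matrices per block, ignoring biases, attention, and embeddings.

```python
def mlp_params(n_embd: int, intermediate_size: int) -> int:
    """Weights of a two-layer MLP (up- and down-projection), biases ignored."""
    return 2 * n_embd * intermediate_size


# Hypothetical sizes, chosen only for illustration.
super_n_embd, super_intermediate = 768, 3072
sub_n_embd = 384
sub_intermediate = [1536, 768, 1152]  # one value per block

# Approach 1: allocate every block at supernet size, then select a subnet.
allocated = sum(mlp_params(super_n_embd, super_intermediate) for _ in sub_intermediate)

# Approach 2: allocate every block at exactly the subnet's size.
used = sum(mlp_params(sub_n_embd, size) for size in sub_intermediate)

unused_fraction = 1 - used / allocated
print(f"allocated={allocated}, used={used}, unused={unused_fraction:.2%}")
```

With these made-up sizes, roughly 81% of the allocated MLP weights would never be touched by the subnet.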

While we can set the embedding size, we currently cannot work with intermediate_size as a list due to how config.intermediate_size is used inside the MLPs: https://github.com/whittle-org/whittle/blob/3b18ba58a60ed0266438b67b0cfc272a291c7cb9/whittle/models/gpt/blocks/mlp.py#L14

The same applies to n_head, which is currently the same for all blocks.

Describe the solution you'd like

Enable lists of ints for config.intermediate_size and config.n_head.
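One possible way to accept both forms is to normalize the setting to one value per block before the blocks are built. This is only a sketch: `per_layer_values` is a hypothetical helper, not part of whittle, and how the config actually stores or validates these fields is an assumption.

```python
from typing import List, Sequence, Union

IntOrSeq = Union[int, Sequence[int]]


def per_layer_values(value: IntOrSeq, n_layer: int, name: str = "value") -> List[int]:
    """Expand an int to one entry per block, or validate a per-block list."""
    if isinstance(value, int):
        # Scalar setting: every block gets the same size (current behavior).
        return [value] * n_layer
    values = list(value)
    if len(values) != n_layer:
        raise ValueError(f"{name} has {len(values)} entries, expected {n_layer}")
    return values


# Hypothetical usage: a list for intermediate_size, a shared int for n_head.
intermediate_sizes = per_layer_values([1024, 2048, 512], n_layer=3, name="intermediate_size")
n_heads = per_layer_values(8, n_layer=3, name="n_head")
```

Each block could then read its own entry by block index, so existing configs with a single int would keep working unchanged.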

gabikadlecova commented 1 hour ago

I have a fix ready; I just want to write a test validating that the supernet produces the same output as a subnet created via set_sub_network.
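The shape of such a test could look like the toy sketch below. It does not use whittle at all: a single MLP is written in plain Python, masking the trailing hidden units stands in for set_sub_network, and copying the leading slice of each weight matrix stands in for a model initialized directly at the smaller size. The assumption that subnets correspond to the leading slice of each weight matrix, and all helper names, are mine.

```python
import math
import random


def linear(x, weight):
    """Compute y = W x for a weight stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


def masked_mlp(x, w_fc, w_proj, sub_intermediate):
    """Full-size weights; hidden units past sub_intermediate are zeroed."""
    hidden = relu(linear(x, w_fc))
    hidden = [h if i < sub_intermediate else 0.0 for i, h in enumerate(hidden)]
    return linear(hidden, w_proj)


def small_mlp(x, w_fc, w_proj):
    """Standalone MLP whose weights are already at the smaller size."""
    return linear(relu(linear(x, w_fc)), w_proj)


random.seed(0)
n_embd, super_intermediate, sub_intermediate = 4, 8, 3

w_fc = [[random.uniform(-1, 1) for _ in range(n_embd)] for _ in range(super_intermediate)]
w_proj = [[random.uniform(-1, 1) for _ in range(super_intermediate)] for _ in range(n_embd)]
x = [random.uniform(-1, 1) for _ in range(n_embd)]

# Copy the leading slice of each weight matrix into the smaller model.
small_fc = [row[:] for row in w_fc[:sub_intermediate]]
small_proj = [row[:sub_intermediate] for row in w_proj]

out_masked = masked_mlp(x, w_fc, w_proj, sub_intermediate)
out_small = small_mlp(x, small_fc, small_proj)

outputs_match = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(out_masked, out_small)
)
```

A real test would presumably do the same with the actual model classes: copy the relevant weight slices from the large model into the small one, run both on the same input, and compare outputs within a tolerance.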