Support initializing a smaller dense model from weights of a larger model

Is your feature request related to a problem? Please describe. Currently, although we can extract a subnetwork from a supernetwork, we cannot take two separate models, a larger one and a smaller one, and initialize the smaller model from the weights of the larger one (i.e., copy them over). See this test utility function for a simple example: https://github.com/whittle-org/whittle/blob/87de1d0e6d1d4a31c1a267eb1cb10951ce9eb4f7/test/test_api.py#L21. It would be nice to support this functionality.

Describe the solution you'd like A simple function (perhaps reusing parts of extract_sub_network) that makes this possible for any two input models. Please add tests for Pythia and for Llama-3.2 extraction from Llama-3.1.
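
To make the request concrete, here is a rough sketch of what such a function could look like in plain PyTorch. It is not whittle's API: the name init_from_larger and the toy make_mlp helper are hypothetical, and it assumes both models use the same parameter names and that the smaller model should take the leading slice of every tensor. Real transformer blocks (attention heads, grouped-query layouts) would probably need more careful index selection, which is where parts of extract_sub_network might help.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def init_from_larger(small: nn.Module, large: nn.Module) -> None:
    """Hypothetical helper: fill `small` with leading slices of `large`'s tensors.

    Assumes every entry in `small.state_dict()` has a same-named entry in
    `large.state_dict()` that is at least as big along every dimension.
    """
    large_state = large.state_dict()
    for name, target in small.state_dict().items():
        if name not in large_state:
            raise KeyError(f"{name!r} is missing from the larger model")
        source = large_state[name]
        too_small = any(s < t for s, t in zip(source.shape, target.shape))
        if source.dim() != target.dim() or too_small:
            raise ValueError(
                f"cannot slice {tuple(source.shape)} down to "
                f"{tuple(target.shape)} for {name!r}"
            )
        # Keep the first `t` entries along each dimension of the source tensor.
        index = tuple(slice(0, t) for t in target.shape)
        # state_dict() tensors share storage with the module's parameters,
        # so the in-place copy updates `small` directly.
        target.copy_(source[index])


def make_mlp(width: int, hidden: int) -> nn.Sequential:
    """Toy stand-in for a dense model; only used to illustrate the call."""
    return nn.Sequential(
        nn.Linear(width, hidden), nn.ReLU(), nn.Linear(hidden, width)
    )


large = make_mlp(width=8, hidden=32)
small = make_mlp(width=4, hidden=16)
init_from_larger(small, large)
```

Because parameters are matched by name, a smaller model with fewer layers would simply pick up the first layers of the larger one under this sketch; whether that is the desired behavior is open for discussion.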

Additional context This is very useful if one wants to simply extract a network and use it for knowledge distillation.

Support initializing a smaller dense model from weights of a larger model #177