mlr-org / mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch
https://mlr3torch.mlr-org.com

unfreezing 🥶 weights callback #297

Open sebffischer opened 1 month ago

sebffischer commented 1 month ago

When fine-tuning a predefined image network on a downstream task, one often wants to freeze some weights for a given number of epochs/steps. As this is relatively common, we should offer a predefined callback ("cb.freeze") to enable this.

The callback should be able to iteratively unfreeze layers after a given number of epochs or batches.

Background:

Each torch module exposes its parameters as a named list:

net = torch::nn_linear(1, 1)

net$parameters
#> $weight
#> torch_tensor
#>  0.3468
#> [ CPUFloatType{1,1} ][ requires_grad = TRUE ]
#> 
#> $bias
#> torch_tensor
#>  0.6796
#> [ CPUFloatType{1} ][ requires_grad = TRUE ]

When we want to unfreeze a specific weight, we can refer to it via its name in this list. Further, we can freeze a parameter in a network by calling its in-place $requires_grad_() method with FALSE:

net$parameters[[1]]$requires_grad
#> [1] TRUE
# freeze the parameter in place
net$parameters[[1]]$requires_grad_(FALSE)
net$parameters[[1]]$requires_grad
#> [1] FALSE

We can unfreeze a parameter the same way:

# unfreeze again, in place
net$parameters[[1]]$requires_grad_(TRUE)
net$parameters[[1]]$requires_grad
#> [1] TRUE
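
Because the parameter names reflect the module hierarchy, freezing or unfreezing whole layers amounts to matching name prefixes. A minimal sketch with a small sequential network (the architecture is just for illustration):

net2 = torch::nn_sequential(
  torch::nn_linear(1, 8),
  torch::nn_relu(),
  torch::nn_linear(8, 1)
)

names(net2$parameters)
#> [1] "0.weight" "0.bias"   "2.weight" "2.bias"

# freeze everything except the output layer (name prefix "2.")
for (nm in names(net2$parameters)) {
  if (!startsWith(nm, "2.")) {
    net2$parameters[[nm]]$requires_grad_(FALSE)
  }
}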

The callback needs to define:

  1. when to unfreeze which layer (the "when" should be definable both in terms of epochs and batches). It should e.g. be possible to unfreeze layer8 after the first epoch, layer7 after the second, and the rest after the third epoch.
  2. which weights are frozen at the start, i.e. which weights are trainable from the start.

I can e.g. imagine this callback having parameters along the following lines:
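A minimal sketch of what that could look like, assuming a callback id "unfreeze" and select_*-style selector helpers, none of which exist yet:

cb = t_clbk("unfreeze",
  # weights that are trainable from the start; everything else starts frozen
  starting_weights = select_name(c("head.weight", "head.bias")),
  # schedule: which weights to unfreeze after which epoch
  unfreeze = data.table::data.table(
    weights = list(select_grep("^layer8\\."), select_grep("^layer7\\.")),
    epoch = c(1, 2)
  )
)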

cxzhang4 commented 6 days ago

We need to implement our own Selector-like class (SelectParamNames?) that allows us to select a subset of the parameters in a neural network. These will be higher-order functions (HOFs).
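
A minimal sketch of such higher-order selectors, reusing net2 from above (the constructor names are hypothetical): each constructor returns a function that maps the full character vector of parameter names to the selected subset.

# returns a function that selects parameter names matching a regex
select_grep = function(pattern) {
  function(param_names) grep(pattern, param_names, value = TRUE)
}

# returns a function that selects exactly the given names
select_name = function(names) {
  function(param_names) intersect(param_names, names)
}

selector = select_grep("^2\\.")
selector(names(net2$parameters))
#> [1] "2.weight" "2.bias"

Because these selectors operate on parameter names only, they could be stored in the callback's parameter set and applied once the network has been constructed.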