[WIP] Gradual Unfreezing to mitigate catastrophic forgetting

ethanreidel commented 8 months ago

Adds the ability to gradually unfreeze (thaw) specific layers of a pre-trained model during training. The goal is to mitigate catastrophic forgetting and improve transfer learning. Currently works for the ECD architecture only.

The user passes in two things: thaw_epochs (a list of integers) and layers_to_thaw (a 2D array of layer-name strings, one inner list per entry in thaw_epochs).

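Example config:

thaw_epochs:
  - 1
  - 2
layers_to_thaw:
  - ["features.0", "features.1"]
  - ["features.2", "features.3"]

With this config, "features.0" and "features.1" (weights and biases) are thawed at epoch 1, and "features.2" and "features.3" are thawed at epoch 2. Keep in mind that "features.0" thaws every layer whose name has the prefix "features.0", e.g. "features.0.1", "features.0.2", "features.0.3".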
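To make the schedule concrete, here is a minimal sketch of how thaw_epochs and layers_to_thaw could be applied. It is not the code in this PR: the helper names (`build_thaw_schedule`, `matches_prefix`, `freeze_all`, `thaw`) and the `ToyModel` stand-in are hypothetical. It only assumes a model that exposes `named_parameters()` yielding `(name, param)` pairs with a `requires_grad` flag, as `torch.nn.Module` does. Matching on whole dotted components is also an assumption of the sketch, chosen so that "features.1" does not accidentally match "features.10".

```python
from types import SimpleNamespace


def build_thaw_schedule(thaw_epochs, layers_to_thaw):
    """Map each epoch to the layer-name prefixes that should be unfrozen then."""
    if len(thaw_epochs) != len(layers_to_thaw):
        raise ValueError("thaw_epochs and layers_to_thaw must have the same length")
    schedule = {}
    for epoch, prefixes in zip(thaw_epochs, layers_to_thaw):
        schedule.setdefault(epoch, []).extend(prefixes)
    return schedule


def matches_prefix(param_name, prefix):
    # Assumption: match whole dotted components, so "features.1"
    # covers "features.1.weight" but not "features.10.weight".
    return param_name == prefix or param_name.startswith(prefix + ".")


def freeze_all(model):
    """Freeze every parameter before training starts."""
    for _, param in model.named_parameters():
        param.requires_grad = False


def thaw(model, schedule, epoch):
    """Unfreeze the parameters scheduled for `epoch`; return their names."""
    prefixes = schedule.get(epoch, [])
    thawed = []
    for name, param in model.named_parameters():
        if any(matches_prefix(name, p) for p in prefixes):
            param.requires_grad = True
            thawed.append(name)
    return thawed


class ToyModel:
    """Hypothetical stand-in for a torch.nn.Module (illustration only)."""

    def __init__(self, names):
        self._params = {n: SimpleNamespace(requires_grad=True) for n in names}

    def named_parameters(self):
        return self._params.items()


model = ToyModel([
    "features.0.weight", "features.0.1.bias", "features.1.weight",
    "features.2.weight", "features.3.bias", "features.10.weight",
    "classifier.weight",
])
schedule = build_thaw_schedule(
    [1, 2],
    [["features.0", "features.1"], ["features.2", "features.3"]],
)

freeze_all(model)
for epoch in range(1, 3):
    # In a real training loop, the epoch's training step would follow this call.
    print(epoch, thaw(model, schedule, epoch))
```

If this were used with a PyTorch optimizer that was built only from the parameters trainable at construction time, the newly thawed parameters would probably also need to be registered with it (for example via `add_param_group`). An optimizer built over all parameters simply skips the frozen ones until they start receiving gradients.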

TODO/potential issues:

The config syntax may need to change.
Users currently need to know the exact layer-name strings in the architecture in order to thaw them, which is inconvenient.
The unit test is still iffy.
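One possible way to ease the layer-name problem is to print the prefixes a model actually exposes, so users can copy them into layers_to_thaw. The sketch below is only an illustration, not part of this PR: `candidate_prefixes` is a hypothetical helper, and it assumes parameter names are dotted paths ending in a leaf such as "weight" or "bias", which is how `torch.nn.Module.named_parameters()` names them.

```python
def candidate_prefixes(param_names):
    """Return the distinct dotted prefixes of the given names, in first-seen order."""
    seen = {}
    for name in param_names:
        parts = name.split(".")[:-1]  # drop the trailing leaf, e.g. "weight" or "bias"
        for i in range(1, len(parts) + 1):
            seen.setdefault(".".join(parts[:i]), None)
    return list(seen)


names = [
    "features.0.weight",
    "features.0.bias",
    "features.1.0.weight",
    "classifier.weight",
]
for prefix in candidate_prefixes(names):
    print(prefix)
```

With a real model, the input could be `(name for name, _ in model.named_parameters())`.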

Test: tests/ludwig/modules/test_gradual_unfreezing.py

Any and all feedback is greatly appreciated. 👍

github-actions[bot] commented 8 months ago

Unit Test Results

6 files ±0    6 suites ±0    52m 7s :stopwatch: +22m 2s
2 990 tests -3    2 966 :heavy_check_mark: -15    23 :zzz: +11    1 :x: +1
8 970 runs +5 941    8 898 :heavy_check_mark: +5 893    69 :zzz: +45    3 :x: +3

For more details on these failures, see this check.

Results for commit d2ba5cb0. ± Comparison against base commit 606c732a.

ethanreidel commented 8 months ago

@skanjila @saad-palapa

ludwig-ai / ludwig

[WIP] Gradual Unfreezing to mitigate catastrophic forgetting #3967
