Unfortunately, in fastai there isn't a built-in way for a callback to know in what context it's being called, other than checking whether another callback exists. ProgressiveResize can tell if you are predicting or using LRFinder via this code in ProgressiveResize.before_fit and prevent itself from running:
if hasattr(self.learn, 'lr_finder') or hasattr(self.learn, "gather_preds"):
    self.run = False
    return
but it cannot tell whether it is being called from fine_tune or from fit_one_cycle (or another fit method), nor can it tell whether it is being called from the frozen or the unfrozen part of fine_tune.
The solution is either to manually run all of the fine_tune steps as you are doing, except with two dataloaders (an initial-size dataloader for the frozen phase and a full-size dataloader for the unfrozen phase), or to create your own custom fine_tune method which takes the initial and full size dataloaders and a list of unfrozen-only callbacks.
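As a rough illustration of the second option, something like the sketch below could work. The function name, argument names, and the wildcard fastxtend import are assumptions, and the learning-rate handling simply mirrors fastai's fine_tune defaults:

```python
from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to provide ProgressiveResize

def fine_tune_progressive(learn, epochs, initial_dls, full_dls, base_lr=2e-3,
                          freeze_epochs=1, lr_mult=100, unfrozen_cbs=None):
    "Replicate fine_tune, but swap dataloaders between the frozen and unfrozen phases."
    if unfrozen_cbs is None:
        unfrozen_cbs = [ProgressiveResize()]
    # Frozen phase: adapt the new head using the initial (small) size dataloaders.
    learn.dls = initial_dls
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99)
    # Unfrozen phase: full size dataloaders, with ProgressiveResize (and any other
    # unfrozen-only callbacks) attached to this fit only.
    base_lr /= 2
    learn.dls = full_dls
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr), cbs=unfrozen_cbs)
```

With this arrangement, ProgressiveResize should only schedule resizing across the unfrozen epochs, while the frozen epochs train at the initial size.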
Makes sense. Have you experimented with this at all? Any recommendations on how best to mix progressive resizing with transfer learning?
For context, I'm using CutMixUpAugment and ProgressiveResize together, but it's weird that the accuracy is obliterated for the first couple of epochs.
I have not. The best resources on progressive resizing are the fastai course and MosaicML's documentation, both of which I link to in the fastxtend ProgressiveResize documentation.
My guess is CutMixUpAugment is the primary culprit. MixUp and CutMix usually achieve their best results on longer training runs, around 60-80 epochs on an Imagenette-sized dataset. I would try not applying CutMixUpAugment during the frozen training, since there you're adapting a random new head to the existing network. I'd also try only applying CutMixUpAugment if training longer, or use augment_finetune to delay when CutMixUp is applied.
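For example, a minimal sketch of the delayed setup might look like the following; dls, the architecture, epoch count, and learning rate are placeholders, and whether augment_finetune expects a number of epochs or a fraction of training is an assumption to check against the CutMixUpAugment documentation:

```python
from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to provide CutMixUpAugment and ProgressiveResize

# Hypothetical values: delay MixUp/CutMix so the randomly initialized head can
# adapt first, then train long enough for the mixing augmentations to pay off.
cbs = [CutMixUpAugment(augment_finetune=10), ProgressiveResize()]
learn = vision_learner(dls, resnet50, cbs=cbs)  # dls is assumed to already exist
learn.fit_one_cycle(80, 3e-3)
```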
@warner-benjamin Based on some basic experimentation, it does seem like CutMixUpAugment is the culprit. I'll try incorporating your suggestions. I appreciate the resources and support!
I don't know if it's a bug in the callback or just some behavior that I don't fully understand, but running CutMixUpAugment with element=False dramatically improved the early-epoch performance.
You can look at the documentation to see examples of element=False and element=True. element=True mixes MixUp, CutMix, and the additional augmentations within the same batch, while element=False selects one of the three per batch.
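In other words, something along these lines, with all other CutMixUpAugment arguments left at their defaults:

```python
# element=False: each batch uses exactly one of MixUp, CutMix,
# or the additional augmentations.
per_batch = CutMixUpAugment(element=False)

# element=True: MixUp, CutMix, and the additional augmentations
# are mixed together within the same batch.
per_element = CutMixUpAugment(element=True)
```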
Specifying ProgressiveResize() in the callbacks list and calling learn.fine_tune leads to ProgressiveResize being run for two separate training runs. If I manually call all of the fine_tune steps, I can add the callback to only the unfrozen epochs, but the frozen epochs will run at the Resize size rather than the initial size. My hypothesis is that we would see better training performance if the frozen epochs were run at the initial size. What's the simplest way to accomplish this with the callback?