Hi,
Thanks for your work. I was wondering how you deal with gradient updates on the non-searchable stages of the model. The searchable layers will only be updated once, but multiple forward and backward passes would then go through the tail/stem and the detection head. Would you average the gradients, or freeze the parameters of the non-searchable stages?
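
To make the question concrete, here is a toy sketch of the options I have in mind. It is not taken from your code: the scalar model (`stem * candidate * x`), the function names `accumulate_grads` and `sgd_step`, and the `shared_mode` switch are all hypothetical and only illustrate how a shared weight collects one gradient per pass while each searchable weight collects one in total.

```python
def accumulate_grads(stem, candidates, x, target):
    """Run every candidate path once and accumulate gradients.

    Toy assumption: output = stem * w_k * x, loss = 0.5 * (output - target)**2.
    """
    stem_grad = 0.0
    cand_grads = [0.0] * len(candidates)
    for k, w in enumerate(candidates):
        err = stem * w * x - target      # forward pass through path k
        stem_grad += err * w * x         # shared weight: gets a gradient on every pass
        cand_grads[k] += err * stem * x  # searchable weight: gets a gradient once
    return stem_grad, cand_grads


def sgd_step(stem, candidates, x, target, lr=0.1, shared_mode="average"):
    """One SGD step with a chosen policy for the shared (non-searchable) weight."""
    stem_grad, cand_grads = accumulate_grads(stem, candidates, x, target)
    if shared_mode == "average":
        stem_grad /= len(candidates)     # divide by the number of passes
    elif shared_mode == "freeze":
        stem_grad = 0.0                  # leave the shared weight untouched
    elif shared_mode != "sum":
        raise ValueError(f"unknown shared_mode: {shared_mode}")
    new_stem = stem - lr * stem_grad
    new_candidates = [w - lr * g for w, g in zip(candidates, cand_grads)]
    return new_stem, new_candidates
```

With `shared_mode="sum"` the shared weight takes a step that grows with the number of sampled paths, which is the behaviour I am asking about; `"average"` and `"freeze"` are the two alternatives mentioned above.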