I am using this issue to think through the options for adding support for LoRA.

The first fundamental question is: can we do this without changing the model code itself? I.e., let's assume we are not allowed to touch `convnext.py`. How would we proceed?

Ultimately, our goal is to replace (all? some?) `tf.keras.layers.Conv2D` layers with our new `LoraConv2D` layers. We could try to do this via monkey patching.

Note: all code samples are based on the `convnext-tag` branch.
```python
import tensorflow as tf

from tfimm.architectures import convnext


class LoraConv2D(tf.keras.layers.Conv2D):
    ...


def main():
    cls, cfg = convnext.convnext_atto()

    # Monkey-patching the conv layer
    old_conv_layer = convnext.tf.keras.layers.Conv2D
    convnext.tf.keras.layers.Conv2D = LoraConv2D

    model = cls(cfg=cfg)
    model(model.dummy_inputs)

    # Reversing changes. This would become a context manager of course.
    convnext.tf.keras.layers.Conv2D = old_conv_layer

    # stem[0] is the first convolutional layer in the stem
    print(type(model.stem[0]))


if __name__ == "__main__":
    main()
```
This works, but it has some drawbacks:

- We have no layer-wise control: we either swap all `Conv2D` layers to `LoraConv2D` or none, and all swapped layers use the same parameters, i.e., we cannot use different values of `r` for queries, keys and values (in a transformer; not applicable here).
- We can't be sure that we reach all `Conv2D` layers. By default this only affects code in `convnext.py` itself; in fact, most convolutional layers in ConvNeXt are implemented as part of the MLP layers in `tfimm/layers/transformers.py`. This is surmountable, since we can patch `Conv2D` in all files belonging to `tfimm`, and implemented as a central context manager (see the sketch after this list) the complexity is manageable. It would be a problem if we relied on external code for layers/blocks, but that is not the case at the moment.
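For reference, a minimal sketch of such a context manager, following the same patch-and-restore idea as the code above (the name `patched_conv2d` is just a placeholder; extending it to patch several `tfimm` modules at once would be straightforward):

```python
import contextlib

from tfimm.architectures import convnext


@contextlib.contextmanager
def patched_conv2d(conv_cls):
    # Swap the Conv2D symbol that convnext.py resolves when creating layers
    # and restore it afterwards, even if model construction raises.
    old_conv_layer = convnext.tf.keras.layers.Conv2D
    convnext.tf.keras.layers.Conv2D = conv_cls
    try:
        yield
    finally:
        convnext.tf.keras.layers.Conv2D = old_conv_layer
```

Usage would then look like `with patched_conv2d(LoraConv2D): model = cls(cfg=cfg)`.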
We would like layer-wise control, i.e., to swap some layers but not others. We could achieve that by specifying the layers to be swapped by name.
When monkey patching, we could set `convnext.tf.keras.layers.Conv2D = conv_layer_factory`, with `conv_layer_factory` being a smart function that returns either a `Conv2D` or a `LoraConv2D` layer, depending on what is needed. Unfortunately, I don't think this function has the necessary context to assemble the full layer name: the full nested name is generated via `tf.name_scope` when the layer is built in `build()`, not when it is defined in `__init__()`. Sample factory code:
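A minimal sketch of such a factory, using `LoraConv2D` from the snippet above and matching on the `name` argument passed to the constructor (the set of target names is a placeholder); it also illustrates why this is not enough:

```python
import tensorflow as tf

# Placeholder: local names of the layers we would like to swap.
LORA_LAYER_NAMES = {"conv_dw"}


def conv_layer_factory(*args, **kwargs):
    # At this point we only see the arguments passed to the Conv2D constructor.
    # `name` is the local layer name, not the full nested name with the
    # model/stage/block prefix, which only comes into existence via
    # tf.name_scope once the layer is built.
    name = kwargs.get("name", "")
    if name in LORA_LAYER_NAMES:
        return LoraConv2D(*args, **kwargs)
    return tf.keras.layers.Conv2D(*args, **kwargs)
```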
Another way to achieve layer-wise control is model surgery. Here is some example code for inserting or swapping layers, but that code creates a new functional model; ideally, we would modify our model in place.
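For illustration, a hedged sketch of that route using `tf.keras.models.clone_model` with a `clone_function`. The layer names, helper names and weight-copying details are assumptions, it presumes a functional (not subclassed) model, and it shares the drawback above: it returns a new model rather than modifying the existing one in place.

```python
import tensorflow as tf

# Placeholder: names of the Conv2D layers we want to replace.
LAYERS_TO_SWAP = {"stem_conv"}


def _clone_layer(layer):
    if isinstance(layer, tf.keras.layers.Conv2D) and layer.name in LAYERS_TO_SWAP:
        # LoraConv2D is a Conv2D subclass, so it can be built from the same config.
        return LoraConv2D.from_config(layer.get_config())
    return layer.__class__.from_config(layer.get_config())


def swap_conv_layers(model):
    new_model = tf.keras.models.clone_model(model, clone_function=_clone_layer)
    # Copy pretrained weights layer by layer; any LoRA-specific weights keep
    # their fresh initialisation.
    for layer in model.layers:
        new_layer = new_model.get_layer(layer.name)
        if isinstance(new_layer, LoraConv2D):
            new_layer.kernel.assign(layer.kernel)
            if layer.use_bias:
                new_layer.bias.assign(layer.bias)
        else:
            new_layer.set_weights(layer.get_weights())
    return new_model
```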