Hi Zakaria, apologies that you had to encounter this error. Seems a bit tricky to debug. Would it be possible to share the architecture by any chance? If you prefer, you can also mail it to me at surag@stanford.edu. Even a minimal version with which you can replicate the error works.
Hi Surag,
Thanks for your fast reply!
I tried different versions of the architecture, and it seems like it's this part that triggers the error:
```python
se_avr = layers.Reshape(se_shape, name=name + "se_reshape_avr")(se_avr)
se_max = layers.Reshape(se_shape, name=name + "se_reshape_max")(se_max)

shared_layer_one = layers.Dense(filters // se_ratio,
                                activation='relu',
                                kernel_initializer='he_normal',
                                use_bias=True,
                                bias_initializer='zeros')
shared_layer_two = layers.Dense(filters,
                                activation='sigmoid',
                                kernel_initializer='he_normal',
                                use_bias=True,
                                bias_initializer='zeros')

se_avr = shared_layer_two(shared_layer_one(se_avr))
se_max = shared_layer_two(shared_layer_one(se_max))

se = layers.add([se_avr, se_max], name=name + "add_av_max")
x = layers.multiply([x, se], name=name + "se_excite")
```
So it's essentially a squeeze-and-excitation block. Do you think the error is because the multiplication layer is not supported?
Best, Zakaria
Is `x` an output of previous layers? I suspect it may be related to this issue: https://github.com/kundajelab/fastISM/blob/4fc1f44/test/test_unresolved.py. Essentially, something like this:

```python
x = tf.keras.layers.Conv1D(10, 3, padding='same')(inp)
y = tf.keras.layers.Dense(10)(x)
x = tf.keras.layers.Add()([x, y])
```

So basically the `Add` layer (in your case `multiply`) has one input that is connected to the sequence (`inp` above) without any STOP LAYERS (reshape, dense, etc.) on the way, while the other input is also connected to the sequence but does have a STOP LAYER on the way (like your `se` above). It raises the same error you are seeing. Can you confirm that my description is correct?
I read up on squeeze-and-excitation networks. I think a bigger issue is that fastISM won't speed up SE layers. That's because fastISM only gives speedups for layers where a change at position `i` affects just a few positions around it. However, in the case of SE (if I understand correctly), a mutation at any position will affect the representation at all other positions (due to the global max pooling). As a result, fastISM (even after I debug it properly) will stop after the first SE layer and not give any speedups.
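Here is a rough sketch of what I mean (a toy SE-style block I made up for illustration, with arbitrary sizes, not your exact architecture): mutate a single input position and count how many output positions change.

```python
# Toy SE-style block (illustration only): perturb one input position and
# count how many output positions change.
import numpy as np
import tensorflow as tf

inp = tf.keras.Input((100, 4))
conv = tf.keras.layers.Conv1D(8, 7, padding='same')(inp)

# channel gate built from a global average -> multiplied back onto every position
se = tf.keras.layers.GlobalAveragePooling1D()(conv)
se = tf.keras.layers.Dense(8, activation='sigmoid')(se)
se = tf.keras.layers.Reshape((1, 8))(se)
out = tf.keras.layers.multiply([conv, se])
model = tf.keras.Model(inp, out)

x = np.random.rand(1, 100, 4).astype('float32')
x_mut = x.copy()
x_mut[0, 50] = [1., 0., 0., 0.]   # mutate a single position

diff = np.abs((model(x) - model(x_mut)).numpy()).sum(axis=-1)[0]
print((diff > 1e-6).sum())        # typically ~100: every position is affected
```

For a plain `Conv1D` only the positions inside the receptive field would change, which is exactly what fastISM exploits; the global pooling in the SE gate propagates the change to every position.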
So if you are using SE along with your main conv blocks, then unfortunately fastISM will not be able to provide any speedup.
Right, it makes sense that both spatial and channel attention will be tricky. Thank you for the explanation!
One last thing: am I adding custom layers correctly, or is there anything else I need to define?
You are adding it correctly. However, I'm not sure if `LayerNormalization` is indeed a SEE THROUGH layer at test time, since, similar to SE, it's also taking an average across all positions.
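If you want to check this empirically, one crude way (just a sketch I'm improvising here, not part of fastISM) is to perturb a single input position and see whether any other output positions change; for `LayerNormalization` the answer depends on which axes it normalizes over (its `axis` argument):

```python
# Rough check (not part of fastISM): perturb one input position and count how
# many output positions change. A SEE THROUGH layer should only change the
# mutated position.
import numpy as np
import tensorflow as tf

def affected_positions(layer, length=100, channels=8, pos=50):
    inp = tf.keras.Input((length, channels))
    model = tf.keras.Model(inp, layer(inp))
    x = np.random.rand(1, length, channels).astype('float32')
    x_mut = x.copy()
    x_mut[0, pos] += 1.0   # mutate a single position
    diff = np.abs((model(x) - model(x_mut)).numpy()).sum(axis=-1)[0]
    return int((diff > 1e-6).sum())

# Normalizing over channels only keeps positions independent; including the
# length axis couples all positions.
print(affected_positions(tf.keras.layers.LayerNormalization()))             # channel axis only
print(affected_positions(tf.keras.layers.LayerNormalization(axis=[1, 2])))  # length + channels
```

If anything other than the mutated position changes, the layer can't be treated as SEE THROUGH.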
Right, that would explain my other bug. Thanks a lot anyway for your help and for the cool project!
Thanks for giving it a try and feel free to reach out if you have any other issues!
Hi there,
Thanks for the very helpful package!
I'm having trouble loading a model with custom layers and am getting the following error. Any feedback would be very helpful!
Best, Zakaria