titu1994 / keras-coordconv

Keras implementation of CoordConv for all Convolution layers
MIT License
148 stars 33 forks source link

TypeError: Expected int32, got 1.0 of type 'float' instead. #13

Closed pGit1 closed 5 years ago

pGit1 commented 5 years ago

@titu1994

I know you're super busy (been reading your papers by the way and testing your model paradigms :D) but this is the error I get with coordconv2d. I'm wondering if I should just write a Python function to add coordinates to my numpy arrays directly and then use Conv as usual. It seems that you normalize the coordinate grid, though...

pGit1 commented 5 years ago

Also, just a quick question: how exactly does CoordConv work in the 1D case? Does it just add a timestep index (not an integer datatype, but something like 1.0, 2.0, 3.0 depending on the timestep) to the input?

EDIT: Looks like it returns a normalized output between -1 and 1. I wonder why this would help.

titu1994 commented 5 years ago

Thanks for the interest in the paper!

Could you post the stack trace showing where exactly this error occurs? I've run CoordConv2D for the examples in this repo without issue.

As to your suggestion, that works actually. The "CoordConv" layer's job is just to augment the incoming input tensor with 2 more channels. The benefit of the layer is that it can be placed anywhere in the model architecture (though the paper uses it only at the beginning), whereas your numpy input would act only on the first layer.

For time series, you are correct that it provides a new input dimension which holds values in the range [-1, 1] again indexing the timestep. I have not seen it provide any benefit to classification (just as CoordConv2D provides no benefit to image classification). However, for time series forecasting, it tends to help a little bit.

It normalizes to [-1, 1] range just as we normalize all NN inputs. There's no special reason for it other than to avoid causing training to slow down due to large inputs.
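To make the description above concrete, here is a minimal, dependency-free sketch of appending normalized coordinate channels to 1D and 2D inputs. The function names (`normalized_coords`, `add_coords_1d`, `add_coords_2d`), the nested-list data layout, and the order of the appended channels are assumptions made for illustration; this is not the repository's implementation, which operates on tensors inside the layer.

```python
def normalized_coords(n):
    """Return n evenly spaced values covering [-1, 1]."""
    if n == 1:
        return [0.0]  # a single position has no extent; place it at the centre
    return [2.0 * i / (n - 1) - 1.0 for i in range(n)]


def add_coords_1d(series):
    """Append one coordinate channel to a series shaped [timesteps][features]."""
    coords = normalized_coords(len(series))
    return [list(step) + [c] for step, c in zip(series, coords)]


def add_coords_2d(image):
    """Append two coordinate channels (x, then y) to an image shaped [rows][cols][channels]."""
    rows, cols = len(image), len(image[0])
    ys = normalized_coords(rows)
    xs = normalized_coords(cols)
    return [
        [list(image[r][c]) + [xs[c], ys[r]] for c in range(cols)]
        for r in range(rows)
    ]


series = [[10.0], [20.0], [30.0]]   # 3 timesteps, 1 feature
print(add_coords_1d(series))        # each timestep gains its position in [-1, 1]

image = [[[1], [2]], [[3], [4]]]    # 2 x 2 image, 1 channel
print(add_coords_2d(image))         # each pixel gains its x and y position
```

Doing this on the raw arrays only affects the first layer of the network, which is the trade-off mentioned above; a layer can apply the same augmentation at any depth.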

pGit1 commented 5 years ago

Thanks @titu1994 !! I've solved all of my problems. The LSTM-FCN stuff is great! Anything else I should be on the lookout for??

titu1994 commented 5 years ago

Well, there's the Multivariate extension to LSTM-FCN for more challenging multivariate time series. And the ablation which explains a bit about our findings from LSTM-FCN.

Other than that, we recently worked on attacking time series (even classical models like DTW which have no parameters at all), which was a fun project personally.

That paper has a ton of small nice stuff (imo), like a blazing fast multiprocessing computation of DTW. It normally takes O(N1 x N2 x T1 x T2) time for two datasets (N1 and N2 samples in the datasets, with max time series lengths of T1 and T2 respectively), and this runs at semi-reasonable speed even in Python, which I found life saving really.
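For context on where that complexity comes from: classic DTW between two series of lengths T1 and T2 fills a T1 x T2 table, and comparing every sample of one dataset against every sample of the other repeats that N1 x N2 times. The sketch below is a plain textbook DTW with an absolute-difference cost, not the paper's optimized implementation; `dtw_distance` and `pairwise_dtw` are names chosen here for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW between two 1D sequences, O(len(a) * len(b)) time."""
    inf = float("inf")
    # prev[j] holds the best cost of aligning the processed prefix of a with b[:j].
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def pairwise_dtw(dataset_1, dataset_2):
    """Distance matrix of shape [N1][N2]; every cell is an independent DTW run."""
    return [[dtw_distance(a, b) for b in dataset_2] for a in dataset_1]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))        # warping absorbs the repeated 2
print(pairwise_dtw([[1, 2, 3]], [[1, 2, 3], [0, 0]]))
```

Each row of the matrix is independent of the others, so rows can be distributed across worker processes (for example with `multiprocessing.Pool`), which is presumably the kind of parallelism the comment refers to.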

Also a simple algo to convert any 1NN distance matrix to a softmax-probabilistic equivalent representation, which is neat (imo) cause it's still exact unlike approximate algorithms for probabilistic DTW.
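The thread does not describe that algorithm, so the following is only one way such a conversion could look and should not be read as the paper's method. It assumes that you take the nearest-neighbour distance per class and apply a softmax to the negated distances; the class with the highest probability is then the same class a 1NN classifier would pick, so nothing is approximated. The function name `class_probabilities` and the input layout are hypothetical.

```python
import math


def class_probabilities(distances, train_labels):
    """Turn one query's distances to all training samples into class probabilities.

    Assumption for illustration: softmax over the negated per-class minimum distance.
    """
    # Nearest-neighbour distance for each class.
    nearest = {}
    for d, label in zip(distances, train_labels):
        if label not in nearest or d < nearest[label]:
            nearest[label] = d

    # Softmax over -distance; shifting by the smallest distance keeps exp() stable.
    best = min(nearest.values())
    weights = {label: math.exp(-(d - best)) for label, d in nearest.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


probs = class_probabilities([0.5, 2.0, 1.0], ["a", "b", "b"])
print(probs)                       # probabilities sum to 1
print(max(probs, key=probs.get))   # same label 1NN would predict
```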

Kinda niche stuff, but was a fun paper overall haha.

pGit1 commented 5 years ago

Very interesting!! Any links to your papers?

By the way, I immediately applied the original LSTM-FCN paper to multivariate problems and it worked amazingly. I wasn't even aware of the multivariate paper until recently actually! :D

pGit1 commented 5 years ago

@titu1994

So I have a SIMPLE problem for you take a look at this code:

from keras import layers, models
import tensorflow as tf
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
import numpy as np
import keras.backend as K

class IncepLayer(layers.Layer):
    def __init__(self,filters=32):
        super(IncepLayer, self).__init__()
        self.filters = filters
        self.c1 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c2 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c22 = layers.Conv2D(filters=self.filters,kernel_size=5,padding='same')
        self.c3 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c33 = layers.Conv2D(filters=self.filters,kernel_size=7,padding='same')

    def build(self,input_shape):
        super(IncepLayer,self).build(input_shape)

    def call(self, inputs):
        t1 = self.c1(inputs)
        t2 = self.c2(inputs)
        t2 = self.c22(t2)
        t3 = self.c3(inputs)
        t3 = self.c33(t3)
        inp_kernels = inputs.shape[-1].value
        concat = layers.concatenate([t1,t2,t3])
        print(concat.shape)
        return concat

inp = layers.Input((28,28,3))
x = IncepLayer()(inp)

m = models.Model(inp,x)
m.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
print(m.summary())
SVG(model_to_dot(m).create(prog='dot',format='svg'))

For some reason I get this output (!!!!): What gives!? Please help!

image

No weights. No trainable Parameters. Nada. And output shape is same as input.

UPDATE: I know I could implement an Inception-like block with functions, but I wanted it to be a layer so I could share the layer across inputs. It's not obvious to me how to do that with function blocks. I added a compute_output_shape to the mix and that fixed the output shape problem, but I have NO IDEA how to get this layer to actually train! :(

titu1994 commented 5 years ago

Keras has been at 2.2.4 for a long time, almost as long as two tensorflow releases (1.12 and 1.13). The Keras master support for Model Subclassing is not something that I've experimented with, so my remark may be unjustified and it could very well be that I do not have full knowledge of the differences between the Keras Model Subclassing API and the tf.keras Model Subclassing API.

As for the "fix", a simple one-line change will get the model built correctly.

# Replace from keras import layers, models with below
from tensorflow.python.keras import layers, models
(?, 28, 28, 96)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 28, 28, 3)         0         
_________________________________________________________________
incep_layer (IncepLayer)     (None, 28, 28, 96)        76224     
=================================================================
Total params: 76,224
Trainable params: 76,224
Non-trainable params: 0
_________________________________________________________________
None  # This None is because you are using print(model.summary()). model.summary() returns None.
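As a sanity check on the 76,224 figure, the parameter count can be reproduced by hand. The sketch below assumes each Conv2D keeps its default bias term, so a layer has kernel_size * kernel_size * in_channels * filters weights plus one bias per filter; `conv2d_params` is a helper written for this example, not a Keras function.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D."""
    return kernel_size * kernel_size * in_channels * filters + filters


filters = 32
in_channels = 3  # from Input((28, 28, 3)) in the script above

branch_params = {
    "c1": conv2d_params(1, in_channels, filters),   # 1x1 on the raw input
    "c2": conv2d_params(1, in_channels, filters),   # 1x1 on the raw input
    "c22": conv2d_params(5, filters, filters),      # 5x5 on the output of c2
    "c3": conv2d_params(1, in_channels, filters),   # 1x1 on the raw input
    "c33": conv2d_params(7, filters, filters),      # 7x7 on the output of c3
}
total_params = sum(branch_params.values())
output_channels = 3 * filters  # t1, t2 and t3 concatenated on the channel axis

print(branch_params)
print(total_params, output_channels)
```

The total matches the summary, and the three concatenated branches account for the 96 output channels.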

titu1994 commented 5 years ago

Another pedantic thing. Do not treat tf.keras.layers.Layer and tf.keras.Model as the same thing.

Hierarchically, a Model is composed of multiple layers, where each Layer is an indivisible object, i.e. a layer's logic must either live in its own body or use other layers to create an irreducible operation. Think of the Squeeze and Excite operation, Non-Local block, ODEBlock, PositionalEncodings for Attention modules, BERT sublayers etc.

Ofc, there may exist layers which themselves depend on multiple other layers (as in recurrent batch normalization, recurrent dropout, and other new layers that use multiple sublayers, as listed above).

Here, however, what you are treating as a "Layer" is actually hierarchically a Model, comprised of multiple Layers called sequentially. There is no special utilization of the Conv layers in this "Layer" that I would characterize as irreducible or indivisible.

This is all semantics. In actuality, tf.keras has matured to the point that you can get away with doing this without any issues at all. Hierarchically, what you have here should subclass models.Sequential. models.Model is also fine as they lie in parallel hierarchies. When I write models, I would not write this as an Incep"Layer" but an Incep"Model".

These are ofc just me being pedantic. Ignore them really. It's just odd for me that Keras would treat two separate hierarchical notions as the same.

titu1994 commented 5 years ago

but I wanted it to be a layer so I could share the layer across inputs

This is simpler to do with the functional api. Below is a contrived example. Its odd connectivity would probably make it fail training but it does show shared layers between multiple inputs.

from tensorflow.python.keras import layers, models
from keras.utils.vis_utils import plot_model

filters = 32

inp1 = layers.Input((28, 28, 3))
inp2 = layers.Input((28, 28, 3))

x1 = layers.Conv2D(filters=filters, kernel_size=1, padding='same')
x2 = layers.Conv2D(filters=filters, kernel_size=1, padding='same')
x3 = layers.Conv2D(filters=filters, kernel_size=5, padding='same')
x4 = layers.Conv2D(filters=filters, kernel_size=1, padding='same')
x5 = layers.Conv2D(filters=filters, kernel_size=7, padding='same')

# pathway 1
xp_1 = x5(x4(x3(x2(x1(inp1)))))  # full pathway

# pathway 2
xp_2 = x3(x4(x1(inp2)))  # reduced pathway  # notice the ordering is off, x4 comes before x3 and after x1, skips x2 and x5.

# add pathways
xp = layers.add([xp_1, xp_2])

m = models.Model([inp1, inp2], xp)
m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(m.summary())
plot_model(m, show_shapes=True)

Produces the following output

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 28, 28, 3)    0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 28, 28, 32)   128         input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 28, 28, 32)   1056        conv2d[0][0]                     
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 28, 28, 3)    0                                            
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 28, 28, 32)   25632       conv2d_1[0][0]                   
                                                                 conv2d_3[1][0]                   
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 28, 28, 32)   1056        conv2d_2[0][0]                   
                                                                 conv2d[1][0]                     
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 28, 28, 32)   50208       conv2d_3[0][0]                   
__________________________________________________________________________________________________
add (Add)                       (None, 28, 28, 32)   0           conv2d_4[0][0]                   
                                                                 conv2d_2[1][0]                   
==================================================================================================
Total params: 78,080
Trainable params: 78,080
Non-trainable params: 0
__________________________________________________________________________________________________
None

Notice that the number of parameters remains the same (each shared layer is counted only once, even when it is connected to two inputs), showing the weights are shared across multiple inputs, as you needed.
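To see how much the sharing saves, the totals can be worked out by hand for the two pathways above. This sketch assumes default Conv2D layers with a bias term; `conv2d_params` is a helper written for this example, and the "unshared" total is a hypothetical comparison in which pathway 2 gets its own copies of the layers it uses.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D."""
    return kernel_size * kernel_size * in_channels * filters + filters


filters = 32
layer_params = {
    "x1": conv2d_params(1, 3, filters),        # sees the 3-channel inputs
    "x2": conv2d_params(1, filters, filters),
    "x3": conv2d_params(5, filters, filters),
    "x4": conv2d_params(1, filters, filters),
    "x5": conv2d_params(7, filters, filters),
}

pathway_1 = ["x1", "x2", "x3", "x4", "x5"]  # full pathway
pathway_2 = ["x1", "x4", "x3"]              # reduced pathway, reuses existing layers

# Shared: every layer object contributes its weights once, however often it is called.
shared_total = sum(layer_params[name] for name in set(pathway_1) | set(pathway_2))

# Hypothetical unshared variant: pathway 2 would own separate copies of x1, x4 and x3.
unshared_total = (
    sum(layer_params[name] for name in pathway_1)
    + sum(layer_params[name] for name in pathway_2)
)

print(shared_total, unshared_total)
```

Sharing only works here because every reused layer receives the same number of input channels in both pathways (3 for x1, 32 for the rest).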

model

pGit1 commented 5 years ago

DUDE YOU ARE FREEEEEEEEEEEEEAAAAAAAAAAAAAAAAAAAAAAAAAAAKKKKKINNG amazing!!!

So many of my questions answered in one swoop! This is BY FAR the most helpful information I've gotten on the topic and I stayed up late reading tutorials on tf.org and other sites. THANK you for "the fix", explanation of distinctions between layers and models, and how to do this with a functional api.

Speaking of the functional API, this approach makes sense, but it seems that to have many such "inception-like connectivity blocks" I would need a lot more code using the style you outline above. This isn't the worst thing in the world; I've just been trying to follow your paradigm of keeping things in blocks. I should have named my IncepLayer "IncepBlock". Creating blocks that can be used in a shared way did not seem to lend itself to function blocks, which is why I went with the Layer class approach. It didn't dawn on me that with a little extra code I could do this so easily with the functional API, though. This is great insight!!

pGit1 commented 5 years ago

@titu1994

So you are one of literally a few people qualified to answer these questions. Thank you so much for your time! Means a lot!

Now that we have remedied the issues I had above, I'm wondering what utility "sharing layers" really has, other than potential memory savings, in a specific scenario.

Scenario: I have a multi-input model where each input is processed differently and all inputs correspond to the same target label. The goal is to extract as much discriminative information as possible from each input to inform the prediction of the class label(s). To me, processing each input with independent stems in the network may lead to better discriminative features but is too memory intensive. Using shared layers helps alleviate the memory constraints, as you point out in your example above, but I wonder if sharing layers reduces the model's ability to extract discriminative information from the inputs, aside from the fact that I am explicitly regularizing the model by layer sharing (reducing the total number of parameters). I'm just trying to organize my thinking on the subject of shared layers in general. Feel free to ignore this question as it is possibly poorly worded, but for the life of me I don't know if shared layers would help or not in my scenario (I can always test and find out, but some intuition as to what is going on when we share layers is still evading me).

titu1994 commented 5 years ago

On the point of calling it IncepBlock, I thought about it today and it makes more sense to use Layers everywhere and only use one Model to wrap all of the layers. Then it's just semantic naming of indivisible computation as a "Layer" and a divisible set of layers as a "Module" or a "Block". This makes more sense computationally, as Models are heavier objects than simple Layers. So I guess you were right to extend Layer instead of Model here. My bad.

As to shared layers for multiple types of processing, what I would suggest is use one Inception block in the beginning for every input layer. After that, just concatenate or add them up.

The first few layers are the ones which discriminate more between shallow features like shape, color, lines, etc., and sharing the first layer amongst multiple inputs can cause the initial layers to learn more slowly (even though they will still learn, and may do just as well eventually). After the first few layers, the features learned are progressively more abstract and dependent on class, so the layers further down (deeper in the network) would be better shared, since after all the labels are the same for all inputs.

That's my reasoning though, and analysis may prove me wrong.

pGit1 commented 5 years ago

Thanks good stuff! I appreciate the help! I am going the functional API route I think by the way.

Thanks again!!!

pGit1 commented 5 years ago

Keras has been at 2.2.4 for a long time, almost as long as two tensorflow releases (1.12 and 1.13). The Keras master support for Model Subclassing is not something that I've experimented with, so my remark may be unjustified and it could very well be that I do not have full knowledge of the differences between the Keras Model Subclassing API and the tf.keras Model Subclassing API.

As for the "fix", a simple one-line change will get the model built correctly.

# Replace from keras import layers, models with below
from tensorflow.python.keras import layers, models
(?, 28, 28, 96)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 28, 28, 3)         0         
_________________________________________________________________
incep_layer (IncepLayer)     (None, 28, 28, 96)        76224     
=================================================================
Total params: 76,224
Trainable params: 76,224
Non-trainable params: 0
_________________________________________________________________
None  # This None is because you are using print(model.summary()). model.summary() returns None.

I just tested this approach in tf 1.12.0 but it still produced the 0 parameters behavior.

Wonder why I cannot reproduce your result. :(

image

EDIT:

My full script was:


from tensorflow.python.keras import layers, models

class IncepLayer(layers.Layer):
    def __init__(self,filters=32):
        super(IncepLayer, self).__init__()
        self.filters = filters
        self.c1 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c2 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c22 = layers.Conv2D(filters=self.filters,kernel_size=5,padding='same')
        self.c3 = layers.Conv2D(filters=self.filters,kernel_size=1,padding='same')
        self.c33 = layers.Conv2D(filters=self.filters,kernel_size=7,padding='same')

    def build(self,input_shape):
        super(IncepLayer,self).build(input_shape)

    def call(self, inputs):
        t1 = self.c1(inputs)
        t2 = self.c2(inputs)
        t2 = self.c22(t2)
        t3 = self.c3(inputs)
        t3 = self.c33(t3)
        inp_kernels = inputs.shape[-1].value
        concat = layers.concatenate([t1,t2,t3])
        print(concat.shape)
        return concat

inp = layers.Input((28,28,3))
x = IncepLayer()(inp)

m = models.Model(inp,x)
m.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
print(m.summary())

titu1994 commented 5 years ago

I'm using TF 1.13. Without modifications to your above script, it gives -

(?, 28, 28, 96)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 28, 28, 3)         0         
_________________________________________________________________
incep_layer (IncepLayer)     (None, 28, 28, 96)        76224     
=================================================================
Total params: 76,224
Trainable params: 76,224
Non-trainable params: 0
_________________________________________________________________

pGit1 commented 5 years ago

Ah, I think that's it! I'm on 1.12 because my system is tied to CUDA 9. It looks like the ability to do this is brand new in TF 1.13 and beyond. No wonder I was reading about this in the TF 2.0 alpha docs! Thanks for the insight!