Closed: dreambear1234 closed this issue 6 years ago
The dot product is the same as matrix multiplication when you have the batch size in the mix as well. See the operations np.dot and tf.matmul on two tensors of the same shape to see that.
The paper does not discuss the dot product when it is in batch mode. I applied the same ops as the paper in a batch format.
The Reshape layer does not take the batch dim into account, so in fact the shapes are correct according to the paper.
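For anyone following along, here is a minimal numpy sketch of what the embedded Gaussian ops look like in batch form; the shapes and variable names are illustrative, not the exact code in this repo:

```python
import numpy as np

batch, hw, channels = 2, 64, 16            # hw = flattened H*W spatial positions

theta = np.random.rand(batch, hw, channels)
phi   = np.random.rand(batch, hw, channels)
g     = np.random.rand(batch, hw, channels)

# Pairwise term: a per-sample (hw x hw) affinity matrix, i.e. a batched matmul.
f = np.matmul(theta, phi.transpose(0, 2, 1))          # (batch, hw, hw)
f = np.exp(f - f.max(axis=-1, keepdims=True))
f = f / f.sum(axis=-1, keepdims=True)                 # softmax over the last axis

y = np.matmul(f, g)                                    # (batch, hw, channels)
print(y.shape)                                         # (2, 64, 16)
```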
Hi @titu1994
Thank you for sharing your code of the non-local net, but I am afraid some of the implementation is not correct. I agree with @dreambear1234.
I think the dot product in Keras cannot replace matmul in TensorFlow. I have read several implementations of the non-local structure written in TensorFlow and PyTorch, and none of them uses dot to replace the matmul.
In addition, I think the compression implementation in this repo is not correct.
In Xiaolong Wang's paper, https://arxiv.org/abs/1711.07971, Section 3.3, "Implementation of Non-local Blocks", a subsampling trick is stated: max pooling is added after phi and g, and this pooling is applied in the spatial domain to reduce the pairwise computation to 1/4. In this implementation, MaxPool1D can only reduce the computation to 1/2.
I reimplemented the non-local block here: https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8
@XupingZHENG Please refer to the TensorFlow backend here - https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1019 and https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1089
Keras unifies the interface with Theano, the original framework it supported, by treating dot and matmul as the same operation. Though the backend call is K.dot or K.batch_dot, it applies tf.matmul internally.
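A quick check of that claim, assuming the Keras 2.x TensorFlow backend (the exact import path may differ in other versions):

```python
import numpy as np
import tensorflow as tf
from keras import backend as K

a = np.random.rand(2, 4, 3).astype('float32')
b = np.random.rand(2, 3, 5).astype('float32')

# With default axes, K.batch_dot on 3D tensors is a per-sample matrix
# multiplication, which is what tf.matmul does on batched inputs.
keras_out = K.eval(K.batch_dot(K.constant(a), K.constant(b)))
tf_out    = K.eval(tf.matmul(tf.constant(a), tf.constant(b)))

print(keras_out.shape)                    # (2, 4, 5)
print(np.allclose(keras_out, tf_out))     # True
```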
As for the pooling after phi and g, I do it two times in the spatial domain, thereby reducing it to 1/4. Please refer to https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8#file-nonlocal-py-L116 and then https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8#file-nonlocal-py-L126
The paper clearly states that the pooling must be done after phi and after g, thereby reducing the complexity by a factor of 4.
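Just on the arithmetic of the subsampling trick (a sketch with illustrative sizes, not this repo's code): whatever factor the spatial length of phi is pooled by is the factor by which the pairwise computation shrinks, so pooling by 2 gives 1/2 and pooling by 4 overall (e.g. 2x2 spatially, or two successive pools of 2) gives 1/4:

```python
import numpy as np

batch, n, c = 1, 64, 16                    # n = flattened H*W spatial positions

theta = np.random.rand(batch, n, c)
phi   = np.random.rand(batch, n, c)

def pairwise_cost(a, b):
    # multiply-adds in the batched product a @ b^T over the channel axis
    return a.shape[1] * b.shape[1] * a.shape[2]

# Subsample phi along the spatial axis by a factor k (a stand-in for max
# pooling with stride k); the affinity matrix shrinks from (n, n) to (n, n/k).
for k in (1, 2, 4):
    phi_k = phi[:, ::k, :]
    print(k, pairwise_cost(theta, phi_k) / pairwise_cost(theta, phi))
# 1 1.0
# 2 0.5
# 4 0.25
```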
In the published paper and your code, I saw the dot product in many places. But as far as I understand from Figure 2 in the paper, it should be a matrix multiplication rather than a dot product; there should be such a procedure for the "embedded Gaussian" part.