titu1994 / keras-non-local-nets

Keras implementation of Non-local Neural Networks
MIT License

Is there any matrix multiplication? #2

Closed dreambear1234 closed 6 years ago

dreambear1234 commented 6 years ago

In the published paper and in your code, I see the dot product used in many places. But as far as I understand from Figure 2 in the paper, it should be a matrix multiplication rather than a dot product; for the "embedded Gaussian" part there should be a procedure like the following:

  1. reshape the H×W×512 feature map to HW×512
  2. transpose one of the two outputs into 512×HW
  3. apply matrix multiplication: (HW×512)(512×HW) = HW×HW

Is there anything wrong with this? Thank you for your reply!
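A minimal NumPy sketch of the procedure described above (toy sizes, batch dimension omitted, values are hypothetical):

```python
import numpy as np

# Stand-in for a 14 x 14 x 512 feature map
H, W, C = 14, 14, 512

theta = np.random.randn(H * W, C)   # step 1: H x W x C reshaped to (HW, C)
phi = np.random.randn(H * W, C)

# steps 2-3: transpose one operand and matrix-multiply -> (HW, HW) pairwise map
f = theta @ phi.T

print(f.shape)  # (196, 196), i.e. HW x HW
```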
titu1994 commented 6 years ago

The dot product is the same as matrix multiplication when you have the batch size in the mix as well. See the operations np.dot and tf.matmul on two tensors of the same shape to see that.

The paper does not discuss the dot product when it is in batch mode. I applied the same ops as the paper in a batch format.

The Reshape layer does not take the batch dimension into account, so in fact the shapes are correct according to the paper.
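A small sketch of that point (assuming the Keras 2 functional API; sizes are hypothetical):

```python
from keras.layers import Input, Reshape
from keras.models import Model

# Hypothetical H x W x 512 feature map, as in the paper's Figure 2
H, W, C = 14, 14, 512
inp = Input(shape=(H, W, C))

# Reshape only rewrites the non-batch axes, so the batch dimension is kept
# and (H, W, C) becomes (H*W, C), i.e. the paper's HW x 512 matrix per sample.
flat = Reshape((H * W, C))(inp)

model = Model(inp, flat)
print(model.output_shape)  # (None, 196, 512)
```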

xpngzhng commented 6 years ago

Hi @titu1994

Thank you for sharing your code for the non-local net, but I am afraid some of the implementation is not correct. I agree with @dreambear1234.

I think the dot product in Keras cannot replace matmul in TensorFlow. I have read several implementations of the non-local structure written in TensorFlow and PyTorch, and none of them uses dot to replace matmul.

In addition, I think the compression implementation in this repo is not correct. In Xiaolong Wang's paper, https://arxiv.org/abs/1711.07971, section 3.3, Implementation of Non-local Blocks, a subsampling trick is described: max pooling is added after phi and g, and this pooling is applied in the spatial domain to reduce the computation to 1/4. In this implementation, MaxPool1D can only reduce the computation to 1/2.
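A small sketch of the contrast being made here (spatial 2x2 pooling vs. a stride-2 pool over the flattened HW axis; the shapes below are hypothetical stand-ins for phi / g after their 1x1 convolutions):

```python
from keras.layers import Input, MaxPool1D, MaxPool2D, Reshape
from keras.models import Model

# Hypothetical feature map shape for phi / g
H, W, C = 16, 16, 64
inp = Input(shape=(H, W, C))

# Section 3.3 subsampling: a 2x2 spatial max pool leaves (H/2)*(W/2) = 64
# positions instead of H*W = 256, i.e. a reduction to 1/4.
spatial = MaxPool2D(pool_size=(2, 2))(inp)

# A stride-2 MaxPool1D over the flattened HW axis only halves the positions.
flat = Reshape((H * W, C))(inp)
flat_pooled = MaxPool1D(pool_size=2)(flat)

model = Model(inp, [spatial, flat_pooled])
print(model.output_shape)  # [(None, 8, 8, 64), (None, 128, 64)]
```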

I reimplemented the non-local block here: https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8

titu1994 commented 6 years ago

@XupingZHENG Please refer to the TensorFlow backend here: https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1019 and https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1089

Keras unifies the interface with Theano, the original framework it supported, by treating dot and matmul as the same operation. Though the backend call is K.dot or K.batch_dot, it applies tf.matmul internally.
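A quick check of that claim (assuming a Keras 2 install with the TensorFlow backend; the sizes are toy values), comparing K.batch_dot over the channel axes with an explicit batched matrix multiplication:

```python
import numpy as np
from keras import backend as K

batch, hw, c = 2, 16, 8
theta = np.random.randn(batch, hw, c).astype('float32')
phi = np.random.randn(batch, hw, c).astype('float32')

# batch_dot contracting the channel axis of both inputs is a batched
# theta @ phi^T; the TensorFlow backend dispatches this to tf.matmul.
f = K.eval(K.batch_dot(K.constant(theta), K.constant(phi), axes=(2, 2)))

# Reference: the same contraction written as an explicit batched matmul
f_ref = np.einsum('bic,bjc->bij', theta, phi)

print(f.shape)                           # (2, 16, 16), i.e. (batch, HW, HW)
print(np.allclose(f, f_ref, atol=1e-4))  # True
```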

As to the pooling after phi and g, I do it two times in the spatial domain, thereby reducing it to 1/4. Please refer to https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8#file-nonlocal-py-L116 and then https://gist.github.com/XupingZHENG/1fc7fe7a42ab93fbe353d8a7cf6c84a8#file-nonlocal-py-L126

titu1994 commented 6 years ago

The paper clearly states that the pooling must be done after phi and after g, thereby reducing the complexity by a factor of 4.