sadeepj / crfasrnn_keras

CRF-RNN Keras/Tensorflow version
http://crfasrnn.torr.vision
MIT License

Permutohedral Lattice implementation #24

Open MiguelMonteiro opened 6 years ago

MiguelMonteiro commented 6 years ago

Hello,

I have been trying to use the permutohedral lattice code to perform bilateral filtering, just to see if it works as expected. What happens is: I get an image with a repeating artefact overlaid on the original image instead of a blur effect. I don't know if this is how your implementation is supposed to work or if there is some kind of bug. Would you mind attempting to perform bilateral filtering on your side to check this issue?

Also, I have come across a paper that might interest you, since it can speed up the message-passing step:

Recursive bilateral filtering - Yang, Qingxiong - European Conference on Computer Vision - 2012

Thanks.

MiguelMonteiro commented 6 years ago

I found what I think is a bug in the implementation. The original permutohedral C++ files assume that you call the compute function with value_size = image_channels + 1. In your code, value_size will always be equal to image_channels (3) instead of 4. This causes the blurring to not work as intended.

glhfgg1024 commented 6 years ago

Hi, @MiguelMonteiro , thanks a lot for pointing this out! If convenient, could you please submit a PR? Thanks!

MiguelMonteiro commented 6 years ago

I would still like to verify whether @sadeepj can reproduce this bug or whether I am just using it wrong. Nonetheless, I have a CPU version that works for bilateral filtering; however, I have yet to integrate it with the CRF-RNN layer and TensorFlow. I am currently working on a GPU version. I will release both when they are done.

glhfgg1024 commented 6 years ago

@MiguelMonteiro , sounds great! Thanks! Looking forward to your PR!

sadeepj commented 6 years ago

@MiguelMonteiro @glhfgg1024 Thanks for reporting this. However, I don't think there's a bug in the Permutohedral lattice implementation in the repo. If there was a bug, we wouldn't be able to get any sensible results out of the CrfRnnLayer.

I believe the problem here is a mismatch in conventions. I haven't had time to check this, but I think that the original permutohedral implementation expects the image and the values to be filtered in (height, width, channels) order, whereas the ModifiedPermutohedral implementation in this repo assumes (channels, height, width) order. We originally did this modification for our Caffe implementation. I should have reverted this when porting the code to Tensorflow since Tensorflow supports the (height, width, channel) order. I will do this at some point. But for the time being, please note that the ModifiedPermutohedral implementation in this repo expects (channels, height, width) order.

MiguelMonteiro commented 6 years ago

I know; I have tried using your implementation with (channels, height, width), and instead of a blur I get a repeating structure overlaying the image (and possibly a blur underneath; it's hard to tell).

Before, when I mistakenly used the (height, width, channels) order, I got smaller copies of the image overlaid on itself (a sort of mosaic). That is the artefact you get from a mismatch in conventions.

One of the things the original CPU implementation assumes is that vd = image_channels + 1, hence vd is 4. During the splatting stage, they add another channel so that each pixel is coded as (r, g, b, 1). They call this the homogeneous coordinate. In your implementation, I could not find this extra channel being added.

I believe it is possible to get sensible results despite this bug (if it is indeed a bug) because there are so many other moving parts that could mask the effect of the filter not working at 100%.

I currently have 2 CPU implementations of the lattice wrapped in a simple program to filter a picture of a dog:

  1. A copy of yours. This is the resulting image:

modified_perm

  2. A mix of your implementation and the original one, which fixes the bug: original_perm

If you would like I can make this code available to you so that you can test the bug faster.

MiguelMonteiro commented 6 years ago

For completeness' sake: in your implementation, when you swap (channels, height, width) for (height, width, channels), you get something like this:

1234

sadeepj commented 6 years ago

Hi @MiguelMonteiro

Many thanks for the explanation & examples, and sorry about the wrong guess!

Could you let me know what the equivalent of the vd variable is in the https://github.com/sadeepj/crfasrnn_keras/blob/master/src/cpp/modified_permutohedral.cc file? When using Permutohedral lattice for Dense CRF inference, we filter probability distributions in two different spaces: (r, g, b, x, y) and (x, y). The vd parameter value should be 5 and 2 in these two cases (if vd actually means what I think it does).

Generally speaking, it's probably better to compare the Permutohedral implementation in this repo with the ones available in the original Dense CRF code (http://www.philkr.net/papers/2013-06-01-icml/densecrf_v_2_2.zip and http://www.philkr.net/papers/2011-12-01-nips/densecrf_v_2_2.zip).

MiguelMonteiro commented 6 years ago

So in the first link vd is referred to as value_size.

Regarding dense CRF inference, you are not entirely correct. What you are referring to is d, which is indeed 5 and 2 in those cases. d + 1 is the number of axes the lattice will have, which is different from vd. In the permutohedral paper there is a reference to the coordinate space (r, g, b, 1), which I assume is the space vd refers to, meaning vd is the dimensionality of that space (vd = 4). To be fair, even the original authors are undecided about what vd means: in their CPU implementation they treat it as vd = image_channels + 1, and in their GPU implementation as vd = image_channels. But because each stays internally consistent, it works for both of them.

My guess is that there is some for loop somewhere in the code where a < should be a <=, which is causing the bug.

I will check out that particular implementation and compare it with the current one. I will let you know the results.

P.S. The variable d also appears as pd or kd in various implementations.

MiguelMonteiro commented 6 years ago

I have invited you as a collaborator on my repo so you can see the code and the bug for yourself. Ignore the master branch: the sse branch is your version of the code, and the myversion branch is a refactored version of the original code. Check the README of each branch for the run parameters.

glhfgg1024 commented 6 years ago

Hello, guys, any advances here?

MiguelMonteiro commented 6 years ago

I have finished CPU and GPU versions of the lattice, but I still have to wrap them in a TensorFlow kernel. I should finish sometime next week, but I don't know when it will be ready for public consumption. I more or less started from scratch, and the code still needs some cleaning up.

glhfgg1024 commented 6 years ago

Hi @MiguelMonteiro, thanks a lot for your reply! I compared https://github.com/Kamnitsask/dense3dCrf/blob/master/src/permutohedral.cpp and https://github.com/sadeepj/crfasrnn_keras/blob/master/src/cpp/modified_permutohedral.cc, and it seems there is no significant difference between them except the storage of kernel features: the former uses column-major Eigen::MatrixXf while the latter uses row-major TensorFlow Tensors. Another difference is that the Permutohedral::gradient function is missing from the latter.

MiguelMonteiro commented 6 years ago

I haven't checked the dense3dCrf code, but if the code here came from there, it might have the same bug. I decided to implement it almost from scratch to avoid mistakes; this is why it is taking so long (that, and programming for both CPU and GPU).

The original permutohedral paper proposes an approximation to the bilateral filter (i.e. something that blurs the image while respecting edges). The actual implementation is quite complex, and it's easy to make mistakes, especially if you are including it as part of a CRF and not looking at the output of the filter directly. Therefore I started from the implementation of the original paper (CPU and GPU), which performs the filter, and worked from there to make the code more readable and portable, while ensuring that it kept actually filtering the image (which seemed important).

If you would like to help, you could take that code, just the bilateral filter part, and try to blur an image; if it comes out OK, then the bug is only here.

glhfgg1024 commented 6 years ago

@MiguelMonteiro, thanks, I would like to see where the possible bug is. Currently I'm stuck on how to adjust the parameters (or initial parameters) of this model for semantic segmentation, and I found that the improvement from the CRF module is very limited compared to the results before this module. So I'm not sure whether I selected good parameters or whether the code has a possible bug, as you pointed out. I'm not familiar with the bilateral filtering part, especially the permutohedral lattice. As far as I understand (maybe wrongly), the lattice uses convolution in a high-dimensional feature space to compute the pairwise correlations and then returns to the original space.

MiguelMonteiro commented 6 years ago

That's a good intuition of what it does; the permutohedral lattice is just the thing that performs the bilateral filter, provided you pass it the correct kernel. As for parameter selection, I haven't got that far; I stopped at this bug and at realizing I would need a GPU version for my application. In the future, it would be interesting to see whether the parameters can be optimized during training as if they were weights of the neural network, but I am just speculating. Anyway, if you want to continue this discussion, we can do it through email instead of here.

glhfgg1024 commented 6 years ago

Hi @MiguelMonteiro, I've sent you an email. Please kindly have a check. Thanks!

Toz3wm commented 6 years ago

Hello guys (@MiguelMonteiro especially :-) ), I am trying to use the CrfRnnLayer in my own application too, and I noticed some unexpected behaviour (at least given my understanding of what it should do), especially when I freeze some weights. Is there any news concerning the GPU version of this code? That would allow me to test things faster, and thus get further, since training is currently painfully slow (it has been running on CPU for a week now)...
If I can be of any help regarding the GPU version, please don't hesitate! Thanks!