Open MiguelMonteiro opened 6 years ago
I found what I think is a bug in the implementation. The original permutohedral C++ files assume that you pass the compute function value_size = image_channels + 1. In your code, value_size will always be equal to image_channels (3) instead of 4. This causes the blurring to not work as intended.
Hi, @MiguelMonteiro , thanks a lot for pointing this out! If convenient, could you please submit a PR? Thanks!
I would still like to verify whether @sadeepj can reproduce this bug or whether I am just using it wrong. Nonetheless, I have a CPU version that works for bilateral filtering; however, I have yet to integrate it with the CRF-RNN layer and TensorFlow. I am currently working on a GPU version. I will release both when they are done.
@MiguelMonteiro , sounds great! Thanks! Looking forward to your PR!
@MiguelMonteiro @glhfgg1024 Thanks for reporting this. However, I don't think there's a bug in the Permutohedral lattice implementation in the repo. If there were a bug, we wouldn't be able to get any sensible results out of the CrfRnnLayer.
I believe the problem here is a mismatch in conventions. I haven't had time to check this, but I think that the original permutohedral implementation expects the image and the values to be filtered in (height, width, channels) order, whereas the ModifiedPermutohedral implementation in this repo assumes (channels, height, width) order. We originally did this modification for our Caffe implementation. I should have reverted this when porting the code to Tensorflow since Tensorflow supports the (height, width, channel) order. I will do this at some point. But for the time being, please note that the ModifiedPermutohedral implementation in this repo expects (channels, height, width) order.
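To make the ordering mismatch concrete, here is a small conversion helper (illustrative only, not taken from the repo) that maps a `(height, width, channels)` buffer to the `(channels, height, width)` layout described above:

```cpp
#include <vector>

// Hypothetical helper: convert a (height, width, channels) buffer into the
// (channels, height, width) layout. All names here are illustrative.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              int height, int width, int channels) {
    std::vector<float> chw(hwc.size());
    for (int c = 0; c < channels; ++c)
        for (int h = 0; h < height; ++h)
            for (int w = 0; w < width; ++w)
                chw[(c * height + h) * width + w] =
                    hwc[(h * width + w) * channels + c];
    return chw;
}
```

Feeding a buffer in the wrong layout scrambles which values the lattice treats as belonging to the same pixel, which is consistent with the mosaic-like artefacts reported below.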
I know; I have tried using your implementation with (channels, height, width), and instead of a blur I get a repeating structure overlaying the image (and possibly a blur underneath; it's hard to tell).
Before, when I mistakenly used the order (height, width, channel) I got smaller copies of the image overlaying itself (sort of mosaic). This is the bug you get with a mismatch in conventions.
One of the things the original CPU implementation assumes is that vd = image_channels + 1, hence vd is 4. During the splatting stage, it adds another channel so that each pixel is encoded as (r, g, b, 1); they call these homogeneous coordinates. In your implementation, I could not find this extra channel being added.
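A minimal sketch (with made-up names, not taken from either codebase) of the homogeneous-coordinate padding described in the paper:

```cpp
#include <vector>

// Sketch of the homogeneous-coordinate convention from the permutohedral
// paper: each pixel's value vector (r, g, b) is extended to (r, g, b, 1)
// before splatting, so vd = image_channels + 1. After blurring, dividing
// by the filtered fourth channel normalizes the result. Names are
// illustrative, not from any particular repo.
std::vector<float> make_homogeneous(const std::vector<float>& rgb,
                                    int num_pixels, int channels) {
    const int vd = channels + 1;
    std::vector<float> values(num_pixels * vd);
    for (int p = 0; p < num_pixels; ++p) {
        for (int c = 0; c < channels; ++c)
            values[p * vd + c] = rgb[p * channels + c];
        values[p * vd + channels] = 1.0f;  // homogeneous coordinate
    }
    return values;
}
```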
I believe that it is possible to get sensible results despite this bug (if it is indeed a bug) because there are so many other moving parts that could mitigate the effect of the filter not working at 100%.
I currently have two CPU implementations of the lattice wrapped in a simple program that filters a picture of a dog:
If you would like I can make this code available to you so that you can test the bug faster.
For completeness' sake, in your implementation, when you switch (channel, height, width) for (height, width, channel), you get something like this:
Hi @MiguelMonteiro
Many thanks for the explanation & examples, and sorry about the wrong guess!
Could you let me know what the equivalent of the vd variable is in the https://github.com/sadeepj/crfasrnn_keras/blob/master/src/cpp/modified_permutohedral.cc file? When using Permutohedral lattice for Dense CRF inference, we filter probability distributions in two different spaces: (r, g, b, x, y) and (x, y). The vd parameter value should be 5 and 2 in these two cases (if vd actually means what I think it does).
Generally speaking, it's probably better to compare the Permutohedral implementation in this repo with the ones available in the original Dense CRF code (http://www.philkr.net/papers/2013-06-01-icml/densecrf_v_2_2.zip and http://www.philkr.net/papers/2011-12-01-nips/densecrf_v_2_2.zip).
So in the first link, `vd` is referred to as `value_size`.
Regarding dense CRF inference, you are not entirely correct. What you are referring to is `d`, which is indeed 5 and 2 in those cases. `d + 1` is the number of axes the lattice will have, which is different from `vd`. In the permutohedral paper there is a reference to the coordinate space `(r, g, b, 1)`, which I assume is the coordinate space that `vd` refers to, meaning `vd` is the dimensionality of that space (`vd = 4`).
To be fair, even the original authors are undecided about what `vd` is. In their CPU implementation they treat it as `vd = image_channels + 1`, and in their GPU implementation as `vd = image_channels`. But because each remains internally consistent, it works for both of them.
My guess is that there is a for loop somewhere in the code where `<` should be `<=`, which is causing the bug.
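To summarize the two dimensionalities being discussed, here is a sketch with illustrative names (the actual variable names differ across implementations):

```cpp
// Illustrative summary only: 'd' is the dimensionality of the position
// (feature) space the lattice is built in, while 'vd' is the dimensionality
// of the value vectors being filtered, including the homogeneous channel.
struct LatticeDims {
    int d;   // position/feature space dimensionality (also called pd or kd)
    int vd;  // value dimensionality (also called value_size)
};

// For dense CRF inference with num_labels classes:
LatticeDims bilateral_kernel(int num_labels) {
    // positions are (x, y, r, g, b) -> d = 5; values are the label
    // probabilities plus a homogeneous channel -> vd = num_labels + 1
    return {5, num_labels + 1};
}

LatticeDims spatial_kernel(int num_labels) {
    // positions are (x, y) -> d = 2
    return {2, num_labels + 1};
}
```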
I will check out that particular implementation and compare it with the current one. I will let you know the results.
P.S. The variable `d` also appears as `pd` or `kd` in various implementations.
I have invited you as a collaborator on my repo so you can see the code and the bug for yourself. Ignore the master branch; the `sse` branch is your version of the code, and the `myversion` branch is a refactored version of the original code. Check the README of each branch for the run parameters.
Hello, guys, any advances here?
I have finished CPU and GPU versions of the lattice, but I still have to get them into a TensorFlow kernel. I should finish sometime around next week, but I don't know when it will be ready for public consumption. I kind of started from scratch, and the code still needs some cleaning up.
Hi @MiguelMonteiro, thanks a lot for your reply! I compared https://github.com/Kamnitsask/dense3dCrf/blob/master/src/permutohedral.cpp and https://github.com/sadeepj/crfasrnn_keras/blob/master/src/cpp/modified_permutohedral.cc; it seems there is no significant difference between them except for the storage of kernel features: the first one uses a column-major `Eigen::MatrixXf` and the latter uses a row-major `Tensor` in TensorFlow. Another difference is that the `Permutohedral::gradient` function is missing in the latter.
I haven't checked the dense3dCrf code, but if the code here came from there, it might have the same bug. I decided to implement it almost from scratch to avoid mistakes, which is why it is taking so long (I am also programming for both CPU and GPU).
The original permutohedral paper proposes an approximation to the bilateral filter (i.e. something that blurs the image while respecting edges). The actual implementation is quite complex, and it's easy to make mistakes, especially if you are including it as part of a CRF and not looking at the output of the filter directly. Therefore, I started with the implementation from the original paper (CPU and GPU), which performed the filter correctly, and worked from there to make the code more readable and portable while ensuring that it didn't lose the property of actually filtering the image (that seemed important).
If you would like to help, you could take that code, just the bilateral filter part, and try to blur an image; if it comes out OK, then the bug is only here.
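As a concrete way to run such a test, a brute-force bilateral filter (a hypothetical reference implementation, not from any of the repos discussed) can serve as ground truth on tiny images:

```cpp
#include <cmath>
#include <vector>

// Hypothetical brute-force bilateral filter for a single-channel image,
// usable as a ground truth when checking a permutohedral-lattice
// implementation. O(n^2) in the number of pixels, so tiny images only.
std::vector<float> bilateral_reference(const std::vector<float>& img,
                                       int height, int width,
                                       float sigma_spatial, float sigma_value) {
    std::vector<float> out(img.size());
    for (int y = 0; y < height; ++y)
      for (int x = 0; x < width; ++x) {
        float sum = 0.0f, norm = 0.0f;
        for (int v = 0; v < height; ++v)
          for (int u = 0; u < width; ++u) {
            // spatial term: squared distance between pixel positions
            float ds = ((x - u) * (x - u) + (y - v) * (y - v)) /
                       (2.0f * sigma_spatial * sigma_spatial);
            // range term: squared difference between pixel values
            float diff = img[y * width + x] - img[v * width + u];
            float dv = diff * diff / (2.0f * sigma_value * sigma_value);
            float w = std::exp(-ds - dv);
            sum += w * img[v * width + u];
            norm += w;
          }
        // norm > 0 since the self-weight is always exp(0) = 1
        out[y * width + x] = sum / norm;
      }
    return out;
}
```

A useful sanity check: a constant image should come out unchanged, and increasing `sigma_value` should make the output approach a pure Gaussian blur.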
@MiguelMonteiro, thanks, I would like to see where the possible bug is. Currently, I'm stuck on how to adjust the parameters (or initial parameters) of this model for semantic segmentation. I found that the improvement from the CRF module is very limited compared to the results before this module, so I'm not sure whether I selected good parameters or whether the code has a possible bug as you pointed out. I'm not familiar with the bilateral filtering part, especially the permutohedral lattice. As far as I understand (maybe wrongly), the lattice uses convolution in a high-dimensional feature space to compute the pairwise correlations and then maps back to the original space.
That's a good intuition of what it does; the permutohedral lattice is just the thing that performs the bilateral filter, provided you pass it the correct kernel. As for parameter selection, I haven't gotten that far; I stopped at this bug and at the point where I realized I would need a GPU version for my application. In the future, it would be interesting to see whether the parameters could be optimized during training as if they were weights of the neural network, but I am just speculating. Anyway, if you want to continue this discussion, we can do it through email instead of here.
Hi @MiguelMonteiro, I've sent you an email. Please kindly have a check. Thanks!
Hello guys (@MiguelMonteiro especially :-) ), I am trying to use the CrfasRnnLayer in my own application too, and I noticed some unexpected behaviour (at least given my understanding of what it should do), especially when I freeze some weights.
Is there any news concerning the GPU version of this code? That would allow me to test things faster and get further, as training is currently painfully slow (it has been running on CPU for a week now)...
If I can be of any help regarding the GPU version, please don't hesitate!
Thanks!
Hello,
I have been trying to use the permutohedral lattice code to perform bilateral filtering, just to see if it works as expected. What happens is that I get an image with a repeating artefact overlaid on the original image instead of a blur effect. I don't know if this is how your implementation is supposed to work or if there is some kind of bug. Would you mind attempting bilateral filtering on your side to check this issue?
Also, I have come across a paper that might interest you, since it can speed up the message-passing step:
Recursive Bilateral Filtering - Qingxiong Yang - European Conference on Computer Vision - 2012
Thanks.