Closed yudawen closed 5 years ago
It's due to the large size of our choice of representation, signals on SO(3): (2b)^3. You could use a smaller representation, like signals on S2, of size (2b)^2. That's what they did in https://arxiv.org/abs/1711.06721
You could even use both: S2 -> S2 -> S2 -> SO3 -> SO3 -> ...
The input image (pixel coordinates (x, y)) is turned after the S2 convolution into (alpha, beta, gamma). How do we transform (alpha, beta, gamma) back into the image's (x, y) format? In short, is it possible to get a 2-dimensional feature map again after the S2/SO3 convolution?
Is the operation so3_integrate the only way to connect the SO3 convolution to the fully connected layer?
If you integrate an SO3 signal over the gamma component you get a signal on S2. so3_integrate is one equivariant way, but there exist others, like for instance max pooling.
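To make the shapes concrete, here is a small numpy sketch (not the s2cnn implementation): an SO(3) signal sampled on the SOFT grid has shape (2b, 2b, 2b) for the (alpha, beta, gamma) axes, and collapsing the last axis, here by a plain average that ignores quadrature weights, leaves a signal on S2 of shape (2b, 2b).

```python
import numpy as np

b = 4  # bandwidth (an arbitrary choice for the sketch)

# An SO(3) signal on the (alpha, beta, gamma) grid: shape (2b, 2b, 2b).
so3_signal = np.random.rand(2 * b, 2 * b, 2 * b)

# "Integrate" over the gamma component (last axis); a real integral would
# use quadrature weights, an average is enough to show the shape change.
s2_signal = so3_signal.mean(axis=-1)

print(s2_signal.shape)  # → (8, 8), a signal on S2
```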
"If you integrate a SO3 signal over the gamma component you get a signal on S2." Can a signal on S2 be traced back to the spherical inputs?
Sorry. Can the output of the S2/SO3 convolution have a one-to-one correspondence with the positions of the pixels in the original image, regardless of the bandwidth reduction? @mariogeiger @jonas-koehler
Can a signal on S2 be traced back to the spherical inputs?
Yes... the input is on S2
Can the position of the original pixel and the rotated pixel correspond?
I don't understand the question
I'm afraid I don't understand your questions, can you rephrase please?
Sorry. Can the output of the S2/SO3 convolution have a one-to-one correspondence with the positions of the pixels in the original image, regardless of the bandwidth reduction?
Yes, sure, they correspond in the same way the position of a pixel in the output of a 2D convolution corresponds to the position of an input pixel. In the way the S2/SO3 convolution is implemented, the filters are rotated, but not the input signal.
It is visible in this image
I mean, can I locate the object of interest in the output of the S2/SO3 convolution? I tried it before, and the result was a mess.
Ok, it should be at the same location; actually the equivariance property enforces it. I don't know why the result was a mess.
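As a planar analogy for this point (an illustration, not s2cnn code): because the filters move over the signal rather than the signal being moved, a convolution's peak response stays at the location of the input feature. The sketch below places a single "object" pixel in an image, cross-correlates with a small blur filter, and finds the peak at the same position.

```python
import numpy as np

img = np.zeros((16, 16))
img[5, 9] = 1.0  # the "object" sits at (5, 9)

# Small blur filter with a distinct central weight, so the peak is unique.
filt = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

# "Same"-size cross-correlation, written out explicitly.
pad = np.pad(img, 1)
out = np.zeros_like(img)
for i in range(16):
    for j in range(16):
        out[i, j] = (pad[i:i + 3, j:j + 3] * filt).sum()

peak = tuple(int(v) for v in np.unravel_index(out.argmax(), out.shape))
print(peak)  # → (5, 9): the maximum response stays at the object's location
```

The same location-preserving behaviour is what the equivariance property guarantees on the sphere, up to the change of grid resolution.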
This figure shows that performing the convolution first and then the rotation gives the same result as performing the rotation first and then the convolution? Or did I understand it wrong?
It is visible in this image
The first picture in the second row corresponds to the second picture in the second row, but they do not correspond to the position of the first picture (the original input) in the first row.
Sorry to spam you all with multiple issues, but what is wrong with my understanding? I have been thinking about these issues lately. What went wrong? @mariogeiger @tscohen @jonas-koehler
I'm also not sure I understand your question, but I think you want to know how to interpret the position of each activation in the output feature map. ("Can the output of S2 / SO3 convolution have a one-to-one correspondence with the position of the pixel on the original image, regardless of the bandwidth reduction?")
We use the grid defined by Kostelec & Rockmore, 2007: https://www.cs.dartmouth.edu/~geelong/soft/soft20_fx.pdf. At each layer you have a grid of a certain resolution / bandwidth b. This grid has shape 2b × 2b × 2b for SO3 and 2b × 2b for S2. For SO3, the first two axes correspond to the position on the sphere, and the third one corresponds to the orientation of the feature. For S2 both axes code the position.

If you have an integer index (i, j, k) for the three axes, this corresponds to three angles alpha, beta, gamma, as defined by Kostelec & Rockmore (right below Eq. 26). From those equations for alpha_j1, beta_k, gamma_j2 it should be possible to see whether the grid for a bandwidth B' < B is a subset of the grid of bandwidth B, which would mean that each point in the grid of resolution B' also exists in the grid of resolution B (but not vice versa, obviously). If the low-res grid is not a subset of the high-res grid, you can still relate each sample in the low-res grid to a position on the sphere, but this position may not fall exactly on a grid point of the high-res grid.
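The subset question above can be checked numerically. The sketch below writes out the SOFT grid angles for a bandwidth B (alpha_j1 = pi*j1/B, beta_k = pi*(2k+1)/(4B), gamma_j2 = pi*j2/B, as given just below Eq. 26 of Kostelec & Rockmore); it is a sketch from those formulas, not the library's own grid code.

```python
import numpy as np

def soft_angles(B):
    """Angles of the SOFT sampling grid for bandwidth B."""
    j = np.arange(2 * B)
    alpha = np.pi * j / B                  # alpha_j1, j1 = 0..2B-1
    beta = np.pi * (2 * j + 1) / (4 * B)   # beta_k,   k  = 0..2B-1
    gamma = np.pi * j / B                  # gamma_j2, j2 = 0..2B-1
    return alpha, beta, gamma

a_hi, b_hi, _ = soft_angles(8)  # high-res grid,  B  = 8
a_lo, b_lo, _ = soft_angles(4)  # low-res grid,   B' = 4

# The low-res alpha (and gamma) angles reuse every second high-res angle,
# but the beta angles are offset differently, so the low-res grid is not
# a subset of the high-res grid overall.
print(np.isin(a_lo, a_hi).all())  # → True
print(np.isin(b_lo, b_hi).all())  # → False
```

So each low-res sample still maps to a well-defined position on the sphere, but (because of beta) it generally falls between the beta rings of the high-res grid, consistent with the last sentence above.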
The computational cost of spherical convolution is too great. Is there any measure that would allow spherical convolution to handle large images?