jonkhler / s2cnn

Spherical CNNs

S2CNN Computational Complexity #28

Closed yudawen closed 5 years ago

yudawen commented 5 years ago

The computational cost of spherical convolution is too great. Is there any way to make spherical convolution able to handle large images?

mariogeiger commented 5 years ago

It's due to the large size of our choice of representation: signals on SO(3) have size (2b)^3. You could use a smaller representation like signals on S2, of size (2b)^2. That's what they did in https://arxiv.org/abs/1711.06721

You could even use both: S2 -> S2 -> S2 -> SO3 -> SO3 -> ...
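
To make the size argument concrete, here is a quick back-of-the-envelope comparison (plain Python, just evaluating (2b)^2 vs (2b)^3 per feature channel and batch element):

```python
# Samples per feature map at bandwidth b: an S2 signal has (2b)^2 of them,
# an SO(3) signal has (2b)^3, i.e. SO(3) costs an extra factor of 2b.
for b in (16, 32, 64, 128):
    s2, so3 = (2 * b) ** 2, (2 * b) ** 3
    print(f"b={b:4d}  S2: {s2:>10,}  SO(3): {so3:>13,}  ratio: {so3 // s2}x")
```

At b = 64 that is already 16,384 vs ~2.1 million samples per channel, which is why the SO(3) layers dominate memory and compute.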

yudawen commented 5 years ago

After the S2 convolution, the input image (pixel coordinates (x, y)) is changed into (alpha, beta, gamma). How can (alpha, beta, gamma) be transformed back into the image (x, y) format? In short, is it possible to get a 2-dimensional feature map back after an S2/SO3 convolution?

yudawen commented 5 years ago

Is the so3_integrate operation the only way to connect the SO3 convolution to the fully connected layer?

mariogeiger commented 5 years ago

If you integrate an SO3 signal over the gamma component you get a signal on S2.

so3_integrate is one equivariant way, but others exist, for instance max pooling.
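
A minimal sketch of both reductions (assuming the s2cnn layout [batch, feature, beta, alpha, gamma] with gamma on the last axis; the tensor here is random and only illustrates shapes):

```python
import torch
from s2cnn import so3_integrate  # equivariant global pooling over SO(3)

b = 16                                      # bandwidth
x = torch.randn(4, 8, 2 * b, 2 * b, 2 * b)  # fake SO(3) features, gamma last

# SO(3) -> S2: gamma is sampled uniformly on [0, 2*pi), so a plain mean
# over the last axis integrates it out (up to a constant 2*pi factor).
x_s2 = x.mean(dim=-1)            # [4, 8, 2b, 2b], a signal on S2
x_s2_max = x.max(dim=-1).values  # max pooling over gamma, also equivariant

# SO(3) -> one scalar per feature: this is the pooling used before the
# fully connected layer in the s2cnn examples (it weights beta correctly).
x_vec = so3_integrate(x)         # [4, 8]
```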

yudawen commented 5 years ago

"If you integrate an SO3 signal over the gamma component you get a signal on S2." Can a signal on S2 be traced back to the spherical inputs?

yudawen commented 5 years ago

Sorry. Can the output of S2 / SO3 convolution have a one-to-one correspondence with the position of the pixel on the original image, regardless of the bandwidth reduction? @mariogeiger @jonas-koehler

mariogeiger commented 5 years ago

"Can a signal on S2 be traced back to the spherical inputs?"

Yes... the input is on S2.

"Can the position of the original pixel and the rotated pixel correspond?"

I don't understand the question.

I'm afraid I don't understand your questions, can you rephrase please?

yudawen commented 5 years ago

Sorry. Can the output of S2 / SO3 convolution have a one-to-one correspondence with the position of the pixel on the original image, regardless of the bandwidth reduction?

mariogeiger commented 5 years ago

Yes, sure, they correspond in the same way that the position of a pixel in the output of a 2d convolution corresponds to the position of an input pixel. In the way the S2/SO3 convolution is implemented, the filters are rotated, but not the input signal.

mariogeiger commented 5 years ago

It is visible in this image: [image not reproduced; an equivariance demonstration with the original input in the first row and rotated/convolved results in the second row]

yudawen commented 5 years ago

I mean, can I locate the object of interest in the output of the S2/SO3 convolution? I tried it before, and the result was a mess.

mariogeiger commented 5 years ago

Ok, it should be at the same location; actually, the equivariance property enforces it. I don't know why the result was a mess.
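
One way to sanity-check this is to test equivariance numerically for rotations about the Z axis, which on the equiangular grid are plain circular shifts of the alpha axis. A sketch under two assumptions: a recent s2cnn whose layers run on CPU, and the [batch, feature, beta, alpha] / [batch, feature, beta, alpha, gamma] axis layout (swap the roll dims if your version orders the axes differently):

```python
import torch
from s2cnn import S2Convolution, s2_near_identity_grid

b = 8
conv = S2Convolution(nfeature_in=1, nfeature_out=2, b_in=b, b_out=b,
                     grid=s2_near_identity_grid())

x = torch.randn(1, 1, 2 * b, 2 * b)  # signal on S2, alpha on the last axis
k = 3                                # Z rotation by 2*pi*k/(2b)

# Rotate then convolve vs convolve then rotate; the SO(3) output has
# shape [1, 2, 2b, 2b, 2b] with alpha on dim -2.
rot_then_conv = conv(torch.roll(x, shifts=k, dims=-1))
conv_then_rot = torch.roll(conv(x), shifts=k, dims=-2)

# Equivariance says the two paths agree up to numerical error:
print((rot_then_conv - conv_then_rot).abs().max())
```

If this error is small but localization still looks like a mess, the problem is more likely in how output indices are mapped back to angles.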

yudawen commented 5 years ago

Does this figure show that the result of performing the convolution first and then the rotation is the same as the result of performing the rotation first and then the convolution? Am I understanding it wrong?

yudawen commented 5 years ago

"It is visible in this image": the first picture in the second row corresponds to the second picture in the second row, and they do not correspond to the position of the first picture (the original input) in the first row.

yudawen commented 5 years ago

Sorry to spam you all with multiple issues, but what is wrong with my understanding? I have been thinking about these issues lately. What went wrong? @mariogeiger @tscohen @jonas-koehler

tscohen commented 5 years ago

I'm also not sure I understand your question, but I think you want to know how to interpret the position of each activation in the output feature map. ("Can the output of S2 / SO3 convolution have a one-to-one correspondence with the position of the pixel on the original image, regardless of the bandwidth reduction?")

We use the grid defined by Kostelec & Rockmore, 2007: https://www.cs.dartmouth.edu/~geelong/soft/soft20_fx.pdf. At each layer you have a grid of a certain resolution / bandwidth b. This grid has shape 2b * 2b * 2b for SO3 and 2b * 2b for S2. For SO3, the first two axes correspond to position on the sphere, and the third one corresponds to the orientation of the feature. For S2, both axes code the position. If you have an integer index (i, j, k) for the three axes, this corresponds to three angles alpha, beta, gamma, as defined by Kostelec & Rockmore (right below eq. 26).

From those equations for alpha_{j1}, beta_k, gamma_{j2}, it should be possible to see whether a grid of bandwidth B' < B is a subset of the grid of bandwidth B, which would mean that each point in the grid of resolution B' also exists in the grid of resolution B (but not vice versa, obviously). If the low-res grid is not a subset of the high-res grid, you can still relate each sample in the low-res grid to a position on the sphere, but this position may not be exactly on a grid point of the high-res grid.
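
To make that check concrete, here is a small sketch using the standard SOFT angles (alpha_{j1} = 2*pi*j1/(2B), beta_k = pi*(2k+1)/(4B), gamma_{j2} = 2*pi*j2/(2B)); soft_grid_angles and is_subgrid are illustrative helpers written for this comment, not part of s2cnn:

```python
import numpy as np

def soft_grid_angles(B):
    # Angles of the 2B x 2B x 2B SOFT grid (Kostelec & Rockmore, below eq. 26).
    j = np.arange(2 * B)
    alpha = 2 * np.pi * j / (2 * B)
    beta = np.pi * (2 * j + 1) / (4 * B)
    gamma = 2 * np.pi * j / (2 * B)
    return alpha, beta, gamma

def is_subgrid(B_lo, B_hi, tol=1e-12):
    # Per axis: does every bandwidth-B_lo angle also occur at bandwidth B_hi?
    lo, hi = soft_grid_angles(B_lo), soft_grid_angles(B_hi)
    return tuple(
        bool(np.all(np.min(np.abs(h[None, :] - l[:, None]), axis=1) < tol))
        for l, h in zip(lo, hi)
    )

# alpha/gamma grids nest whenever B_hi is a multiple of B_lo, but the
# half-step offset in beta means beta only nests when B_hi/B_lo is odd:
print(is_subgrid(8, 16))  # -> (True, False, True)
print(is_subgrid(8, 24))  # -> (True, True, True)
```

So after an even bandwidth reduction, the beta samples of the coarse grid fall between those of the fine grid, which may be one reason a naive index-to-index comparison across layers looks shifted.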