@MarcusLoppe in the paper, they ensure that the dimension can be split into 3, as they quantize the vertices rather than the faces. to keep the model dimensions separate from having to be neatly divisible, i have an extra projection right before this step here
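roughly, the idea is something like the sketch below - the dims and names here are just illustrative assumptions, not the exact code in the repo:
import torch
from torch import nn
from einops import rearrange

# illustrative sketch only - dims are assumptions, not the repo's actual values
dim_model = 512        # main model dimension, not necessarily divisible by 3
dim_codebook = 192     # per-vertex dimension fed to the quantizer

# extra projection right before quantization, so the model dim stays free
project_dim_codebook = nn.Linear(dim_model, dim_codebook * 3)

face_embed = torch.randn(2, 100, dim_model)            # (batch, faces, dim)
vertex_embed = project_dim_codebook(face_embed)        # (batch, faces, 3 * dim_codebook)
vertex_embed = rearrange(vertex_embed, 'b f (v d) -> b f v d', v = 3)  # one feature per vertex of the face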
@MarcusLoppe i don't think these numbers have to be that exact, but i can spend a bit of time at the end and align the hparams a bit more
I disagree; I implemented the encoder as per below and the loss improved.
Since I'm unsure about the ReLU implementation and I wanted to test it quickly, I didn't implement that nor the ResNet-34. But the loss was reduced by a lot, plus the encoder's parameter count went from 5 978 880 to 725 140, so this also means faster training. Since the GCN is smaller it also has a smaller "compression space", which might be a downside when dealing with much more complex 3D meshes. But since the paper seems to manage to generate good shapes, it might not be an issue.
self.encoders = ModuleList([
    SAGEConv(196, 64, **sageconv_kwargs),   # 196-dim per-face input features, as in the paper's architecture table
    SAGEConv(64, 128, **sageconv_kwargs),
    SAGEConv(128, 256, **sageconv_kwargs),
    SAGEConv(256, 256, **sageconv_kwargs),
    SAGEConv(256, 576, **sageconv_kwargs)   # 576 output so it can be split into 3 per-vertex features before quantization
])
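For reference, this is roughly how I imagine the forward pass would look with the ReLU between the graph convs. It's only a sketch of my assumption (I didn't actually test this part), with x as the per-face node features and edge_index as the face adjacency:
import torch.nn.functional as F

# sketch only - assumed ReLU placement between graph conv layers, none after the last
def encode(self, x, edge_index):
    for ind, conv in enumerate(self.encoders):
        x = conv(x, edge_index)
        if ind < len(self.encoders) - 1:
            x = F.relu(x)
    return x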
oh dang, shot down with evidence 😂
I'll look into it next Monday, have a great weekend
@MarcusLoppe yes, you are right, i missed the little boxes in the decoder diagram x{num}. the resnet in the decoder is way deeper than what i had before
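roughly, the decoder_dims_through_depth tuple would get turned into a stack of residual blocks, something like the sketch below - ResnetBlock here is just an assumed simple 1d conv residual block, not the final implementation:
from torch import nn

# assumed placeholder block, not the repo's actual residual block
class ResnetBlock(nn.Module):
    def __init__(self, dim, dim_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, dim_out, 3, padding = 1),
            nn.ReLU(),
            nn.Conv1d(dim_out, dim_out, 3, padding = 1)
        )
        # project the skip connection when the channel count changes
        self.residual = nn.Conv1d(dim, dim_out, 1) if dim != dim_out else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.residual(x)

def build_decoder(init_dim, decoder_dims_through_depth):
    # one residual block per entry in the dims-through-depth tuple
    blocks = []
    dim = init_dim
    for dim_out in decoder_dims_through_depth:
        blocks.append(ResnetBlock(dim, dim_out))
        dim = dim_out
    return nn.Sequential(*blocks)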
You made a small typo here, it should be 192 not 912 :)
decoder_dims_through_depth: Tuple[int, ...] = ( 128, 128, 128, 128, 192, 192, 912, 192, 256, 256, 256, 256, 256, 256, 384, 384, 384 ),
@MarcusLoppe 🤦 thanks!
In the paper they implement SAGEConv and ResNet-34 layers, and at the end of the paper they show the model architecture with the different in/out sizes. All the SAGEConv & ResNet layers are the same size here, so is there any reason why this wasn't mirrored?
There are some reasons why it's best to mirror the sizes:
Optimization, since the people who wrote the paper have probably experimented with different sizes and found what worked best.
The input to the embedding seems to be F x 196; does this mean that each face gets a 196-dim tensor? I'm confused about this since currently project_in has the feature values: Linear(in_features=832, out_features=512, bias=True). Since there are 16 features and 196 / 16 = 12.25, that seems very low. Maybe you can figure out what this means :)
Another thing the implementation might be missing: according to the paper, they sort the vertices in z-y-x order and then sort the faces by their lowest vertex index. I think this functionality belongs in the dataset class, but anyway, I just wanted to highlight it.
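Something like the sketch below is what I mean - just my rough reading of the paper, not tested code; vertices is a (V, 3) float array and faces a (F, 3) int array of vertex indices:
import numpy as np

def sort_mesh(vertices, faces):
    # sort vertices in z-y-x order (np.lexsort uses the last key as the primary key)
    vert_order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices = vertices[vert_order]

    # remap face indices to the new vertex ordering
    remap = np.empty_like(vert_order)
    remap[vert_order] = np.arange(len(vert_order))
    faces = remap[faces]

    # sort faces by their lowest vertex index
    face_order = np.argsort(faces.min(axis = 1), kind = 'stable')
    return vertices, faces[face_order]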