lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
MIT License

SageConv & ResNet sizes #13

Closed MarcusLoppe closed 10 months ago

MarcusLoppe commented 10 months ago

In the paper they implement a SAGEConv encoder and a ResNet-34-style decoder, and at the end of the paper they show the model architecture with the different in/out sizes. All the SAGEConv & ResNet layers are the same size here, so is there any reason why this wasn't mirrored?

There are some reasons why it's best to mirror the sizes:

[screenshot]

[screenshot]

Another thing the implementation might be missing: according to the paper, they sort the vertices in z-y-x order, then sort the faces by their lowest vertex index. I think this functionality belongs in the dataset class, but I wanted to highlight it anyway.
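That sorting step could be sketched roughly like this, assuming `vertices` is an `(N, 3)` float array and `faces` an `(M, 3)` integer index array (the function name and the use of NumPy are my own, not from the repo):

```python
import numpy as np

def sort_mesh(vertices, faces):
    # sort vertices by (z, y, x); np.lexsort treats the LAST key as
    # primary, so the keys are passed in (x, y, z) order
    order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices = vertices[order]

    # remap face indices to the new vertex ordering
    inverse = np.empty_like(order)
    inverse[order] = np.arange(len(order))
    faces = inverse[faces]

    # sort faces by their lowest vertex index
    faces = faces[np.argsort(faces.min(axis=1), kind='stable')]
    return vertices, faces
```

Something along these lines would slot naturally into the dataset's `__getitem__`, as suggested above.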

lucidrains commented 10 months ago

@MarcusLoppe i don't think these numbers have to be that exact, but i can spend a bit of time at the end and align the hparams a bit more

lucidrains commented 10 months ago

@MarcusLoppe in the paper, they ensure that the dimension can be split into 3, as they quantize the vertices rather than the faces. to keep the model dimensions separate from having to be neatly divisible, i have an extra projection right before this step here
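A minimal sketch of that idea (the dimension values and names here are illustrative, not the repo's actual hyperparameters): the model width stays arbitrary, and a single linear projection maps it to `3 * dim_codebook` right before the per-vertex split.

```python
import torch
from torch import nn

dim_model = 512       # arbitrary model width, need not be divisible by 3
dim_codebook = 192    # assumed per-vertex quantizer width

# extra projection right before quantization, so dim_model is unconstrained
to_vertex_dims = nn.Linear(dim_model, 3 * dim_codebook)

face_embed = torch.randn(2, 100, dim_model)        # (batch, faces, dim)
out = to_vertex_dims(face_embed)                   # (batch, faces, 3 * 192)
per_vertex = out.reshape(2, 100, 3, dim_codebook)  # one slot per face vertex
```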

MarcusLoppe commented 10 months ago

@MarcusLoppe i don't think these numbers have to be that exact, but i can spend a bit of time at the end and align the hparams a bit more

I disagree. I implemented the encoder as per below and the loss improved.

Since I'm unsure about the ReLU implementation and wanted to test quickly, I didn't implement that nor the ResNet-34. But the loss was reduced by a lot, plus the encoder's parameter count went from 5,978,880 to 725,140, which means faster training. Since the GCN is smaller it also has a smaller "compression space", which might be a downside when dealing with much more complex 3D meshes. But since the paper seems to manage to generate good shapes, it might not be an issue.

        # channel sizes mirrored from the paper's architecture table
        self.encoders = ModuleList([
            SAGEConv(196, 64, **sageconv_kwargs),
            SAGEConv(64, 128, **sageconv_kwargs),
            SAGEConv(128, 256, **sageconv_kwargs),
            SAGEConv(256, 256, **sageconv_kwargs),
            SAGEConv(256, 576, **sageconv_kwargs),
        ])

[screenshot]

lucidrains commented 10 months ago

oh dang, shot down with evidence 😂

I'll look into it next Monday, have a great weekend

lucidrains commented 10 months ago

@MarcusLoppe yes, you are right, i missed the little boxes in the decoder diagram x{num}. the resnet in the decoder is way deeper than what i had before

MarcusLoppe commented 10 months ago

@MarcusLoppe yes, you are right, i missed the little boxes in the decoder diagram x{num}. the resnet in the decoder is way deeper than what i had before

You made a small typo here; it should be 192, not 912 :)

decoder_dims_through_depth: Tuple[int, ...] = ( 128, 128, 128, 128, 192, 192, 912, 192, 256, 256, 256, 256, 256, 256, 384, 384, 384 ),
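With the typo fixed, the tuple would read:

```python
from typing import Tuple

decoder_dims_through_depth: Tuple[int, ...] = (
    128, 128, 128, 128,
    192, 192, 192, 192,
    256, 256, 256, 256, 256, 256,
    384, 384, 384,
)
```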

lucidrains commented 10 months ago

@MarcusLoppe 🤦 thanks!