@MarcusLoppe in the paper, they ensure that the dimension can be split into 3, as they quantize the vertices rather than the faces. to keep the model dimensions separate from having to be neatly divisible, i have an extra projection right before this step here
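roughly, the idea is something like the sketch below - the dims and names here are just illustrative assumptions, not the exact code in the repo:
import torch
from torch import nn
from einops import rearrange

# illustrative sketch only - dims are assumptions, not the repo's actual values
dim_model = 512        # main model dimension, not necessarily divisible by 3
dim_codebook = 192     # per-vertex dimension fed to the quantizer

# extra projection right before quantization, so the model dim stays free
project_dim_codebook = nn.Linear(dim_model, dim_codebook * 3)

face_embed = torch.randn(2, 100, dim_model)            # (batch, faces, dim)
vertex_embed = project_dim_codebook(face_embed)        # (batch, faces, 3 * dim_codebook)
vertex_embed = rearrange(vertex_embed, 'b f (v d) -> b f v d', v = 3)  # one feature per vertex of the face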
@MarcusLoppe i don't think these numbers have to be that exact, but i can spend a bit of time at the end and align the hparams a bit more
I disagree; I implemented the encoder as per below and the loss improved.
Since I'm unsure about the ReLU implementation and I wanted to test it quickly, I didn't implement that nor the ResNet-34. But the loss was reduced by a lot, plus the encoder's parameter count went from 5 978 880 to 725 140, so this also means faster training. Since the GCN is smaller it also has a smaller "compression space", which might be a downside when dealing with much more complex 3D meshes. But since the paper seems to manage to generate good shapes, it might not be an issue.
self.encoders = ModuleList([
    SAGEConv(196, 64, **sageconv_kwargs),   # 196-dim per-face input features, as in the paper's architecture table
    SAGEConv(64, 128, **sageconv_kwargs),
    SAGEConv(128, 256, **sageconv_kwargs),
    SAGEConv(256, 256, **sageconv_kwargs),
    SAGEConv(256, 576, **sageconv_kwargs)   # 576 output so it can be split into 3 per-vertex features before quantization
])
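For reference, this is roughly how I imagine the forward pass would look with the ReLU between the graph convs. It's only a sketch of my assumption (I didn't actually test this part), with x as the per-face node features and edge_index as the face adjacency:
import torch.nn.functional as F

# sketch only - assumed ReLU placement between graph conv layers, none after the last
def encode(self, x, edge_index):
    for ind, conv in enumerate(self.encoders):
        x = conv(x, edge_index)
        if ind < len(self.encoders) - 1:
            x = F.relu(x)
    return x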
oh dang, shot down with evidence 😂
I'll look into it next Monday, have a great weekend
@MarcusLoppe yes, you are right, i missed the little boxes in the decoder diagram x{num}. the resnet in the decoder is way deeper than what i had before
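roughly, the decoder_dims_through_depth tuple would get turned into a stack of residual blocks, something like the sketch below - ResnetBlock here is just an assumed simple 1d conv residual block, not the final implementation:
from torch import nn

# assumed placeholder block, not the repo's actual residual block
class ResnetBlock(nn.Module):
    def __init__(self, dim, dim_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, dim_out, 3, padding = 1),
            nn.ReLU(),
            nn.Conv1d(dim_out, dim_out, 3, padding = 1)
        )
        # project the skip connection when the channel count changes
        self.residual = nn.Conv1d(dim, dim_out, 1) if dim != dim_out else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.residual(x)

def build_decoder(init_dim, decoder_dims_through_depth):
    # one residual block per entry in the dims-through-depth tuple
    blocks = []
    dim = init_dim
    for dim_out in decoder_dims_through_depth:
        blocks.append(ResnetBlock(dim, dim_out))
        dim = dim_out
    return nn.Sequential(*blocks)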
You made a small typo here, it should be 192 not 912 :)
decoder_dims_through_depth: Tuple[int, ...] = ( 128, 128, 128, 128, 192, 192, 912, 192, 256, 256, 256, 256, 256, 256, 384, 384, 384 ),
@MarcusLoppe 🤦 thanks!
In the paper they implement SAGEConv and ResNet-34 layers, and at the end of the paper they show the model architecture with the different in/out sizes. All the SAGEConv & ResNet layers are the same size here, so is there any reason why this wasn't mirrored?
There are some reasons why it's best to mirror the sizes:
Optimization, since the people who wrote the paper have probably experimented with different sizes and found what worked best.
The input to the embedding seems to be F x 196; does this mean that each face gets a 196-dim tensor? I'm confused about this since currently project_in has the feature values: Linear(in_features=832, out_features=512, bias=True). Since there are 16 features and 196 / 16 = 12.25, that seems very low. Maybe you can figure out what this means :)
Another thing the implementation might be missing: according to the paper, they sort the vertices in z-y-x order and then sort the faces by their lowest vertex index. I think this functionality belongs in the dataset class, but anyway, I just wanted to highlight it.
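Something like the sketch below is what I mean - just my rough reading of the paper, not tested code; vertices is a (V, 3) float array and faces a (F, 3) int array of vertex indices:
import numpy as np

def sort_mesh(vertices, faces):
    # sort vertices in z-y-x order (np.lexsort uses the last key as the primary key)
    vert_order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices = vertices[vert_order]

    # remap face indices to the new vertex ordering
    remap = np.empty_like(vert_order)
    remap[vert_order] = np.arange(len(vert_order))
    faces = remap[faces]

    # sort faces by their lowest vertex index
    face_order = np.argsort(faces.min(axis = 1), kind = 'stable')
    return vertices, faces[face_order]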