lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
MIT License

Mesh intra face vertex id ordering convention #62

Closed jrmgr closed 5 months ago

jrmgr commented 6 months ago

Hi, Thank you for your great implementation of the meshGPT paper. I have a question related to the section 3.1 : "For sequence ordering, Polygen [43] suggests a convention where faces are ordered based on their lowest vertex index, followed by the next lowest, and so forth. Vertices are sorted in z-y-x order (z representing the vertical axis), progressing from lowest to highest. Within each face, indices are cyclically permuted to place the lowest index first."

@MarcusLoppe , I used your vertex and face ordering function (available in your repo notebook) for my own data preparation. I have the feeling the vertex ordering is performed as in the paper; however, the face ordering does not seem to be.

Let's take the following toy example face with the vertex IDs [3, 2, 8]. On the one hand, according to the paper, the IDs are cyclically permuted to place the lowest index first, which gives [3, 2, 8] -> [8, 3, 2] -> [2, 8, 3] after two cyclic permutations, or more directly [3, 2, 8] -> [2, 8, 3] with a single cyclic permutation in the other direction. On the other hand, you permute the IDs as [3, 2, 8] -> [2, 3, 8], which cannot be obtained by a cyclic permutation.
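
For concreteness, here is a minimal sketch (the helper name is made up, not code from the repo) contrasting plain sorting with a cyclic rotation on the toy face above:

def cyclic_lowest_first(face):
    # Rotate the index list so the lowest vertex id comes first,
    # keeping the winding direction (and therefore the face normal) intact.
    i = face.index(min(face))
    return face[i:] + face[:i]

face = [3, 2, 8]
print(sorted(face))               # [2, 3, 8] -> not a rotation of [3, 2, 8], winding is flipped
print(cyclic_lowest_first(face))  # [2, 8, 3] -> same winding, lowest index first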

I think this can cause further issues, as you can see in the attached image of my own data. Since your permutation may not be cyclic, it can invert the triangles' normals. The triangle angle values may also differ if they are oriented, but I did not check. And since these features are computed inside the model and fed to the GCNN encoder, I would say we are not forwarding the right mesh to the model.

To conclude, this is not an issue with the model itself but rather with the data preparation for training and how to better stick to the existing paper.

What do you think? Thanks!

(image: face_ordering_meshgpt)

MarcusLoppe commented 6 months ago

Hello,

At the start I did not reorder the indices within the triangles, but when I reordered by lowest index I did not notice any improvements, so I concluded that the vertex order doesn't really matter that much.

But it seems like I might not have understood what a cyclic permutation is, since I just sorted by lowest index. Could you provide a code snippet for this so I can test it out?

The autoencoder seems to be remarkably good at reconstructing regardless of the order, or even if the mesh isn't "standing" on the ground (e.g. I forgot to set the lowest vertical value to -0.95, so the meshes were hovering in the centre).

The biggest issue is that it's getting a low loss, but it's doing that by getting a very low loss on 90% of the meshes and then catastrophically messing up the other 10%. It's almost like catastrophic forgetting. I'm currently seeing if using local attention will resolve this issue.

Could you maybe take a look at this mesh and see if you spot any of the issues you are talking about? I took 40 random samples from a dataset and generated the codes for them, then reconstructed the meshes using those codes (no transformer).

https://www.mediafire.com/file/4iy1hri8ycqgrk5/autoencoder_output.obj/file

jrmgr commented 6 months ago

Hello, Thanks for your answer. I had the same experience with the faces' index ordering, i.e. it does not improve the results. I might be missing something, but I do not see the point of doing it. But at least if we do reorder the indices to put the lowest index first, it must be a cyclic permutation as in the paper, otherwise the face orientation (either clockwise or counter-clockwise) will be inverted.

I have the same issue with the meshes you shared. As you can see in the screenshots (MeshLab visualisation), some faces have been reconstructed with inconsistent orientation. Your visualisation (Blender?) seems to be ok though. These wrong orientations can be fixed within MeshLab (Filters -> Normals, Curvatures and Orientation -> Re-Orient All Faces Coherently) or with trimesh's in-place repair function trimesh.repair.fix_winding(mesh).

(image: meshgpt_faces_orientation)

By the way, I used this code snippet to stick to the paper's face-index reordering convention. I simply replaced your ascending-order reordering in your notebook (which, as we said, may not be cyclic):

sorted_faces = [sorted(sub_arr) for sub_arr in reindexed_faces]

by:

sorted_faces = [sub_arr[sub_arr.index(min(sub_arr)):] + sub_arr[:sub_arr.index(min(sub_arr))] for sub_arr in reindexed_faces]

It ensures a cyclic permutation, which preserves both the mesh's face orientation and the lowest-index-first convention.

MarcusLoppe commented 6 months ago

@jrmgr

Hiya again, could you maybe take a look at the below? I implemented your code and it seems like it worked. However, since I used data from both ModelNet and ShapeNet, it seems like only the ShapeNet samples had their faces correctly aligned; I think the models that are facing sideways are from ModelNet. It might be better just to order non-cyclically, since not all the meshes are ordered correctly/consistently from the start.

It might also have been because ModelNet uses Z for the vertical axis, so I had to switch Z and Y when I loaded the ModelNet meshes. Do you maybe have a better way of permuting cyclically when dealing with two different axis conventions, given that I need to switch Z with Y?

if ".off" in file_path: 
  for vertex in vertices:
      vertex[1], vertex[2] = vertex[2], vertex[1]
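
One possible way to combine the two steps, as a rough sketch (assuming vertices and faces are NumPy arrays; the names are hypothetical, not code from the repo). The axis swap only touches coordinates, while the cyclic rotation only touches the index order inside each face, so the two do not interfere with each other:

import numpy as np

def prepare_off_mesh(vertices: np.ndarray, faces: np.ndarray):
    # ModelNet .off files are Z-up: swap Y and Z so the vertical axis matches the rest of the data.
    vertices = vertices.copy()
    vertices[:, [1, 2]] = vertices[:, [2, 1]]

    # Cyclically rotate each face so the lowest vertex index comes first,
    # preserving the winding order within the face.
    faces = np.stack([np.roll(face, -int(np.argmin(face))) for face in faces])
    return vertices, faces

Note that swapping two axes is a mirror reflection, so even with the winding preserved the computed normals end up flipped relative to the original surface; that alone could explain ModelNet faces looking inverted, independently of the cyclic permutation.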

https://file.io/Mpg7AoUYoBgC (the mse_rows(63) file contains the original models plus the reconstructed ones)

jrmgr commented 6 months ago

Hi, Thanks for your investigation. Looking at the obj files you shared, I see 3 patterns:

* All faces well oriented (grey case)

* All faces badly oriented (black cases)

* Some faces well oriented, some others not (mix of black and grey cases)

I would have envisaged the cyclic permutation I proposed as universal, since the sense in which we traverse the vertex indices (either clockwise or counter-clockwise) remains the same under a cyclic permutation, so I thought the face orientation should remain the same as in the original/ground-truth data. However, if that is not the case, I would say the user has to be careful about the data they feed to the networks (autoencoder and transformer) while preparing it, to avoid the typical AI "shit in, shit out" effect.

By the way, I would be interested to know how you manage to convert the model output back into a trimesh-like object, since I see you dump it as an obj file on your disk. In other words, how do you convert from [b, n_f, 3, 3] (the meshgpt output) to a mesh object with float vertices [b, n_v, 3] and integer faces [b, n_f, 3]? Inspired by your notebook code for converting one element of a batch, I went with:

import trimesh

def meshgpt_to_trimesh(meshgpt_tensor):
    # Flatten the [n_f, 3, 3] face-vertex coordinates into one long vertex list,
    # then build one face per consecutive triple of vertices.
    vertices = meshgpt_tensor.cpu().numpy().reshape(-1, 3).tolist()
    faces = []
    for i in range(0, len(vertices), 3):
        faces.append([i, i + 1, i + 2])
    mesh = trimesh.Trimesh(vertices=vertices, faces=faces)
    return mesh

My understanding of that type of conversion is that this face-index creation will produce duplicated vertices, because each face will contain IDs different from those of every other face. That is to say, if f1 = [0, 1, 2] and f2 = [3, 4, 5] are adjacent faces, then for example indices 1 and 3, and 2 and 4, may respectively represent the same x, y, z vertex. Otherwise it would mean no faces are adjacent, which basically leads to a mesh with holes. Am I understanding it correctly? If yes, it means we have to post-process the trimesh object to remove the duplicated vertices. Moreover, it means the model must learn to generate the same x, y, z float values for the shared vertices of adjacent faces.

adeerkhan commented 6 months ago

@MarcusLoppe Hi, is transfer learning possible with MeshGPT? Has it been implemented in the code so far? Thanks!

PS: What could be the problem while generating the 3d objects? For example, they don't seem to be completed; it happens in all of the generation cells:

23%|██▎ | 2025/8679 [00:21<01:10, 94.69it/s]
77%|███████▋ | 6690/8664 [01:10<00:20, 94.92it/s]
5%|▍ | 389/8627 [00:04<01:27, 93.80it/s]

MarcusLoppe commented 6 months ago

Hi, Thanks for your investigation. Looking at the obj files you shared, I see 3 patterns:

* All faces well oriented (grey case)

* All faces badly oriented (black cases)

* Some faces well oriented, some others not (mix of black and grey cases)

I would have envisaged the cyclic permutation I proposed as universal, since the sense in which we traverse the vertex indices (either clockwise or counter-clockwise) remains the same under a cyclic permutation, so I thought the face orientation should remain the same as in the original/ground-truth data. However, if that is not the case, I would say the user has to be careful about the data they feed to the networks (autoencoder and transformer) while preparing it, to avoid the typical AI "shit in, shit out" effect.

The badly oriented faces are from ModelNet and the others are from the ShapeNet dataset.
I'd imagine it's probably easier just to skip the permutation when dealing with meshes from other sources; for example, the Objaverse dataset uses meshes from all kinds of users, so there is no expected standard. Sorting the vertex indices without a cyclic permutation provides some sort of data standard and a consistent pattern for how the faces' vertex indices are arranged.

Here is the same kind of data but using my old method; I can't see any difference when it's rendered without the edges.

https://easyupload.io/wad6po

By the way, I would be interested to know how you manage to convert the model output back into a trimesh-like object, since I see you dump it as an obj file on your disk. In other words, how do you convert from [b, n_f, 3, 3] (the meshgpt output) to a mesh object with float vertices [b, n_v, 3] and integer faces [b, n_f, 3]? Inspired by your notebook code for converting one element of a batch, I went with:

import trimesh

def meshgpt_to_trimesh(meshgpt_tensor):
    # Flatten the [n_f, 3, 3] face-vertex coordinates into one long vertex list,
    # then build one face per consecutive triple of vertices.
    vertices = meshgpt_tensor.cpu().numpy().reshape(-1, 3).tolist()
    faces = []
    for i in range(0, len(vertices), 3):
        faces.append([i, i + 1, i + 2])
    mesh = trimesh.Trimesh(vertices=vertices, faces=faces)
    return mesh

My understanding of that type of conversion is that this face-index creation will produce duplicated vertices, because each face will contain IDs different from those of every other face. That is to say, if f1 = [0, 1, 2] and f2 = [3, 4, 5] are adjacent faces, then for example indices 1 and 3, and 2 and 4, may respectively represent the same x, y, z vertex. Otherwise it would mean no faces are adjacent, which basically leads to a mesh with holes. Am I understanding it correctly? If yes, it means we have to post-process the trimesh object to remove the duplicated vertices. Moreover, it means the model must learn to generate the same x, y, z float values for the shared vertices of adjacent faces.

I'm not 100% following, but the reason I output the mesh with (probably) duplicated vertices is so I get the true mesh generation and can see if there are any issues. If I were to smooth the vertices and remove those that are very close to each other (since the decoder might be 0.0001 off), I wouldn't be able to spot any issues.

It might be useful to round the vertices to 0.001 and remove/combine the vertices which are too close, to get a smoother mesh, but I'm not at that stage yet.
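
A small sketch of that clean-up step (standard trimesh/NumPy calls; the rounding precision is just an example, not a value used in the repo):

import numpy as np
import trimesh

def merge_close_vertices(mesh: trimesh.Trimesh, decimals: int = 3) -> trimesh.Trimesh:
    # Snap near-identical vertices onto a common grid, then let trimesh merge the
    # duplicates so adjacent faces share vertices instead of owning their own copies.
    snapped = np.round(mesh.vertices, decimals)
    cleaned = trimesh.Trimesh(vertices=snapped, faces=mesh.faces, process=False)
    cleaned.merge_vertices()
    return cleaned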

Well, since the idea of the project is to train a model to understand how to generate meshes, there will inherently be no holes once it's fully trained, since the training data doesn't contain broken meshes with holes in them.

I actually haven't had any problems with holes; the problem has mostly been that the transformer generates the wrong codes and there are spikes. But the autoencoder seems to be pretty consistent at generating meshes that are fully connected.

MarcusLoppe commented 6 months ago

@MarcusLoppe Hi, is transfer learning possible with MeshGPT? Has it been implemented in the code so far? Thanks!

PS: What could be the problem while generating the 3d objects? For example, they don't seem to be completed; it happens in all of the generation cells:

23%|██▎ | 2025/8679 [00:21<01:10, 94.69it/s]
77%|███████▋ | 6690/8664 [01:10<00:20, 94.92it/s]
5%|▍ | 389/8627 [00:04<01:27, 93.80it/s]

With transfer learning, do you mean teacher-student? If so, no, since the codebooks will be very different between the two models, so they won't speak the same language. It's better to just fine-tune/retrain the same model, since it will still carry over some generalisation.

During training the transformer appends an EOS token as the last token in each sequence; this token means 'stop generation'. Otherwise, when generating meshes, it might use too many tokens for something simple. So it's totally normal behaviour :)
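
A self-contained toy illustration (all names and numbers here are invented; this is not the library's generation code) of why the progress bar stops well before its maximum:

from tqdm import tqdm

MAX_SEQ_LEN = 8679   # what the progress bar is sized for
EOS = -1             # stand-in for the real EOS token id

def sample_next_token(step):
    # Dummy sampler: pretend the model emits EOS after 2025 tokens.
    return EOS if step == 2025 else step

tokens = []
for step in tqdm(range(MAX_SEQ_LEN)):
    token = sample_next_token(step)
    if token == EOS:  # 'stop generation': the bar halts at ~23% and that is expected
        break
    tokens.append(token)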

adeerkhan commented 6 months ago

@MarcusLoppe Yes, fine-tune/retrain a model checkpoint. Let's say I train a model on the Objaverse dataset and then fine-tune it on other specific 3d models that I want it to learn. Is there functionality of that sort in the code or in your notebook? Thanks

MarcusLoppe commented 6 months ago

@MarcusLoppe Yes, fine-tune/retrain a model checkpoint. Let's say I train a model on the Objaverse dataset and then fine-tune it on other specific 3d models that I want it to learn. Is there functionality of that sort in the code or in your notebook? Thanks

Fine-tuning, pre-training and training do functionally the same thing; it's just the dataset size and the purpose that change. When you pre-train a model you use a large dataset with vast knowledge, then you fine-tune it on a smaller dataset which contains what you actually want it to do.

So the code is actually the same for all the stages: just train the autoencoder and then the transformer on your other dataset as you did the first time; you might want to use a lower learning rate such as 1e-4. However, since the transformer uses a max sequence length, your new dataset cannot contain longer sequences than what you specified in the parameters when creating the transformer.
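
As a rough sanity check for sequence length (assuming the default residual-quantizer setup of 2 codes per vertex, i.e. 6 tokens per face, plus one EOS token; adjust the numbers to your own settings):

TOKENS_PER_FACE = 6     # 3 vertices per face * 2 codes per vertex (default quantizer setting)
MAX_SEQ_LEN = 8192      # must match the max_seq_len given to the MeshTransformer

def fits_in_context(num_faces: int) -> bool:
    # +1 leaves room for the EOS token appended at the end of the sequence.
    return num_faces * TOKENS_PER_FACE + 1 <= MAX_SEQ_LEN

print(fits_in_context(250))    # True  (1501 tokens)
print(fits_in_context(2000))   # False (12001 tokens)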

adeerkhan commented 6 months ago

I've been training on a dataset, but since your code has changed it now asks me to switch version from 1.1.0 to 1.0.2. Your fork doesn't set a version, and I can't use lucidrains' original repo since I follow your fork. Any idea how to handle this?

Also, can you elaborate on the fine-tune/pre-train process, i.e. which steps we need to take exactly? You mention training the autoencoder on the large dataset (Objaverse) and then training the transformer on the small dataset (the fine-tuning one). Is that what you are saying? Thanks

MarcusLoppe commented 6 months ago

I've been training on a dataset, but since your code has changed it now asks me to switch version from 1.1.0 to 1.0.2. Your fork doesn't set a version, and I can't use lucidrains' original repo since I follow your fork. Any idea how to handle this?

Oh, that is just a warning; it lets you know, in case the model won't load, that there might be some compatibility issues. There haven't been any major changes for a while except the pixel norm, so it's probably fine.

Also, can you elaborate on the fine-tune/pre-train process, i.e. which steps we need to take exactly? You mention training the autoencoder on the large dataset (Objaverse) and then training the transformer on the small dataset (the fine-tuning one). Is that what you are saying? Thanks

I'm not quite sure, since I've always trained on the same dataset. But I would fine-tune the autoencoder on your dataset and then train the transformer from scratch using the same dataset. It would be ideal to freeze the encoder weights, but that feature is not implemented (yet?); that way the model keeps using the same language and just gets better at predicting the encoder outputs without losing too much "world" knowledge.
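
Freezing is not built into the trainers, but a rough manual sketch could look like the following (assuming autoencoder is the MeshAutoencoder instance; the name filter is a guess based on the state_dict key prefixes and may need adjusting for your checkpoint):

# Hypothetical sketch, not a library feature: freeze the graph-conv encoder so the
# learned token "language" stays stable, and fine-tune only the remaining parameters.
for name, param in autoencoder.named_parameters():
    if "encoder" in name:
        param.requires_grad = False

trainable = sum(p.numel() for p in autoencoder.parameters() if p.requires_grad)
print(f"trainable parameters after freezing: {trainable}")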

In the paper they pre-train both on the entire dataset, then they fine-tune the autoencoder and then the transformer. That way they both get the generalisation ability and then the expert knowledge. However, I'm not 100% convinced by this, since the transformer will be fine-tuned on tokens which then have a different meaning than they had when it was pre-trained (i.e. another "language"), with meanings/usages that might not even be close to the tokens it trained on before.

adeerkhan commented 5 months ago

This is the error that I get:

Loading saved mesh autoencoder at version 1.0.2, but current package version is 1.1.0

RuntimeError                              Traceback (most recent call last)
in <cell line: 4>()
      2 #autoencoder_trainer.load(f'{workingdir}\mesh-encoder{project_name}.pt')
      3
----> 4 autoencoder_trainer.load('/content/drive/MyDrive/MeshGPT Generation/Test 10 March/mesh-encoder_new_40_aug.pt')
      5 autencoder = autoencoder_trainer.model
      6 for param in autoencoder.parameters():

1 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict, assign)
   2150
   2151         if len(error_msgs) > 0:
-> 2152             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153                 self.__class__.__name__, "\n\t".join(error_msgs)))
   2154         return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for MeshAutoencoder: Unexpected key(s) in state_dict: "decoders.0.block1.norm.weight", "decoders.0.block1.norm.bias", "decoders.0.block2.norm.weight", "decoders.0.block2.norm.bias", "decoders.1.block1.norm.weight", "decoders.1.block1.norm.bias", "decoders.1.block2.norm.weight", "decoders.1.block2.norm.bias", "decoders.2.block1.norm.weight", "decoders.2.block1.norm.bias", "decoders.2.block2.norm.weight", "decoders.2.block2.norm.bias", "decoders.3.block1.norm.weight", "decoders.3.block1.norm.bias", "decoders.3.block2.norm.weight", "decoders.3.block2.norm.bias", "decoders.4.block1.norm.weight", "decoders.4.block1.norm.bias", "decoders.4.block2.norm.weight", "decoders.4.block2.norm.bias", "decoders.5.block1.norm.weight", "decoders.5.block1.norm.bias", "decoders.5.block2.norm.weight", "decoders.5.block2.norm.bias", "decoders.6.block1.norm.weight", "decoders.6.block1.norm.bias", "decoders.6.block2.norm.weight", "decoders.6.block2.norm.bias", "decoders.7.block1.norm.weight", "decoders.7.block1.norm.bias", "decoders.7.block2.norm.weight", "decoders.7.block2.norm.bias", "decoders.8.block1.norm.weight", "decoders.8.block1.norm.bias", "decoders.8.block2.norm.weight", "decoders.8.block2.norm.bias", "decoders.9.block1.norm.weight", "decoders.9.block1.norm.bias", "decoders.9.block2.norm.weight", "decoders.9.block2.norm.bias", "decoders.10.block1.norm.weight", "decoders.10.block1.norm.bias", "decoders.10.block2.norm.weight", "decoders.10.block2.norm.bias", "decoders.11.block1.norm.weight", "decoders.11.bloc...

MarcusLoppe commented 5 months ago

Ah yes, that is an issue related to PyTorch. I encounter the same issue when I train on Kaggle, which uses CUDA 12.1, and then load the checkpoint on my computer, which uses CUDA 11.6. The saving and loading use PyTorch's own libraries, so it's not something I believe you can fix within this project's code.

But check what PyTorch & CUDA version the checkpoint was created with and then update/downgrade, and it will resolve itself. (Run "nvcc --version" to get the CUDA version.)
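
To check which versions the current runtime actually has (standard PyTorch attributes):

import torch

print(torch.__version__)          # PyTorch version, e.g. 2.1.0+cu121
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # whether a GPU is visible at all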

adeerkhan commented 5 months ago

@MarcusLoppe Downgrading CUDA is actually the hardest thing to do, but let me try. Thanks! Can you share a link to the Objaverse data preparation code, or any related Kaggle notebooks etc.? I have the resources to train on GPUs. I can also share insights from my Objaverse training data if it goes well, plus the model results, etc. Many thanks for the continued support.

MarcusLoppe commented 5 months ago

@MarcusLoppe Downgrading CUDA is actually the hardest thing to do, but let me try. Thanks! Can you share a link to the Objaverse data preparation code, or any related Kaggle notebooks etc.? I have the resources to train on GPUs. I can also share insights from my Objaverse training data if it goes well, plus the model results, etc. Many thanks for the continued support.

Oh great, that sounds awesome :) Since I'm limited to 15 GB of VRAM I can only use about 14k models with x15 augmentations; you can probably get better results than me if you use x50 augmentations per model.

To download the Objaverse dataset you can use my Objaverse-downloader: instead of downloading the entire 8.9 TB dataset, which includes mesh models as large as 220 MB, you can filter by file size. Using 40 kB (around 400 faces) as the maximum you get about 37k models, if I recall correctly, which results in 13.5k models with fewer than 250 faces.

I implemented the load_objverse function in my MeshGPT_demo notebook; for the labels it loads the metadata that was exported by Objaverse-downloader. The labels in Objaverse are pretty bad, but at least they might give the transformer some kind of hint or guide.

import json

def load_objverse(directory, variations):
    obj_datas = []
    id_info = {}
    # Per-model labels/metadata exported by Objaverse-downloader.
    with open('./objaverse/metadata.json', 'r') as f:
        id_info = json.load(f)

A little tip for the autoencoder: the attention layers actually make training faster when dealing with lots of data, so try setting them to about 4 and 8 as below. I reached about 0.6 loss after 5 epochs (at about 2 hours per epoch) on a 14k-model dataset; without the attention layers it took about 20+ hours.

from meshgpt_pytorch import MeshAutoencoder

num_layers = 23
autoencoder = MeshAutoencoder(
    # Decoder width schedule through its depth.
    decoder_dims_through_depth = (128,) * 3 + (192,) * 4 + (256,) * num_layers + (384,) * 3,
    dim_codebook = 192,
    codebook_size = 16384,
    dim_area_embed = 16,
    dim_coor_embed = 16,
    dim_normal_embed = 16,
    dim_angle_embed = 8,

    # Attention layers in the encoder/decoder that speed up training on larger datasets.
    attn_decoder_depth = 4,
    attn_encoder_depth = 8
).to("cuda")

adeerkhan commented 5 months ago

I noticed that you updated the repo with proper dataset links and elaborated on the Readme file. I am going to try this weekend and start a training run. I'll let you know what results I get. Many thanks!