lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
MIT License
732 stars 60 forks

Mesh conditioning instead of text conditioning #77

Open pathquester opened 6 months ago

pathquester commented 6 months ago

I was wondering if this was discussed before. The idea is to condition on existing meshes rather than text. This would be particularly useful in training it to retopologize existing meshes.

MarcusLoppe commented 6 months ago

Do you mean taking a mesh and encoding it into a vector embedding, which you could then use to generate refined or different versions of it?

It's possible. I don't think the author will implement it since he has moved on, but it can be done with the current lib: the text conditioner is just a class wrapping a text embedding model.

The transformer never sees the actual "text" and only uses the embedding vector, so in theory it's an easy replacement. You can fork the lib and create your own embedding model (a dirty solution is to just leave the text-related stuff empty), then preprocess the meshes and set the "text_embedding" vector using a mesh encoder. The transformer won't know the difference. https://github.com/lucidrains/classifier-free-guidance-pytorch/blob/main/classifier_free_guidance_pytorch/bge.py
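A minimal sketch of that idea, assuming you have precomputed one embedding per mesh with some mesh encoder. The class name, caching scheme, and `embed_texts` interface below are hypothetical stand-ins for a text conditioner, not part of the lib:

```python
import torch
from torch import nn

# Hypothetical drop-in "text" conditioner that actually returns precomputed
# mesh embeddings, so the transformer only ever sees embedding vectors.
class MeshEmbeddingConditioner(nn.Module):
    def __init__(self, dim_latent: int = 512):
        super().__init__()
        self.dim_latent = dim_latent
        self._cache = {}  # mesh id -> precomputed embedding

    def register(self, mesh_id: str, embedding: torch.Tensor):
        # store an embedding produced offline by your mesh encoder
        self._cache[mesh_id] = embedding

    def embed_texts(self, mesh_ids):
        # mimic a text embedder's interface: ids in, (batch, dim) vectors out
        return torch.stack([self._cache[i] for i in mesh_ids])

cond = MeshEmbeddingConditioner(dim_latent=8)
cond.register("chair_01", torch.randn(8))
out = cond.embed_texts(["chair_01"])  # shape (1, 8)
```

Since the transformer conditions only on the returned vectors, it genuinely cannot tell whether they came from a text model or a mesh encoder.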

However, training a model to produce a good embedding of a mesh is another matter, and I'm not 100% sure how to even approach it.

pathquester commented 6 months ago

Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

MarcusLoppe commented 6 months ago

> Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

Kinda. It encodes the mesh into a list of tokens/codes, which you could then turn into some kind of vector. That's a lot of information to capture, though, and probably hard for the model to generalize without lots of training.
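One simple (and lossy) way to turn a variable-length code sequence into a single vector is to look the codes up in the codebook and mean-pool them. A sketch under assumed sizes; the codebook dimensions and code values here are illustrative, not the lib's actual config:

```python
import torch

# Assumed sizes for illustration: 16384 codes, each 192-dimensional.
codebook = torch.randn(16384, 192)        # (num_codes, code_dim)
codes = torch.tensor([3, 17, 99, 4242])   # codes for one mesh, from the autoencoder

# Mean-pool the looked-up code embeddings into one fixed-size mesh vector.
mesh_embedding = codebook[codes].mean(dim=0)  # shape (192,)
```

Mean-pooling discards ordering, which is part of why a lot of training (or a learned pooling head) would likely be needed for a useful embedding.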

pathquester commented 6 months ago

Is the face_embed_output that it produces not suitable for this?

MarcusLoppe commented 5 months ago

> Is the face_embed_output that it produces not suitable for this?

The encoder outputs F×192, meaning it creates an embedding for each triangle rather than for the entire mesh. So no. :(
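To get a whole-mesh vector out of per-face features you would have to pool them yourself, masking out padded faces when meshes in a batch have different face counts. A sketch with illustrative shapes (the masked-mean pooling is my suggestion, not something the lib provides):

```python
import torch

# (batch, max_faces, dim): per-triangle embeddings, padded to max_faces.
embeds = torch.randn(2, 5, 192)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]]).bool()   # True = real (non-pad) face

# Masked mean over the face dimension -> one 192-d vector per mesh.
summed = (embeds * mask.unsqueeze(-1)).sum(dim=1)
mesh_vecs = summed / mask.sum(dim=1, keepdim=True)  # shape (2, 192)
```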

lucidrains commented 5 months ago

what I would recommend is just to encode the prompt and response meshes, and use a separator token in between

will require work to handle the special separator token
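A pure-Python sketch of that sequence layout: concatenate the prompt mesh's codes and the target mesh's codes around a dedicated separator id. The vocab layout and token ids below are made up for illustration; the real work is teaching the transformer and loss/sampling code about the extra ids:

```python
# Assumed vocab layout: codebook ids 0..16383, then special tokens after.
CODEBOOK_SIZE = 16384
SEP = CODEBOOK_SIZE       # reserve one id past the codebook for <sep>
EOS = CODEBOOK_SIZE + 1   # and one for end-of-sequence

def build_sequence(prompt_codes, response_codes):
    # prompt mesh codes, separator, then the retopologized target's codes
    return prompt_codes + [SEP] + response_codes + [EOS]

seq = build_sequence([5, 9, 2], [7, 7, 1])
# seq == [5, 9, 2, 16384, 7, 7, 1, 16385]
```

At inference you would feed the prompt codes plus `SEP` and let the model generate everything after the separator.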