lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA mesh generation using attention, in PyTorch

Is there a pretrained model and if not, how to train the model #56

Closed jonasweimar closed 6 months ago

jonasweimar commented 7 months ago

Hello @lucidrains, @MarcusLoppe,

We are trying to use it for a quick university project and we are not sure how to train the model using ShapeNet or similar datasets. Could you help us with this? Our goal is to be able to provide text to a Flask server, which uses your transformer to generate a model for us, which is then returned.

Is there any pretrained version that can be used with this? How exactly does one use MeshTransformerTrainer and MeshAutoencoderTrainer in combination with ShapeNet? etc. Since it seems you have already trained the model on a broad range of categories from the ShapeNet dataset, could you provide us with the state file so that we can load a trained model into our version?

Regards

MarcusLoppe commented 7 months ago

@jonasweimar

Hi, about how to train, see my notebook, and there are more details in the Training guide.
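
For orientation, here is a minimal sketch of the two-stage pipeline that MeshAutoencoderTrainer and MeshTransformerTrainer wrap, following the repo's README; the mock tensors, text labels, and hyperparameters are placeholders, not tested settings:

```python
import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

# stage 1: autoencoder learns to compress mesh faces into discrete codes
autoencoder = MeshAutoencoder(num_discrete_coors = 128)

# mock batch standing in for ShapeNet data
vertices = torch.randn((2, 121, 3))           # (batch, num vertices, xyz)
faces = torch.randint(0, 121, (2, 64, 3))     # (batch, num faces, vertex indices)

loss = autoencoder(vertices = vertices, faces = faces)
loss.backward()  # in practice, an optimizer loop over your dataset

# stage 2: transformer learns to generate those codes, conditioned on text
transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768,
    condition_on_text = True
)

loss = transformer(
    vertices = vertices,
    faces = faces,
    texts = ['a chair', 'a table']
)
loss.backward()

# after training, sample a mesh from a text prompt
faces_coordinates, face_mask = transformer.generate(texts = ['a chair'], cond_scale = 3.)
```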

Regarding a pretrained model, I might be able to provide you with something that can generate a few different models across 3-6 categories, but they'll be somewhat limited due to the training dataset, since I have not trained it on good GPUs and have only had access to the free Kaggle GPU. If you'd like somewhat more 'complex' mesh variations, I can whip something up, but that would only cover 1 or at most 2 categories.

I've successfully trained it to generate complex meshes, but only by overtraining on a single mesh; generating more complex and broader meshes requires training for a longer duration on better hardware. I don't have the resources for this, but you're welcome to try using the ShapeNet dataset: CSV labels and the 3D mesh models, ShapeNet v2 GLB.
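
If it helps getting started, here is a rough sketch of how one could load those GLB files into vertex/face tensors using the third-party trimesh library (an assumption on my part, not something this repo requires; the filename is hypothetical):

```python
import torch
import trimesh  # pip install trimesh

def load_glb(path):
    # force = 'mesh' flattens a GLB scene into a single triangle mesh
    mesh = trimesh.load(path, force = 'mesh')

    vertices = torch.tensor(mesh.vertices, dtype = torch.float)  # (num vertices, 3)
    faces = torch.tensor(mesh.faces, dtype = torch.long)         # (num faces, 3)

    # center and scale into a unit cube, since MeshGPT discretizes coordinates
    center = (vertices.amax(dim = 0) + vertices.amin(dim = 0)) / 2
    vertices = (vertices - center) / (vertices - center).abs().max()

    return vertices, faces

# note: the paper also sorts vertices/faces bottom-to-top before tokenizing;
# see the training guide for the full preprocessing
vertices, faces = load_glb('some_shapenet_model.glb')  # hypothetical path
```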

jonas-weimar commented 7 months ago

@MarcusLoppe thank you for your quick response! We will try training it on a larger number of categories with the resources the university has provided us. Still, we would like to accept your offer to provide the pretrained model you mentioned :)

Best Regards!!

nicolasdonati commented 7 months ago

Hi all, I would also be quite interested in some pre-trained models, even on a toy dataset, just to get a better understanding of how things work. Thanks for all this @MarcusLoppe!

Best,

jonasweimar commented 7 months ago

Hi @MarcusLoppe, how could you provide us with the trained model?

MarcusLoppe commented 7 months ago

Hi guys,

So I've mostly been testing different ways of training with regard to the dataset, model parameters, and hyperparameters.
I've managed to train a few that generate decent meshes, but nothing great, so now I'm running a training run that will hopefully produce some good models. I only have access to the free GPU on Kaggle, so it's not going very fast.

lucidrains commented 7 months ago

@MarcusLoppe you should really sell your skillset (yes being able to cook up a model is a skill, not commoditized yet). at least get some non-technical founder to give you some compute

adeerAI commented 7 months ago

Hi @MarcusLoppe,

Thanks for all the help. I was curious how many epochs to run to get a decent mesh generation model. Do you have any suggestions regarding epochs, train steps, etc.? I would be really glad, because I have been testing a transformer with 7000 models, but the loss decreases very slowly per epoch.

jysung commented 7 months ago

@MarcusLoppe

I am also interested in pre-trained model once you have it. Is there anything in this repo that diverged from the original paper?

MarcusLoppe commented 7 months ago

> Hi @MarcusLoppe,
>
> Thanks for all the help. I was curious how many epochs to run to get a decent mesh generation model. Do you have any suggestions regarding epochs, train steps, etc.? I would be really glad, because I have been testing a transformer with 7000 models, but the loss decreases very slowly per epoch.

Oh, I have only trained with 600 models × 50 augments, so I'm unsure whether you would need to increase the codebook size. You can check this using the autoencoder: generate the codes for a mesh, then let it decode them and reconstruct the mesh. If the reconstruction is very ugly and the loss at the end of training is lower than 0.4, then you might need to increase the codebook (maybe to a 32k-64k vocab).
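
Something like the following, assuming an `autoencoder`, `vertices`, and `faces` as in the training sketch above; the method names match the repo at the time of writing, but double-check against your version:

```python
import torch

with torch.no_grad():
    # encode a training mesh into its discrete codes
    codes = autoencoder.tokenize(vertices = vertices, faces = faces)

    # decode the codes straight back into face coordinates
    recon_faces, face_mask = autoencoder.decode_from_codes_to_faces(codes)

# render / export recon_faces and compare it to the input mesh by eye;
# a mangled reconstruction despite a low training loss points at the
# codebook being too small
```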

But the training loss required for good text-to-mesh might be around 0.01-0.001 when dealing with smaller datasets; since you've got 7k models you won't need a loss that low, since it has hopefully learned to generalize.

Another issue with the transformer is that the first few tokens aren't guided very well by the text embedding; after the first few, it follows the text very well.

So if you provide it with maybe 10% of a model's total tokens and then let it generate, is the result dramatically better? (Check the end of my notebook for the code required for this; a rough sketch is below.)
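
In code, the idea is roughly this (a sketch, assuming `generate` accepts a token `prompt` as in recent versions of the repo; the flattening and the 10% cutoff are illustrative, my notebook has the tested version):

```python
import torch

with torch.no_grad():
    # tokenize a ground-truth mesh and keep the first ~10% of its tokens
    codes = transformer.autoencoder.tokenize(vertices = vertices, faces = faces)
    codes = codes.flatten(1)                        # one flat token sequence per mesh
    prompt = codes[:, :int(codes.shape[1] * 0.1)]   # the first 10%

    # let the transformer continue from that prompt
    faces_coordinates, face_mask = transformer.generate(
        prompt = prompt,
        texts = ['a chair'],
        cond_scale = 3.
    )
```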

Let me know if this is the case; I'm thinking of training a separate model on the first few tokens to resolve this.

MarcusLoppe commented 7 months ago

> @MarcusLoppe you should really sell your skillset (yes being able to cook up a model is a skill, not commoditized yet). at least get some non-technical founder to give you some compute

Maybe :) Honestly, I just like training models and experimenting with how to improve the training. Especially in 3D: there are so many possibilities if you can create a very good spatial model. The world is in 3D, so almost all ML solutions that need to interact with the real world would benefit from this.

Let me know if you stumble upon any compute and I'll be happy to take it off your hands 😊

adeerAI commented 7 months ago

@MarcusLoppe Thanks for the reply. Yes, I agree that using 10% of the tokens results in much better mesh generation; it follows the text embeddings much better. I actually have 450 models but augmented them to 7000. Right now, I am testing with 30000 models. Do you mean that after running inference with the transformer and generating models from labels, it eventually starts generating better results? Because at most I run inference for the first 4-5 models, check the results, and then start reconsidering the encoder and transformer settings.

MarcusLoppe commented 7 months ago

> @MarcusLoppe Right now, I am testing with 30000 models. Do you mean that after running inference with the transformer and generating models from labels, it eventually starts generating better results? Because at most I run inference for the first 4-5 models, check the results, and then start reconsidering the encoder and transformer settings.

I'm not sure but that is the hope 😄

> @MarcusLoppe Thanks for the reply. Yes, I agree that using 10% of the tokens results in much better mesh generation; it follows the text embeddings much better.

If scaling up the dataset it trains on does not resolve this issue, you might need to train another transformer on just the first 60 tokens per model (by cutting off the code sequence length). It's super fast to train on just 60 tokens, so it should go pretty quickly until you get a good loss.

It might be a janky solution, but a transformer that is specialized in generating the start of a model will have much better guidance from the text. Then you can use the tokens generated by the specialized model to prompt the other transformer.

Since the mesh is ordered from bottom to top, the specialized transformer will become an expert at generating the 'ground' or base of the mesh. The transformer trained on the entire length will then be able to generate better quality meshes, since it is given the base of the mesh.

This will hopefully also reduce training time, since you won't need as low a loss to generate consistent meshes.
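
As a sketch of what I mean (a hypothetical two-model setup: `base_transformer` trained only on the first 60 tokens of each mesh, `full_transformer` on the complete sequences; `return_codes` and `prompt` exist in recent versions of the repo, but verify against yours):

```python
import torch

with torch.no_grad():
    # stage 1: the specialized model generates a strongly text-guided base
    base_codes = base_transformer.generate(
        texts = ['a chair'],
        return_codes = True     # get raw token codes instead of decoded faces
    )

    # stage 2: the full-length model continues from those base tokens
    faces_coordinates, face_mask = full_transformer.generate(
        prompt = base_codes[:, :60],
        texts = ['a chair'],
        cond_scale = 3.
    )
```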