leohku / faceformer-emo

FaceFormer Emo: Speech-Driven 3D Facial Animation with Emotion Embedding
MIT License

Pretrained model? #3

Open · HarryXD2018 opened this issue 1 year ago

HarryXD2018 commented 1 year ago

Nice work! The demo video looks awesome. Would you please share the pretrained model? That would help me a lot!

leohku commented 1 year ago

Hi! Thanks for your interest - can you please reach out at <redacted>? Thanks!

ElliottDyson commented 9 months ago

> Hi! Thanks for your interest - can you please reach out at <redacted>? Thanks!

Hello, sorry, where may I reach out? I'd love access to your pre-trained model if possible.

It's for integration into an open-source project we're working on: fully locally run, machine-learning-driven Unreal Engine 5 MetaHumans.

Open-source LLMs and a vector database system we've been developing handle the decision-making and conversations, Tortoise handles the TTS, and hopefully your work will handle the facial animation of the MetaHumans.
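For concreteness, here is a rough sketch of how those stages might be chained together; every function below is a placeholder, not a real API from any of the projects mentioned:

```python
# Placeholder pipeline: each stage stands in for a real component
# (vector DB, local LLM, Tortoise TTS, audio2face model).

def retrieve_context(query: str) -> str:
    """Similarity search over a vector database of stored memories."""
    return ""

def generate_reply(query: str, context: str) -> str:
    """Local LLM call that decides on and phrases the response."""
    return "Hello there!"

def synthesize_speech(text: str) -> bytes:
    """Text-to-speech, e.g. via Tortoise."""
    return b""

def animate_face(audio: bytes) -> list:
    """Audio2face model producing per-frame facial animation data."""
    return []

def respond(user_utterance: str):
    context = retrieve_context(user_utterance)
    reply = generate_reply(user_utterance, context)
    audio = synthesize_speech(reply)
    frames = animate_face(audio)
    return reply, audio, frames  # frames would drive the MetaHuman rig
```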

HarryXD2018 commented 9 months ago

> Hello, sorry, where may I reach out? I'd love access to your pre-trained model if possible.
>
> It's for integration into an open-source project we're working on: fully locally run, machine-learning-driven Unreal Engine 5 MetaHumans.
>
> Open-source LLMs and a vector database system we've been developing handle the decision-making and conversations, Tortoise handles the TTS, and hopefully your work will handle the facial animation of the MetaHumans.

@ElliottDyson hello, I'm kind of interested in your project. I wonder how the pretrained model will benefit your project given its incompatible format; are you going to fine-tune it? Many thanks if you can share your solution.

ElliottDyson commented 9 months ago

> @ElliottDyson hello, I'm kind of interested in your project. I wonder how the pretrained model will benefit your project given its incompatible format; are you going to fine-tune it? Many thanks if you can share your solution.

Our idea is to figure out how best to map (or pick) the vertices from your model onto the "bones" provided by MetaHuman. Some simple scaling and other transformations might be required, but we figure it shouldn't need to be anything more than that. A fine-tune might indeed be required; if so, we'll figure that out when we get there.
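For illustration, a minimal sketch of that vertex-to-rig mapping in NumPy; the vertex indices, control names, and shapes below are made up, and the real mapping depends on the mesh topology the model was trained on and on the target MetaHuman rig:

```python
import numpy as np

def vertices_to_controls(frames: np.ndarray, neutral: np.ndarray,
                         picked: dict[str, int], scale: float = 1.0):
    """Map per-frame vertex displacements onto named rig controls.

    frames:  (T, V, 3) per-frame vertex positions from the audio2face model
    neutral: (V, 3) rest-pose mesh
    picked:  rig control name -> index of the mesh vertex that best tracks it
             (e.g. a lip-corner vertex for a mouth control)
    """
    offsets = frames - neutral[None]           # (T, V, 3) displacements
    return {name: offsets[:, idx] * scale      # (T, 3) motion per control
            for name, idx in picked.items()}

# Hypothetical usage (the VOCASET/FLAME topology has 5023 vertices; the
# indices below are placeholders, not real landmark indices):
# frames  = np.load("prediction.npy").reshape(-1, 5023, 3)
# neutral = np.load("neutral.npy").reshape(5023, 3)
# ctrls = vertices_to_controls(frames, neutral,
#                              picked={"jaw": 3051, "lip_corner_l": 1720})
```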

Edit: sorry, I thought you were the author of the repository; I'd forgotten GitHub gives that label to the author of the post, not of the repository. The above reply has been amended.

leohku commented 9 months ago

Hi there, I'm honored that you're interested in my work. However, since this work was released there have been many better audio2face models that give better lip accuracy with faster inference. Unless you are explicitly looking for good emotion variation at the expense of lip accuracy (which FaceFormer Emo is good at), I would suggest you look at other models such as DiffSpeaker or CodeTalker. Both models use the same data formats as mine, so they should be comparably difficult to adapt to your use case.
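For reference, a minimal sketch of that shared data convention, assuming VOCASET-style outputs (per-frame vertex positions saved as a flat .npy array; the path and vertex count here are assumptions based on the public datasets these models train on):

```python
import numpy as np

# Hypothetical output path; FaceFormer-style demos save predictions as
# a (num_frames, num_vertices * 3) array.
pred = np.load("demo/result/test.npy")

num_frames, dim = pred.shape
vertices = pred.reshape(num_frames, -1, 3)   # e.g. (T, 5023, 3) for VOCASET
print(f"{num_frames} frames, {vertices.shape[1]} vertices per frame")
```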

ElliottDyson commented 9 months ago

> Hi there, I'm honored that you're interested in my work. However, since this work was released there have been many better audio2face models that give better lip accuracy with faster inference. Unless you are explicitly looking for good emotion variation at the expense of lip accuracy (which FaceFormer Emo is good at), I would suggest you look at other models such as DiffSpeaker or CodeTalker. Both models use the same data formats as mine, so they should be comparably difficult to adapt to your use case.

Funnily enough, emotional expression is more important than lip accuracy for what we're working on. Thanks for the links to the other two projects though; I'll take a peek anyway.