
ScanTalk (ECCV 2024)

ScanTalk: 3D Talking Heads from Unregistered Scans

Project page: https://fedenoce.github.io/scantalk/

This is the official repository of the ECCV 2024 paper "ScanTalk: 3D Talking Heads from Unregistered Scans" by Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi.

🔥🔥 [2024/09/10] Our code is now publicly available! Feel free to explore, use, and contribute! 🔥🔥

Overview

Speech-driven 3D talking head generation has emerged as a significant area of interest among researchers, presenting numerous challenges. Existing methods are constrained to animating faces with a fixed topology, wherein point-wise correspondence is established and the number and order of points remain consistent across all identities the model can animate. In this work, we present ScanTalk, a novel framework capable of animating 3D faces with arbitrary topologies, including scanned data. Our approach relies on the DiffusionNet architecture to overcome the fixed-topology constraint, offering promising avenues for more flexible and realistic 3D animations. By leveraging the power of DiffusionNet, ScanTalk not only adapts to diverse facial structures but also maintains fidelity when dealing with scanned data, thereby enhancing the authenticity and versatility of the generated 3D talking heads. Through comprehensive comparisons with state-of-the-art methods, we validate the efficacy of our approach, demonstrating its capacity to generate realistic talking heads comparable to existing techniques. Our primary objective is to develop a generic method free from topological constraints, whereas all state-of-the-art methodologies are bound by such limitations.

![ScanTalk teaser](assets/teaser.png)

We present ScanTalk, a deep learning architecture that animates any 3D face mesh driven by speech. ScanTalk is robust enough to be trained on multiple unrelated datasets with a single model, while allowing inference on unregistered face meshes.

![ScanTalk architecture](assets/teaser.png)

ScanTalk is a novel Encoder-Decoder framework designed to dynamically animate any 3D face based on a spoken sentence from an audio file. The Encoder integrates the 3D neutral face $m_i^n$, per-vertex surface features $P_i^n$ (crucial for DiffusionNet and precomputed by the operators $OP$), and the audio file $A_i$, yielding a fusion of per-vertex and audio features. These combined descriptors, alongside $P_i^n$, are then passed to the Decoder, which mirrors a reversed DiffusionNet encoder structure. The Decoder predicts the deformation of the 3D neutral face, which is then combined with the original 3D neutral face $m_i^n$ to generate the animated sequence.
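To make this data flow concrete, here is a minimal, unofficial PyTorch-style sketch of the Encoder-Decoder pass. All class and variable names (`ScanTalkSketch`, `PerVertexMLP`, the feature dimensions) are hypothetical placeholders, and the DiffusionNet blocks are stubbed with plain per-vertex MLPs so the sketch runs on its own; the actual model in this repository differs in its details.

```python
# Minimal sketch of the ScanTalk-style data flow (hypothetical names, not the official code).
# DiffusionNet blocks are stubbed with simple per-vertex MLPs so the sketch runs with plain
# PyTorch; the real model uses DiffusionNet and precomputed surface operators OP.
import torch
import torch.nn as nn


class PerVertexMLP(nn.Module):
    """Stand-in for a DiffusionNet block: maps per-vertex features to per-vertex features."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))

    def forward(self, x):                 # x: (V, d_in)
        return self.net(x)


class ScanTalkSketch(nn.Module):
    def __init__(self, d_vertex=16, d_audio=768, d_latent=64):
        super().__init__()
        self.vertex_encoder = PerVertexMLP(3 + d_vertex, d_latent)   # neutral face + surface features
        self.decoder = PerVertexMLP(d_latent + d_audio, 3)           # predicts per-vertex displacement

    def forward(self, neutral_verts, surface_feats, audio_feats):
        # neutral_verts: (V, 3)  neutral face m_i^n
        # surface_feats: (V, d_vertex)  per-vertex features P_i^n precomputed from the mesh
        # audio_feats:   (T, d_audio)  one audio feature vector per animation frame
        z = self.vertex_encoder(torch.cat([neutral_verts, surface_feats], dim=-1))  # (V, d_latent)
        frames = []
        for a_t in audio_feats:                                   # fuse audio with every vertex
            fused = torch.cat([z, a_t.expand(z.shape[0], -1)], dim=-1)
            displacement = self.decoder(fused)                    # (V, 3)
            frames.append(neutral_verts + displacement)           # deform the neutral face
        return torch.stack(frames)                                # (T, V, 3) animated sequence


# Toy usage with random data: a 5023-vertex mesh and 30 audio frames.
model = ScanTalkSketch()
out = model(torch.randn(5023, 3), torch.randn(5023, 16), torch.randn(30, 768))
print(out.shape)  # torch.Size([30, 5023, 3])
```

Because every operation is per-vertex, nothing in this sketch depends on a fixed number or ordering of vertices, which is the property that lets ScanTalk animate meshes with arbitrary topology.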

Citation

@inproceedings{nocentini2024scantalk3dtalkingheads,
  title     = {ScanTalk: 3D Talking Heads from Unregistered Scans},
  author    = {Nocentini, F. and Besnier, T. and Ferrari, C. and Arguillere, S. and Berretti, S. and Daoudi, M.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
}

ScanTalk Installation Guide

This guide provides step-by-step instructions on how to set up the ScanTalk environment and install all necessary dependencies. The codebase has been tested on **Ubuntu 20.04.2 LTS** with **Python 3.8**.

## 1. Setting Up the Conda Environment

It is recommended to use a Conda environment for this setup.

1. **Create a Conda environment**

   ```bash
   conda create -n scantalk python=3.8.18
   ```

2. **Activate the environment**

   ```bash
   conda activate scantalk
   ```

## 2. Installing the Mesh Processing Libraries

1. **Clone the MPI-IS repository**

   ```bash
   git clone https://github.com/MPI-IS/mesh.git
   cd mesh
   ```

2. **Modify line 7 of the Makefile to avoid a build error**

   ```
   @pip install --no-deps --config-settings="--boost-location=$$BOOST_INCLUDE_DIRS" --verbose --no-cache-dir .
   ```

3. **Run the Makefile**

   ```bash
   make all
   ```

## 3. Installing PyTorch and Requirements

Ensure you have the correct versions of PyTorch and torchvision. If you need a different CUDA version, please refer to the [official PyTorch website](https://pytorch.org/).

1. **Install PyTorch, torchvision, and torchaudio**

   ```bash
   conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
   ```

2. **Install the remaining requirements**

   ```bash
   pip install -r requirements.txt
   ```
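After these steps, a quick sanity check (not part of the official repository) can confirm that PyTorch sees the GPU and that the MPI-IS mesh package imports correctly; `psbody.mesh` is the import path that the MPI-IS package normally exposes.

```python
# quick_check.py -- optional post-installation sanity check (not part of the repository).
import torch

print("PyTorch:", torch.__version__)            # expected: 2.1.0
print("CUDA available:", torch.cuda.is_available())

try:
    from psbody.mesh import Mesh                # installed by the MPI-IS mesh Makefile
    print("MPI-IS mesh package imported successfully.")
except ImportError as exc:
    print("MPI-IS mesh package not found:", exc)
```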

Dataset Installation Guide

For training and testing ScanTalk, we used three open-source 3D talking head datasets: [**vocaset**](https://voca.is.tue.mpg.de/), [**BIWI**](https://paperswithcode.com/dataset/biwi-3d-audiovisual-corpus-of-affective), and [**Multiface**](https://github.com/facebookresearch/multiface). The processed and aligned datasets, all standardized to the vocaset format and used for both training and testing ScanTalk, can be downloaded [**here**](https://drive.google.com/drive/folders/1KetNagXa9jcgYwnDUAJxDx5UJMx9yLL2?usp=sharing). After downloading, place the `Dataset` folder in the main directory.
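As a rough illustration of what vocaset-style data looks like, the sketch below inspects one animation sequence as a per-frame vertex array together with its driving audio. The file names and folder layout are assumptions made for this example only; check the `Dataset` folder you downloaded rather than relying on these paths.

```python
# Hypothetical inspection of one vocaset-style sequence; the paths below are placeholders,
# not the verified on-disk structure of the downloaded Dataset folder.
import numpy as np
from scipy.io import wavfile

verts = np.load("Dataset/vocaset/vertices_npy/subject_sentence01.npy")   # per-frame vertices
sr, audio = wavfile.read("Dataset/vocaset/wav/subject_sentence01.wav")   # driving audio

print("vertex array:", verts.shape)   # e.g. (frames, 5023, 3), or flattened per frame,
                                      # depending on how the vertices were exported
print("audio:", audio.shape, "samples at", sr, "Hz")
```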

Pretrained Models Installation

We release two versions of ScanTalk: `scantalk_mse.pth.tar`, trained with a Mean Squared Error (MSE) loss, and `scantalk_mse_masked_velocity.pth.tar`, trained with a combination of multiple loss functions. Both models are available for download [**here**](https://drive.google.com/drive/folders/1iH4ugUI_JoGiejZj3ENltxSIpUnFY4zl?usp=sharing). After downloading, place the `results` folder within the `src` directory.
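Before running the test or demo scripts, you can peek inside a downloaded checkpoint with a plain `torch.load`. This is a generic PyTorch snippet rather than official usage from the repository, and the path assumes the `results` folder was placed in `src` as described above.

```python
# Generic snippet to inspect a downloaded checkpoint; the keys it prints depend on how
# the checkpoint was saved and are not documented here.
import torch

ckpt = torch.load("src/results/scantalk_mse.pth.tar", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # typically model weights and, possibly, optimizer/epoch state
else:
    print(type(ckpt))
```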

ScanTalk Training, Testing and Demo

The files `scantalk_train.py` and `scantalk_test.py` are used for training and testing, respectively. `scantalk_test.py` generates a directory containing all the ScanTalk predictions for each test set in the datasets. After obtaining the predictions, `compute_metrics.py` is used to calculate evaluation metrics by comparing the ground truth with the model's predictions. You can use `demo.py` to run a demo of ScanTalk, animating any 3D face that has been aligned with the training set.
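As a rough picture of what comparing predictions with ground truth involves, the sketch below computes a simple mean per-vertex Euclidean error between two vertex sequences. It is an assumption-based illustration, not the exact metric definitions implemented in `compute_metrics.py`.

```python
# Generic per-vertex error between a predicted and a ground-truth animation sequence,
# both arrays of shape (frames, vertices, 3); not the repository's exact metric code.
import numpy as np

def mean_vertex_error(pred, gt):
    """Average Euclidean distance between corresponding vertices over all frames."""
    assert pred.shape == gt.shape
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.random.rand(30, 5023, 3)   # stand-in for a ScanTalk prediction
gt = np.random.rand(30, 5023, 3)     # stand-in for the ground-truth sequence
print("mean vertex error:", mean_vertex_error(pred, gt))
```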

Authors

* Equal contribution.

Acknowledgements

This work is supported by the ANR project Human4D (ANR-19-CE23-0020) and by the IRP CNRS project GeoGen3DHuman. It was also partially supported by "Partenariato FAIR (Future Artificial Intelligence Research) - PE00000013, CUP J33C22002830006", funded by NextGenerationEU through the Italian MUR within the NRRP, project DL-MIG. Additionally, this work was partially funded by the ministerial decree n.352 of the 9th April 2022, NextGenerationEU through the Italian MUR within NRRP, and partially supported by Fédération de Recherche Mathématique des Hauts-de-France (FMHF, FR2037 du CNRS).

LICENSE

Creative Commons License
All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.