kleinlee / DH_live

A digital human everyone can use

dataset to train the model #1

Open monkeyCv opened 3 months ago

monkeyCv commented 3 months ago

Dear author, thanks for your great work. Could you share some information about the training dataset, such as its size and the number of people in it? How did you collect the Chinese dataset? Thanks very much.

Ikaros-521 commented 3 months ago

Also requesting documentation on model training.

kleinlee commented 3 months ago

Simply put, the neural rendering algorithm is forcibly split into two parts: a simplified audio_decoder and a simplified DiNet model. The audio_decoder is distilled from a more powerful model and outputs detailed mouth-shape changes. 3D rotation is added and a single-layer face UV map is constructed manually. DiNet acts as the decoder that generates the final face.
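The two-part split described above can be sketched as a dataflow. Everything here is an illustrative assumption standing in for the real networks, not the repository's API: the function names, the 32-dim mouth code, and the frame sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_decoder(audio_feats):
    """Stand-in for the distilled audio decoder: maps per-frame audio
    features to a compact vector describing mouth-shape changes."""
    W = rng.standard_normal((audio_feats.shape[-1], 32))  # 32-dim mouth code (assumed)
    return audio_feats @ W

def build_face_uv(frame, rotation):
    """Stand-in for the 3D-rotation + single-layer face UV construction.
    Real code would warp the face into UV space; here it passes through."""
    return frame

def dinet_decode(uv_map, mouth_code):
    """Stand-in for DiNet: renders the final face from the UV-space face
    conditioned on the mouth code. Here it only checks shapes and copies."""
    assert mouth_code.shape[-1] == 32
    return uv_map.copy()

# One frame through the two-part pipeline:
audio_feats = rng.standard_normal(80)       # e.g. one mel-spectrogram frame (assumed dim)
frame = rng.standard_normal((128, 128, 3))  # source face crop (assumed size)
mouth_code = audio_decoder(audio_feats)
uv = build_face_uv(frame, np.eye(3))
face = dinet_decode(uv, mouth_code)
print(face.shape)  # (128, 128, 3)
```

The point of the split is that only the small audio_decoder needs paired audio data; DiNet only ever sees a UV-space face plus a compact mouth code.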

For DiNet, you can follow the training and dataset-construction steps in the code to run your own training. For the audio encoder, you need to find a satisfactory digital-human model or commercial product that can produce enough video to serve as distillation material. I have verified that, given stable generated data, a lightweight LSTM is enough to achieve good performance. Note: do not use only mouth keypoints as the LSTM output; a better feature representation is needed. For example, I used a PCA algorithm to extract pixel-level changes of the mouth.
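The "PCA over pixel-level mouth changes" idea can be sketched as follows. This is a hypothetical illustration, not the author's actual code: mouth crops are flattened, PCA (via SVD) yields a compact per-frame coefficient vector, and a small LSTM regresses audio features to those coefficients. All dimensions (crop size, 16 components, 80-dim audio features, 64 hidden units) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- 1. PCA target: compress pixel-level mouth crops ------------------
# Hypothetical data: 200 frames of 24x32 grayscale mouth crops.
frames = rng.standard_normal((200, 24 * 32))
mean = frames.mean(axis=0)
# SVD of the centered data gives the PCA basis.
_, _, Vt = np.linalg.svd(frames - mean, full_matrices=False)
k = 16                                  # keep 16 components (assumed)
basis = Vt[:k]                          # (k, 768)
coeffs = (frames - mean) @ basis.T      # per-frame PCA coefficients: the LSTM targets

# Reconstruction from coefficients (what the renderer would consume):
recon = coeffs @ basis + mean
print(recon.shape)                      # (200, 768)

# --- 2. A lightweight LSTM step: audio features -> PCA coefficients ---
def lstm_step(x, h, c, W, U, b):
    """One forward step of a standard LSTM cell (gate order: i, f, g, o)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

d_in, d_h = 80, 64                            # assumed audio-feature / hidden sizes
W = rng.standard_normal((d_in, 4 * d_h)) * 0.1
U = rng.standard_normal((d_h, 4 * d_h)) * 0.1
b = np.zeros(4 * d_h)
W_out = rng.standard_normal((d_h, k)) * 0.1   # projects hidden state to k coefficients

h = c = np.zeros(d_h)
for t in range(25):                           # 25 audio frames
    h, c = lstm_step(rng.standard_normal(d_in), h, c, W, U, b)
pred_coeffs = h @ W_out                       # predicted PCA coefficients for frame t
print(pred_coeffs.shape)                      # (16,)
```

Predicting dense PCA coefficients rather than a handful of keypoints is the design choice being advocated: the target carries pixel-level mouth appearance, so the downstream decoder has far more to work with.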

zhangziliang04 commented 2 months ago


Could you provide documentation for model training? Many thanks.

qiuzi commented 1 month ago

Training on 4 videos, loss_GI won't go down; it keeps hovering around 0.3–0.23.

xiaomaofei commented 1 month ago

Nothing is documented about the training data. Also requesting this.

kleinlee commented 1 month ago

The training section has been updated; see https://github.com/kleinlee/DH_live/tree/master/train