Some questions about a model trained on the 768×768 TED dataset

Zenobia7 commented 2 years ago

First of all, thank you very much for providing the code. I have run into a few problems while retraining and would like to ask how to deal with them:

1. Why do reconstruction mode and train mode give almost the same L1 loss value?
2. On the 768×768 TED dataset, parts with fine detail, such as hands and faces, are not recovered well. Is this normal? If so, could you suggest some solutions?
3. When the motion is large, the optical flow map is not very accurate.
4. Are there any precautions to take when preparing a new dataset?

These are all my questions for now. Looking forward to your reply.

AliaksandrSiarohin commented 2 years ago

Hi, sorry, but your questions are really confusing:

1. I don't get the question. There is no L1 in train mode.
2. What is 768?
3. Could you provide an example?
4. It depends on what objects will be in the new dataset.

Zenobia7 commented 2 years ago

1. I used the reconstruction results of train mode to calculate the L1 loss, and the L1 loss on the reconstruction results of avd mode is almost the same, so I think avd mode is not effective.
2. I cropped the TED dataset to 768×768.
4. The new dataset consists of half-body (upper-body) speaker videos and is highly heterogeneous and diverse. Some of its videos:
https://user-images.githubusercontent.com/28126038/182800076-b9e4dea5-d927-41cd-ab7d-038e2cfccbf3.mp4
https://user-images.githubusercontent.com/28126038/182800140-632904d1-27e7-4a4a-9ec2-142fc59e01b5.mp4
https://user-images.githubusercontent.com/28126038/182800340-c7f54217-72a0-4a01-99d4-6cd7c4ec64e9.mp4

3. Train mode visualization results:

https://user-images.githubusercontent.com/28126038/182807847-c6de664c-3903-49af-af32-fadd4b218d2a.mp4

avd mode visualization results:

https://user-images.githubusercontent.com/28126038/182807198-893d9624-1a8a-4cc1-9dcd-015a4953ce71.mp4

Training log visualization: [image: train_log]

Would it be possible for you to share your training log? I would like to compare it with mine. Thank you. Please let me know if anything is still unclear.
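A minimal sketch of the per-pixel L1 comparison described in point 1, assuming both the generated and the driving videos are NumPy arrays of shape (frames, height, width, channels) with values scaled to [0, 1]. The function name `mean_l1` is hypothetical and not part of the articulated-animation repository, whose own evaluation code may compute this differently.

```python
import numpy as np


def mean_l1(generated: np.ndarray, driving: np.ndarray) -> float:
    """Mean absolute per-pixel difference between two videos.

    Hypothetical helper: both arrays are assumed to have shape
    (frames, height, width, channels) with values in [0, 1].
    """
    if generated.shape != driving.shape:
        raise ValueError(f"shape mismatch: {generated.shape} vs {driving.shape}")
    # Cast to float64 so uint8 inputs do not wrap around when subtracted.
    diff = generated.astype(np.float64) - driving.astype(np.float64)
    return float(np.mean(np.abs(diff)))


# Illustrative example: two tiny 2-frame videos.
driving = np.zeros((2, 4, 4, 3))
generated = np.full((2, 4, 4, 3), 0.25)
print(mean_l1(generated, driving))  # 0.25
```

This kind of metric needs a ground-truth frame to compare against, which reconstruction (source and driving frames from the same video) provides; cross-identity animation does not.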

AliaksandrSiarohin commented 2 years ago

1. Reconstruction does not make sense for avd, since it is specifically designed for cross-identity animation, where the shapes of the objects could be different.
2. There is no explicit handling of parts that are not visible most of the time; I guess you will have to devise some way of handling that.
3. I can't see what bothers you in the optical flow map.
4. Unfortunately, I don't have the logs anymore.

Zenobia7 commented 2 years ago

Thank you for your prompt reply.

Since there is no problem with the optical flow map, what causes the blurry details in the reconstruction? Is it because the generator is not strong enough, or because the information in the optical flow map is not fully utilized?
Do you think it is OK to use half-body speaker videos with complex backgrounds and inconsistent heights in my self-built dataset? At present, the loss decreases rapidly at first and then stops decreasing.
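One rough way to check whether the loss has really stopped decreasing, rather than just slowing down, is to compare the average of the most recent values with the average of the window before it. This is a hypothetical helper, assuming the per-epoch loss values have been collected into a Python list (for example, read from the training log); it is not a feature of the repository, and the `window` and `tol` defaults are arbitrary illustrative values.

```python
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 5, tol: float = 1e-3) -> bool:
    """Return True if the mean loss over the last `window` epochs is lower than
    the mean over the preceding `window` epochs by less than `tol` (relative)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < tol * abs(previous)


steep = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
flat = [1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
print(has_plateaued(steep))  # False
print(has_plateaued(flat))   # True
```

Using a relative tolerance keeps the check independent of the absolute scale of the loss.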

[image: 微信图片_20220805103913 (WeChat image)]

https://user-images.githubusercontent.com/28126038/182990778-1c3cba4a-ae5c-4806-b23c-23f9ef3539f0.mp4

https://user-images.githubusercontent.com/28126038/182990785-43862275-00db-4a46-a569-6dc1489180b4.mp4

https://user-images.githubusercontent.com/28126038/182991181-e7393b8c-ae69-4fbb-8efe-21c23602d193.mp4

https://user-images.githubusercontent.com/28126038/182991182-a4ec1156-b46d-4156-91b0-ad0898b9ab0a.mp4

https://user-images.githubusercontent.com/28126038/182991183-93c154d0-0d6c-4d57-ac81-74731c24226e.mp4

laodar commented 1 month ago

@Zenobia7 Hi, do you have a paper or benchmark for your new dataset? Is it publicly available now? How did you obtain it? Thanks a lot.

snap-research / articulated-animation, issue #54