Open makpia opened 4 years ago
@makpia Sorry, I have a small question: how did you get these output images? I searched my folder and didn't find any output images, but I really need the pictures for a presentation.
I generated the output images with torchvision.utils.save_image(). On my computer, the writer.add_image() call in the original code didn't produce any images either, neither in the folder nor in TensorBoard. Once you manage to generate output images, I would like to know whether you run into the same problem as I did.
There is no problem with the loss weight settings in the original paper. The reconstruction loss weight is 10x the pixel/feature consistency loss weight and 100x the L1 loss weight.
In the public code, the weight of each loss term is scaled up by 10x, which can speed up convergence.
Please check your data processing step carefully. The training images should be detected, aligned, and cropped. The training images you show here look quite strange and have not been processed correctly.
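To make the weight ratios described above concrete, here is a minimal sketch in plain Python. The term names and the absolute values are assumptions for illustration (only the 100 : 10 : 1 ratio and the uniform 10x scaling come from the reply), and the functions are hypothetical, not taken from the repository.

```python
# Relative weights as described in the reply: reconstruction is 10x the
# pixel/feature consistency weight and 100x the L1 weight.
# The absolute values (L1 weight = 1.0) are an assumption for illustration.
PAPER_WEIGHTS = {
    "reconstruction": 100.0,
    "consistency": 10.0,
    "l1": 1.0,
}


def scale_weights(weights, factor):
    """Return a copy of the weights with every term multiplied by `factor`."""
    return {name: weight * factor for name, weight in weights.items()}


def total_loss(losses, weights):
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * value for name, value in losses.items())


# Scaling every weight by 10x keeps the ratios between the terms unchanged.
PUBLIC_CODE_WEIGHTS = scale_weights(PAPER_WEIGHTS, 10.0)

# Made-up loss values, only to exercise the functions.
example_losses = {"reconstruction": 0.02, "consistency": 0.1, "l1": 0.5}
print(total_loss(example_losses, PAPER_WEIGHTS))
print(total_loss(example_losses, PUBLIC_CODE_WEIGHTS))
```

Because the scaling is uniform, the balance between the loss terms is the same in both settings; only the overall magnitude of the total loss changes.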
Thanks for your reply. I cropped the VoxCeleb2 images according to the txt files from the VoxCeleb website and the instructions in 'https://github.com/cyrta/voxceleb/blob/master/data/v1/voxceleb1_readme.txt'. I didn't use any other detector to locate the faces; I cropped the face regions directly according to the txt files. Here are some examples of my training images.
VoxCeleb2's cropped region is larger than VoxCeleb1's, but the code center-crops it, so I think the region is accurate. Can you describe in more detail what looks "strange" about the images? I think my data are cropped correctly. About the loss weights: I have now changed the learning rate from 0.001 to 0.0001 and deleted the two losses 'w7 out_img_pose_loss_sort + w8 out_img_exp_loss_sort', which are not mentioned in your paper. The early result (600 epochs) shows that the model is starting to learn AU and pose, although the output is still distorted, but I am not sure whether the final result will be as good as yours.
The training images should be aligned & cropped according to facial landmarks. The training images you show here are not correctly processed.
There is no need to adjust the loss weight currently.
Thanks for your reply. After checking the cropping results and comparing your cropping code with FAb-Net's, I found that FAb-Net crops the upper-middle region of the pre-cropped image, which yields a tightly cropped face image, while your code does not narrow the face region unless an external face detector is used to crop the image. I guess the problem will be solved this time. Thanks for your advice.
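The difference between the two crops can be sketched as follows. This is only an illustration: the function names, the `top_offset` parameter, and the sizes in the usage lines are hypothetical, and the exact region FAb-Net uses is not given in this thread. The boxes use the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
def center_crop_box(width, height, size):
    """Square box of side `size` centered in a width x height image."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)


def upper_middle_crop_box(width, height, size, top_offset=0):
    """Square box of side `size`, horizontally centered and anchored
    near the top of the image instead of at its vertical center.

    `top_offset` is a hypothetical knob; 0 puts the box at the top edge.
    """
    if size + top_offset > height or size > width:
        raise ValueError("crop box does not fit inside the image")
    left = (width - size) // 2
    return (left, top_offset, left + size, top_offset + size)


# Assumed sizes: a 224x224 pre-cropped frame and a 160x160 crop.
print(center_crop_box(224, 224, 160))        # (32, 32, 192, 192)
print(upper_middle_crop_box(224, 224, 160))  # (32, 0, 192, 160)
```

With Pillow, either box could be applied as `image.crop(box)`. When the face sits in the upper part of the pre-cropped frame, the upper-middle box keeps the face and drops more of the neck and shoulders than the center box does.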
Here is the result after making the cropped region tighter. The training parameters follow the paper exactly. The AU classifier built on this result and trained on BP4D reaches an average F1 score of 56.2%. However, although the average is close to the result in the paper, the F1 scores of the individual AUs are 0.582, 0.571, 0.566, 0.562, 0.560, 0.563, 0.575, 0.570, 0.560, 0.561, 0.563, 0.514, which is far from the per-AU scores in all the previous work. So, I want to ask several questions:
The model will be made public in the next few days.
I was training with a different dataset, and my example pair looks like this.
Is this preprocessing okay? I started training with image pairs like the one above, and when I checked the results after some epochs, the expressions still seemed biased towards the source image rather than the target image. I also see this behaviour in the image posted above by makpia. Am I thinking in the right direction, or am I missing something? Please clarify. Thanks!
@makpia Have you downloaded the VoxCeleb1/2 datasets? Could you share them? They are too big.
Well, the dataset consists of more than 2*10^8 images (I only downloaded about 70% before hitting the maximum file count), so I cannot compress or even copy it because of the massive processing time, and I don't have enough space for it either. You can follow the method described in https://github.com/cyrta/voxceleb/blob/9d0aa82e14a44465b3eaf818872cd74ef9edb42b/data/v1/voxceleb1_readme.txt to download the dataset. The whole dataset is about 3~4 TB, I guess. It took me about a month to download because I had to use a proxy server, which is unstable, slow, and time-consuming to set up and debug. With an unblocked network, it shouldn't take more than 2 weeks to finish downloading and clipping.
@makpia Thank you for your reply. Could you please leave your WeChat? I would like to ask you some questions
I've downloaded the whole of VoxCeleb2 and combined parts a to i into one zip file of about 280 GB. I tried to extract it, but by the time the process reached 50%, the image files already took up at least 1 TB, and my SSD ran out of space. So if you want the full dataset, it may take up about 2 TB of storage.
Sure, but I haven't set a WeChat ID. You can add me on QQ: 601936549
It is really difficult to do that. One way to zip and pack this dataset is to pack each clip first, that is, to replace the images in one folder with a single file. This can save a lot of disk space, but an extra extraction step is needed while the code is running.
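The per-clip packing idea above could look roughly like this, using only Python's standard `zipfile` module. The function names and the one-archive-per-clip-folder layout are assumptions for illustration, not something taken from the repository. `ZIP_STORED` is used on the assumption that the frames are already compressed images, so the benefit would come from having far fewer files rather than from extra compression.

```python
import zipfile
from pathlib import Path


def pack_clip(clip_dir, archive_path):
    """Pack every file in one clip folder into a single zip archive.

    Returns the number of frames written. The archive should live
    outside `clip_dir` so that it is not packed into itself.
    """
    frames = sorted(p for p in Path(clip_dir).iterdir() if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for frame in frames:
            # Store only the file name so the archive has a flat layout.
            zf.write(frame, arcname=frame.name)
    return len(frames)


def read_frame(archive_path, frame_name):
    """Read one frame's raw bytes from a packed clip at load time."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(frame_name)
```

In a data loader, the bytes returned by `read_frame` would still have to be decoded into an image (for example by wrapping them in `io.BytesIO` and opening that with an image library), which is the extra extraction step mentioned above.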
I trained the model using your code with the default settings and changed nothing except the batch_size (64 to 32, because of my GPU's memory). The result after 1900 epochs is really weird. Here are plots of the total error and an example of an AU-changed image.
I cannot tell where the problem is. Should I change the weights of the losses? I noticed that the weights in your code are different from those in your paper.