Hi @akshay951228,
The last commit was still not producing good results; since then I have been working on a major change in how the residual layers work in order to integrate the AdaIN layers properly. I just made a new commit now with all the changes, and the network seems to produce reasonable results.
As for your questions:
1a. The embedder has always had an activation layer at the end. If what you're asking is why the ReLU is after the pooling and not before, that's because it doesn't matter. ReLU simply clamps all negative values to zero, whereas the pooling selects the maximum value of each channel, so it makes no difference whether you first clamp the negatives or first select the maximum values: the result is the same.
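A quick way to check that claim numerically (a minimal sketch; the tensor shape is made up and not taken from the repo):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 512, 8, 8)  # hypothetical feature map with negative values

# Global max pooling over the spatial dimensions, as in the embedder's last step.
pool_then_relu = F.relu(x.amax(dim=(2, 3)))
relu_then_pool = F.relu(x).amax(dim=(2, 3))

# max commutes with the clamp at zero, so both orderings give the same result.
print(torch.allclose(pool_then_relu, relu_then_pool))  # True
```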
1b. The generator had no activation layer in the previous commit because I realized that since I'm using ImageNet normalization on the frames, the generated images should end up with the same range of values, which is different for every channel. So I decided I would for now let the Generator learn it on its own. In this version, I am not performing ImageNet normalization when extracting the data, so the generated images should have values in the range [0, 1], and that allows me to do a sigmoid at the end of the Generator. I am still doing ImageNet normalization inside the loss function in order to be able to use the VGG model.
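As a rough illustration of that setup (a sketch with assumed names; the repo's actual loss code may differ): the generator output stays in [0, 1] thanks to the final sigmoid, and the ImageNet statistics are only applied right before the VGG forward pass inside the loss.

```python
import torch
import torchvision

# ImageNet channel statistics, applied only inside the perceptual loss.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

vgg = torchvision.models.vgg19(pretrained=True).features.eval()

def perceptual_features(img):
    """img: (B, 3, H, W) in [0, 1], e.g. a sigmoid-activated generator output."""
    normalized = (img - IMAGENET_MEAN.to(img.device)) / IMAGENET_STD.to(img.device)
    return vgg(normalized)
```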
Firstly, thanks for the fast response.
1a) My bad, it's not at line 228; it's at https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L243. After the bmm there is no activation function; in an older commit there was a tanh activation, but now you are not using any activation.
1b) You can normalize to [0, 1], scale it to 255, and do the ImageNet normalization before passing to the VGG model. Have a look at the link below, maybe it will help: https://github.com/xthan/VITON/blob/e6b560225975ddd40d96359cc13f8d66b975aa20/utils.py#L488
Did a batch size greater than 1 work for you?
> 1b) You can normalize to [0, 1], scale it to 255, and do the ImageNet normalization before passing to the VGG model. Have a look at the link below, maybe it will help.
This is exactly what is happening right now. The ImageNet normalization is done inside the loss function, before passing to VGG. And I'm not doing any normalization of the images myself, but when converting them to a tensor PyTorch automatically scales them to [0, 1].
> Did a batch size greater than 1 work for you?
I haven't had the time to let it train enough to get really good results, but with a batch size of 2 I have managed to get silhouettes of faces with a color pattern similar to the source image, after some 10 hours of training on the smaller test set.
After 50 epochs, these are the results I got with the latest commit on the small dataset.
Do I still need to train more? Any suggestion from your side would be very helpful.
Similar results to yours, @akshay951228, after 1000 epochs using 200 videos.
200 videos is probably way too little. The full dataset used in the paper has ~140000 videos. I am training with a subset of 120000 due to memory issues.
I still haven't managed to get real results either yet, though. So far the network produces things like this:
@MrCaracara at which epoch did you get the above results?
That is after 2 epochs
I also tried using the whole dataset to train, and set different batch sizes to compare.
Batch_size = 1, after 3 epochs:
Batch_size = 32, after 5 epochs:
It seems that larger batch size leads to worse results.
In addition, I also want to show the results below without using MLP, after 1.5 epochs, the results seem better than using MLP.
@MrCaracara
Yes @busning, there is a problem with the batch size. After some debugging I found that the results are weird because of the embedder network; there was no problem with the generator and discriminator as far as I know.
Maybe we should focus on the embedder, @MrCaracara, to solve this batch size issue.
Thank you for pointing out the right direction, I will focus on debugging the embedder today. @akshay951228
If that's the result you get when using batches, then I guess there must be a problem with the collapsing of the B and K dimensions of the training frames, as discussed in the first posts of this thread. It could be that data is leaking from one batch element to the other when passing through the Embedder.
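For reference, the collapsing being discussed works roughly like this (a sketch with made-up shapes and names, not the repo's exact code); a bug in either reshape would mix frames or embeddings across batch elements:

```python
import torch

B, K, C, H, W = 2, 8, 6, 256, 256
frames = torch.randn(B, K, C, H, W)

# Fold K into the batch dimension so the conv layers see (B*K, 6, 256, 256).
flat = frames.view(B * K, C, H, W)

# ... pass `flat` through the embedder's convolutional layers ...
e_flat = torch.randn(B * K, 512)   # stand-in for the per-frame embeddings

# Un-fold and average over K to get one embedding per video: (B, 512).
e = e_flat.view(B, K, 512).mean(dim=1)
```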
As for the use of the MLP, I added it to try it out, since someone claimed in a different thread that it helped their results, so I'm surprised to see that you get decent results in just 1.5 epochs without it. I guess I will restart my training with just a projection matrix.
After 3 epochs this is the result that I get with the code as it is right now:
@woshixylg: Indeed, that's what they call the feed forward version of the algorithm. The switch to turn L_mch on or off is already present in LossD, and these results are without L_mch.
@MrCaracara Similar results to yours after 3 epochs using batch_size = 1. However, I notice that the losses stop decreasing when training for more epochs. The paper mentions that they first train the network for 150 epochs without using L_mch; I tried that today and am waiting for the results. I guess the reason the losses don't decrease further is that too much weight is put on L_mch.
@MrCaracara In ResidualBlockDown you pass the input to the ReLU activation (https://github.com/grey-eye/talking-heads/blob/master/network/components.py#L96) instead of the conv. Is there a reason behind it? The input is normalized to [0, 1], and applying ReLU to it gives the same output as the input.
> @MrCaracara In ResidualBlockDown you pass the input to the ReLU activation (https://github.com/grey-eye/talking-heads/blob/master/network/components.py#L96) instead of the conv. Is there a reason behind it? The input is normalized to [0, 1], and applying ReLU to it gives the same output as the input.
That is simply the implementation of the original Residual Block Down from BigGAN, as referenced in the paper (See Figure 15)
I assume the reason why they added a ReLU there is to make sure the data is always in the range [0, 1] throughout the entire network, as the final output will have to be in that range (anything outside would be an invalid RGB value).
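For context, the BigGAN-style block referenced here is ordered roughly as follows (a sketch of the general pre-activation pattern, not a copy of the repo's components.py):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlockDown(nn.Module):
    """BigGAN-style downsampling block: the ReLU comes before each conv."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.conv_skip = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # Residual branch: ReLU -> conv -> ReLU -> conv -> downsample.
        out = self.conv1(F.relu(x))
        out = self.conv2(F.relu(out))
        out = F.avg_pool2d(out, 2)
        # Skip branch: 1x1 conv -> downsample.
        skip = F.avg_pool2d(self.conv_skip(x), 2)
        return out + skip
```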
Hi @MrCaracara, I tried this repo and it converges very fast (about 4000 iterations) to results like yours after 3 epochs. His code does not scale the data, it just keeps [0, 255]. (I modified his perceptual loss.)
Hi, there is no activation function at the end of the discriminator (https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L243), and I was a little bit confused about W_i and how it relates to e_hat in fine-tuning. Can you throw some light on it?
> Hi, there is no activation function at the end of the discriminator (https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L243), and I was a little bit confused about W_i and how it relates to e_hat in fine-tuning. Can you throw some light on it?
I tried adding a sigmoid and a tanh at the end of the discriminator, but then the loss of D would stall, so that's why I ended up removing it. The output doesn't really mean anything other than that the higher the value, the more likely the image is to be considered real, since there are no labels, so I also considered it unnecessary.
About W, when training for fine tuning, every column of that matrix is supposed to be similar to the e vector of the corresponding video. So far I have only trained with the feed forward model, which doesn't do this.
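For anyone following along, the matching term being referred to (L_MCH in the paper) could be sketched like this, with assumed variable names and without the large weighting constant the paper applies to it:

```python
import torch

def matching_loss(W, e_hat, video_indices):
    """L_MCH sketch: W is (E_VECTOR_LENGTH, num_videos), e_hat is (B, E_VECTOR_LENGTH),
    video_indices is a (B,) LongTensor with the dataset index of each training video."""
    w_i = W[:, video_indices].t()            # (B, E_VECTOR_LENGTH), one W column per video
    return torch.abs(w_i - e_hat).mean()     # L1 distance between W columns and embeddings
```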
> In addition, I also want to show the results below without using MLP, after 1.5 epochs, the results seem better than using MLP.
@busning, did you change anything else besides replacing the MLP back with a matrix to get the results in the last image?
Tried with 3000 data samples, after 1 epoch:
After 12 epochs:
I did some reordering in the resblocks and added a sigmoid activation function at the end of the discriminator. I'm not using the MLP, just a projection matrix, and it's working with batch sizes greater than 1.
That's the best results I have seen so far! Great job! What kind of reordering did you do? If you make a pull request I'll accept it and try to train it with the entire dataset.
Instead of a pull request, I'll describe exactly what I did. Two modifications were made:

1) Changed the two residual blocks:

```python
import torch.nn as nn
import torch.nn.functional as F

# AdaIn and ConvLayer are the existing building blocks from this repo.


class AdaptiveResidualBlock(nn.Module):
    def __init__(self, channels):
        super(AdaptiveResidualBlock, self).__init__()
        self.in1 = AdaIn()
        self.in2 = AdaIn()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)

    def forward(self, x, mean1, std1, mean2, std2):
        residual = x
        out = F.relu(self.in1(self.conv1(x), mean1, std1))
        out = self.in2(self.conv2(out), mean2, std2)
        out = out + residual
        return out


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in1 = nn.InstanceNorm2d(channels, affine=True)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in2 = nn.InstanceNorm2d(channels, affine=True)

    def forward(self, x):
        residual = x
        out = F.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = out + residual
        return out
```

2) I just added a sigmoid activation function after this line: https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L243. I think this activation function is what helped me converge faster.
> Tried with 3000 data samples, after 1 epoch:
> After 12 epochs:
> I did some reordering in the resblocks and added a sigmoid activation function at the end of the discriminator. I'm not using the MLP, just a projection matrix, and it's working with batch sizes greater than 1.
Did you change the discriminator's loss? I don't think the sigmoid function, whose output is in [0, 1], fits the hinge loss, which is defined over [-1, 1].
@hanxuanhuo I think the hinge loss is not restricted to [-1, 1].
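To make the disagreement concrete: the standard hinge loss for a GAN discriminator only pushes real scores above +1 and fake scores below -1, and the scores themselves are unbounded (a sketch, not the repo's exact loss code):

```python
import torch
import torch.nn.functional as F

def hinge_loss_d(real_score, fake_score):
    # Real samples are pushed above +1, fakes below -1; scores are unbounded.
    return F.relu(1.0 - real_score).mean() + F.relu(1.0 + fake_score).mean()

def hinge_loss_g(fake_score):
    # The generator simply tries to raise the fake score.
    return -fake_score.mean()
```

Note that if D ends in a sigmoid, `fake_score` stays in [0, 1], so the second term can never drop below 1; if the repo uses this form of hinge loss, that would be consistent with the Loss_D values hovering around 1.0 reported further down.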
Using sigmoid with the output of D does seem to help the model converge much quicker! However, both losses stagnate very quickly too. Do yours also remain around these values?
Loss_E_G = 0.0985 Loss_D = 1.0001
Loss_E_G = 0.1116 Loss_D = 1.0018
Loss_E_G = 0.1088 Loss_D = 1.0092
Loss_E_G = 0.1202 Loss_D = 1.0012
Loss_E_G = 0.1073 Loss_D = 1.0000
Loss_E_G = 0.1028 Loss_D = 1.0007
Loss_E_G = 0.0949 Loss_D = 1.0016
Loss_E_G = 0.1048 Loss_D = 1.0010
Loss_E_G = 0.0941 Loss_D = 1.0014
Loss_E_G = 0.1087 Loss_D = 1.0004
Loss_E_G = 0.1067 Loss_D = 1.0000
Loss_E_G = 0.1116 Loss_D = 1.0003
Loss_E_G = 0.1265 Loss_D = 1.0021
Loss_E_G = 0.1203 Loss_D = 1.0003
Loss_E_G = 0.1136 Loss_D = 1.0001
Loss_E_G = 0.0956 Loss_D = 1.0003
Loss_E_G = 0.1091 Loss_D = 1.0001
Loss_E_G = 0.1127 Loss_D = 1.0009
Loss_E_G = 0.1095 Loss_D = 1.0031
Yep @MrCaracara, these are the values for my training right now:
```
epoch: 16, step: 662, E_D: 0.0879344791173935, D: 1.000479817390442
epoch: 16, step: 663, E_D: 0.087563157081604, D: 1.166745662689209
epoch: 16, step: 664, E_D: 0.08357848972082138, D: 1.166815996170044
epoch: 16, step: 665, E_D: 0.026573803275823593, D: 1.0364704132080078
epoch: 16, step: 666, E_D: 0.043518807739019394, D: 1.0275287628173828
epoch: 16, step: 667, E_D: 0.06920713186264038, D: 1.0043773651123047
epoch: 16, step: 668, E_D: 0.07651181519031525, D: 1.0000147819519043
epoch: 16, step: 669, E_D: 0.08825021982192993, D: 1.0003926753997803
epoch: 16, step: 670, E_D: 0.08158326894044876, D: 1.1674630641937256
epoch: 16, step: 671, E_D: 0.08684965968132019, D: 1.028291940689087
epoch: 16, step: 672, E_D: 0.07762998342514038, D: 1.0001451969146729
epoch: 16, step: 673, E_D: 0.09002198278903961, D: 1.0000580549240112
epoch: 16, step: 674, E_D: 0.09793408960103989, D: 1.0000355243682861
epoch: 16, step: 675, E_D: -0.19194158911705017, D: 1.3097941875457764
epoch: 16, step: 676, E_D: -0.25006210803985596, D: 1.3337972164154053
epoch: 16, step: 677, E_D: 0.07950375974178314, D: 1.00016450881958
epoch: 16, step: 678, E_D: 0.09060948342084885, D: 1.0008139610290527
epoch: 16, step: 679, E_D: 0.09333132207393646, D: 1.0000040531158447
epoch: 16, step: 680, E_D: 0.07605766505002975, D: 1.0002230405807495
epoch: 16, step: 681, E_D: 0.07999680936336517, D: 1.0001215934753418
epoch: 16, step: 682, E_D: -0.0832897275686264, D: 1.164804220199585
epoch: 16, step: 683, E_D: 0.07924594730138779, D: 1.000558853149414
epoch: 16, step: 684, E_D: -0.04413623735308647, D: 1.1611897945404053
epoch: 16, step: 685, E_D: 0.056591056287288666, D: 1.0390490293502808
epoch: 16, step: 686, E_D: 0.08398791402578354, D: 1.0063879489898682
epoch: 16, step: 687, E_D: 0.08140194416046143, D: 1.0005378723144531
epoch: 16, step: 688, E_D: -0.09771829843521118, D: 1.1667736768722534
epoch: 16, step: 689, E_D: -0.15706413984298706, D: 1.2116470336914062
epoch: 16, step: 690, E_D: 0.08421777933835983, D: 1.0044937133789062
epoch: 16, step: 691, E_D: 0.07601674646139145, D: 1.2363193035125732
epoch: 16, step: 692, E_D: 0.09327101707458496, D: 1.0000172853469849
epoch: 16, step: 693, E_D: 0.08491037786006927, D: 1.0000053644180298
epoch: 16, step: 694, E_D: 0.07348034530878067, D: 1.013258457183838
epoch: 16, step: 695, E_D: 0.08878153562545776, D: 1.002532720565796
epoch: 16, step: 696, E_D: -0.2234032154083252, D: 1.3043357133865356
epoch: 16, step: 697, E_D: 0.07888448238372803, D: 1.1667258739471436
epoch: 16, step: 698, E_D: -0.06074731796979904, D: 1.1514358520507812
epoch: 16, step: 699, E_D: 0.0816541537642479, D: 1.00107741355896
epoch: 16, step: 700, E_D: 0.08438102900981903, D: 1.0000966787338257
epoch: 16, step: 701, E_D: 0.08242268860340118, D: 1.0000488758087158
epoch: 16, step: 702, E_D: 0.08105947822332382, D: 1.0000821352005005
epoch: 16, step: 703, E_D: 0.08781936764717102, D: 1.0000382661819458
epoch: 16, step: 704, E_D: 0.08520457148551941, D: 1.1619987487792969
epoch: 16, step: 705, E_D: -0.030515065416693687, D: 1.113174319267273
```
Results at the start:
Current results:
@MrCaracara I didn't change anything except for not using MLP.
@akshay951228 I still cannot find the error in multi-GPU training of the embedder with a larger batch size. Can you give me some hints on where I should focus? I only checked the embedder and the projection layer.
@busning, I applied DataParallel to the embedder but not to the generator and discriminator, with batch size 4 across four GPUs. That means the generator and discriminator run on a single GPU with batch size 4, while the embedder gets a batch size of 1 on each GPU, and this setup works. But when the embedder takes a batch size of more than one, it doesn't work; that's how I came to know there is an issue with the embedder. However, I made some changes to the generator's resblocks and added an activation function to the discriminator, which surprisingly solves the batch size problem: I'm able to run batch size 4 on 2 GPUs.
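A minimal sketch of the setup described above, using stand-in modules (the real Embedder/Generator/Discriminator classes live in the repo) and assuming four visible GPUs:

```python
import torch
import torch.nn as nn

# Stand-ins for the repo's Embedder / Generator / Discriminator.
embedder = nn.Sequential(nn.Conv2d(6, 64, 3, padding=1), nn.AdaptiveMaxPool2d(1), nn.Flatten())
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(6, 1, 3, padding=1))

device = torch.device('cuda:0')

# Only the embedder is data-parallel: its (B*K) frame batch is split across GPUs,
# while the generator and discriminator stay on a single GPU.
embedder = nn.DataParallel(embedder.to(device), device_ids=[0, 1, 2, 3])
generator = generator.to(device)
discriminator = discriminator.to(device)
```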
So in short: the meta-training process is solved! Here are the things that finally fixed it:
Hi @MrCaracara, how are the training results?
These are some results in the middle of epoch 43, using almost the full dataset (exactly 140000 videos) and batch size of 3.
Did you try fine-tuning with the latest checkpoint? If not, please try, so that we can compare with the authors' results.
@akshay951228 This is done using the feed forward model, which is not compatible with fine tuning. Before I can try fine tuning, I will have to fix LossMCH, but I haven't had the time to look into it. When I do, I can start training with it.
For now I'm just letting my server train on this model while I focus on other matters.
@MrCaracara what are your thoughts on this new one-shot face reenactment paper with pretrained models: https://github.com/bj80heyue/One_Shot_Face_Reenactment
It looks interesting, I will read it when I have the time to see how they approached the issue, but it does seem like their results are not as good as those of Samsung AI
@MrCaracara Yeah, the results aren't as good, and they are not releasing the landmark extraction model, but they have released the pretrained models.
Hi, after going through your code I have a few questions: 1) In the embedder's final layer there is no activation function (https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L228), and the same in the generator (https://github.com/grey-eye/talking-heads/blob/master/network/network.py#L158). Can I know the reason behind this?
2) You reshape [B, K, 6, 256, 256] to [B×K, 6, 256, 256] in the Embedder and finally get (B, E_VECTOR_LENGTH, 1). If the batch size is two, the embedder is trying to find pose-independent information about a person, but here we are stacking up the information of two people, so I think the embedder may be suffering and not learning any information.