My log shows that after initialization, the Train_vertice_recon loss is around 1e-05. I think there may be something wrong with the dataset you used.

2023-08-08 05:22:20,923 Epoch 0: Train_vertice_recon 2.833e-05 Train_vertice_reconv 8.574e-07 Memory 8.3%

Can you check if this still exists on the BIWI dataset?
Hi, I met the same problem. When I debugged, I found that self.motion_decoder had been initialized with all zeros and does not seem to be updated during training when following your instructions, which results in the variable vertice_out being 0 all the time. I suspect there is a bug in the code, something like a variable that needs training not being registered with the optimizer. Could you train it from scratch using the master-branch code, or give some tips? I'd appreciate your kindness.
Here are some of my training results.
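One way to verify this kind of dead-parameter problem is to snapshot the submodule's weights and compare them after a few training steps. This is a hypothetical debug sketch, not code from the repo; the model and motion_decoder names follow the comment above, and the Lightning trainer call is an assumption:

```python
import torch

# snapshot the decoder's parameters before training
before = {name: p.detach().clone()
          for name, p in model.motion_decoder.named_parameters()}

trainer.fit(model, datamodule)  # run a few training steps

# any parameter still identical to its snapshot was never updated
for name, p in model.motion_decoder.named_parameters():
    if torch.allclose(p.detach(), before[name]):
        print(f"{name} was never updated during training")
```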
@hqm0810 I've narrowed down the problem to an issue in the loss update function. Specifically, in allsplit_step:
```python
# training
if split == "train":
    if self.guidance_uncondp > 0:  # we randomly mask the audio feature
        audio_mask = torch.rand(batch['audio'].shape[0]) < self.guidance_uncondp
        batch['audio'][audio_mask] = 0
    rs_set = self._diffusion_forward(batch, batch_idx, phase="train")
    loss = self.losses[split].update(rs_set)
    return loss
```
While rs_set looks normal, the loss returned to the Trainer is None, which prevents backprop. This means the loss is indeed being calculated and logged by torchmetrics, but there is a discrepancy between the Tensor returned by the loss update() function and the value received by allsplit_step().

Another issue I've identified is that the loss tensor seems to have requires_grad=False.

Can you see if this is the cause of your problem?
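For illustration, here is a minimal, self-contained sketch (a toy example, not code from this repo) that reproduces both symptoms with a torchmetrics Metric whose update() tries to return a loss:

```python
import torch
from torchmetrics import Metric

class ToyLoss(Metric):
    def __init__(self):
        super().__init__()
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("count", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, pred, target):
        loss = torch.nn.functional.mse_loss(pred, target)
        self.total += loss.detach()
        self.count += 1
        return loss  # torchmetrics wraps update() and discards this value

    def compute(self):
        return self.total / self.count

pred = torch.randn(4, requires_grad=True)
loss = ToyLoss().update(pred, torch.randn(4))
print(loss)  # None, so nothing with a grad_fn ever reaches the Trainer
```

In the torchmetrics versions discussed later in this thread, the wrapper around update() also runs it with gradients disabled by default, which would explain the requires_grad=False observation.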
Thank you. The cause of the problem is that VOCALosses always returns None when its update function is called (although the loss inside update is normal), so I simply reimplemented the VOCALosses computation in DIFFUSION_BIAS and everything is normal now. Thank you very much for your answer again.
It seems that, for some reason, computing the loss term inside the torchmetrics Metric class causes the output tensors to lose their gradients, as well as to compute incorrectly small losses. Using @hqm0810's answer as inspiration, I was able to construct a minimal working example:
For file alm/models/modeltype/diffusion_bias.py:
```diff
@@ -6,7 +6,7 @@ from transformers import Wav2Vec2Model
 from alm.config import instantiate_from_config
 from alm.models.modeltype.base import BaseModel
-from alm.models.losses.voca import VOCALosses
+from alm.models.losses.voca import VOCALosses, MaskedConsistency, MaskedVelocityConsistency
 from alm.utils.demo_utils import animate
 from .base import BaseModel

@@ -44,6 +42,8 @@ class DIFFUSION_BIAS(BaseModel):
             key: self._losses["losses_" + key]
             for key in ["train", "test", "val", ] # "train_val"
         }
+        self.reconstruct = MaskedConsistency()
+        self.reconstruct_v = MaskedVelocityConsistency()

         # set up model
         self.audio_encoder = Wav2Vec2Model.from_pretrained(cfg.audio_encoder.model_name_or_path)

@@ -114,7 +114,12 @@ class DIFFUSION_BIAS(BaseModel):
                 batch['audio'][audio_mask] = 0
             rs_set = self._diffusion_forward(batch, batch_idx, phase="train")
-            loss = self.losses[split].update(rs_set)
+
+            mask = rs_set['vertice_attention'].unsqueeze(-1)
+            loss1 = self.reconstruct(rs_set['vertice'], rs_set['vertice_pred'], mask)
+            loss2 = self.reconstruct_v(rs_set['vertice'], rs_set['vertice_pred'], mask)
+            loss = loss1 + loss2
+            self.losses[split].update(loss1, loss2, loss)
             return loss
```
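As an aside, here is a hypothetical sketch of what masked reconstruction losses like the imported MaskedConsistency and MaskedVelocityConsistency might look like; the actual implementations in alm/models/losses/voca.py may differ:

```python
import torch.nn as nn

class MaskedConsistency(nn.Module):
    """Hypothetical: mean squared error over valid (unpadded) frames only."""
    def forward(self, target, pred, mask):
        # target, pred: (B, T, V); mask: (B, T, 1), 1 = valid frame, 0 = padding
        sq_err = ((pred - target) ** 2) * mask
        return sq_err.sum() / (mask.sum() * target.shape[-1] + 1e-8)

class MaskedVelocityConsistency(nn.Module):
    """Hypothetical: the same error on frame-to-frame differences."""
    def forward(self, target, pred, mask):
        vel_target = target[:, 1:] - target[:, :-1]
        vel_pred = pred[:, 1:] - pred[:, :-1]
        vel_mask = mask[:, 1:] * mask[:, :-1]  # both frames must be valid
        sq_err = ((vel_pred - vel_target) ** 2) * vel_mask
        return sq_err.sum() / (vel_mask.sum() * target.shape[-1] + 1e-8)
```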
For file alm/models/losses/voca.py:
```diff
@@ -118,31 +118,13 @@ class VOCALosses(Metric):
         # lip_vertice = vertice.view(shape[0], shape[1], -1, 3)[:, :, mouth_map, :].view(shape[0], shape[1], -1)
         # return lip_vertice

-    def update(self, rs_set):
-        # rs_set.keys() = dict_keys(['latent', 'latent_pred', 'vertice', 'vertice_recon', 'vertice_pred', 'vertice_attention'])
-
-        total: float = 0.0
-        # Compute the losses
-        # Compute instance loss
-
-        # padding mask
-        mask = rs_set['vertice_attention'].unsqueeze(-1)
-
+    def update(self, recon, recon_v, ttl):
         if self.split in ['losses_train', 'losses_val']:
-            # vertice loss
-            total += self._update_loss("vertice_enc", rs_set['vertice'], rs_set['vertice_pred'], mask = mask)
-            total += self._update_loss("vertice_encv", rs_set['vertice'], rs_set['vertice_pred'], mask = mask)
-
-            # lip loss
-            # lip_vertice = self.vert2lip(rs_set['vertice'])
-            # lip_vertice_pred = self.vert2lip(rs_set['vertice_pred'])
-            # total += self._update_loss("lip_enc", lip_vertice, lip_vertice_pred, mask = mask)
-            # total += self._update_loss("lip_encv", lip_vertice, lip_vertice_pred, mask = mask)
-
-            self.total += total.detach()
+            self.vertice_enc += recon.detach()
+            self.vertice_encv += recon_v.detach()
+            self.total += ttl.detach()
             self.count += 1
-
-            return total
+            return ttl

         if self.split in ['losses_test']:
             raise ValueError(f"split {self.split} not supported")
```
This allows the model to train with the correct losses (the Train_vertice_recon loss being around 1e-05 in the first epoch), but further modifications are required to implement it for the validation stage.
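For the validation stage, a hypothetical sketch of the analogous branch in allsplit_step, mirroring the training branch above (the phase name "val" and the reuse of the same rs_set keys are assumptions, not verified against the repo):

```python
# validation (hypothetical, mirroring the training branch)
if split == "val":
    rs_set = self._diffusion_forward(batch, batch_idx, phase="val")
    mask = rs_set['vertice_attention'].unsqueeze(-1)
    loss1 = self.reconstruct(rs_set['vertice'], rs_set['vertice_pred'], mask)
    loss2 = self.reconstruct_v(rs_set['vertice'], rs_set['vertice_pred'], mask)
    loss = loss1 + loss2
    self.losses[split].update(loss1, loss2, loss)
    return loss
```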
Hi all, is the bug fixed in the master branch?
Thanks for your code! Have you trained on the dataset and verified the results?
Dear all, thanks for your effort. We have released the missing part for training. You can now train the model with decreasing losses.
Hi, thanks for your reply, but I can't find your latest commit. How should I use the latest code?
Hi @aixiaodewugege, did you get any results? I rewrote the loss calculation according to the above and trained on VOCASET for around 3,500 epochs, but when I tested it, the results were still not good; the mouth doesn't even open.
Hi. I can train it on VOCASET and get good results after 9,000 epochs.
Hi! Have you trained it on multiple GPUs? I'm only able to train it on a single GPU.
I haven't tried it on multiple GPUs yet, but a single GPU works fine for me too.
Thanks. If you manage to fix the multi-GPU problem, please let me know~
Hi, I tried to modify the loss function a bit, but I still can't train to the expected results. Since you got good results after 9,000 epochs, can I ask how you made the modification? Thank you!
In torchmetrics 0.11.4, the update() function of a Metric always returns None because of the _wrap_update() function, so you're not supposed to return anything from your rewritten update() function.
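Following that advice, the rewritten update() from the diff earlier in this thread could drop its return statement entirely, since the training step already holds the differentiable loss it needs. A sketch based on that diff:

```python
def update(self, recon, recon_v, ttl):
    if self.split in ['losses_train', 'losses_val']:
        # accumulate detached scalars for logging only; any value returned
        # here would be discarded by torchmetrics' _wrap_update wrapper
        self.vertice_enc += recon.detach()
        self.vertice_encv += recon_v.detach()
        self.total += ttl.detach()
        self.count += 1
```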
Symptoms
Training on VOCASET with the supplied wav2vec2 script produces a static output of the template mesh for any audio input. Here's an example: [video omitted]. Here are the Tensorboard error graphs: [graphs omitted].

Notably, the loss seems suspiciously small for both components.
Steps to reproduce

1. In the dataset/vocaset folder, I copied over the vertices_npy and wav folders that are also used for FaceFormer training.
2. Ran scripts/diffusion/vocaset_training/diffspeaker_wav2vec2_vocaset.sh.

Troubleshooting steps tried

- templates.pkl and the self-supplied .npy files are the same.
- The .npy files are 60 FPS, but I left the [::2,:] in the load_data function untouched (see the sketch after this list).
- Using print(len(self.data_splits['train'])) in alm/data/vocaset.py, I can see that 314 training samples have been loaded.
- Except for scipy (mine is 1.12.0 vs 1.9.1 in requirements.txt), all pip packages have the same versions as the supplied requirements.txt.
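For clarity on the [::2,:] point in the list above: the stride keeps every other frame, halving 60 FPS vertex sequences to 30 FPS. A minimal illustration (the array shape assumes FLAME's 5023 vertices as used by VOCASET):

```python
import numpy as np

# toy stand-in for one loaded vertex sequence at 60 FPS:
# (frames, 5023 vertices * 3 coordinates)
vertices = np.zeros((120, 15069))

vertices_30fps = vertices[::2, :]  # keep every other frame -> 30 FPS
print(vertices_30fps.shape)        # (60, 15069)
```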
Logs
Here are the truncated logs of the training: [log output omitted].