zyhbili / LivelySpeaker

[ICCV-2023] The official repo for the paper "LivelySpeaker: Towards Semantic-aware Co-Speech Gesture Generation".

Test data processing #18

Closed by fcchit 6 months ago

fcchit commented 6 months ago

Good job!

I am currently training LivelySpeaker on BEAT, but when I run test_LivelySpeaker_beat.py, I find that the test data is not split into 34-frame segments. As a result, the text for each sample is too long: it tokenizes to 180 tokens, which exceeds the CLIP model's limit of 77 and raises an error.

Traceback (most recent call last):
  File "/a2g/LivelySpeaker/scripts_beat/test_LivelySpeaker_beat.py", line 254, in <module>
    t = trainer.infer_from_testloader(test_loader = test_loader, motionclip_model= motionclip_model, mdm_model=mdm_model, sample_fn=sample_fn,guidance_param=1, skipsteps = skipsteps)
  File "/a2g/LivelySpeaker/scripts_beat/test_LivelySpeaker_beat.py", line 117, in infer_from_testloader
    texts = myclip.tokenize(batch['clip_text']).cuda()
  File "/root/miniconda3/lib/python3.9/site-packages/clip/clip.py", line 141, in tokenize
    result[i, :len(tokens)] = torch.tensor(tokens)
RuntimeError: The expanded size of the tensor (77) must match the existing size (180) at non-singleton dimension 0. Target sizes: [77]. Tensor sizes: [180]
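For reference, the size mismatch in the traceback can be illustrated without CLIP installed. This is only a minimal sketch, not the code in clip/clip.py: `fit_to_context` and the token IDs are hypothetical stand-ins, and zero padding is an assumption. Only the context length of 77 and the sequence length of 180 come from the error message.

```python
CONTEXT_LENGTH = 77  # text context length reported in the traceback
PAD_ID = 0           # assumption: unused positions are padded with zeros


def fit_to_context(token_ids, context_length=CONTEXT_LENGTH, truncate=False):
    """Pad or truncate a token sequence (start and end tokens included)."""
    if len(token_ids) > context_length:
        if not truncate:
            raise ValueError(
                f"{len(token_ids)} tokens exceed the context length {context_length}"
            )
        # Keep the leading tokens but preserve the final end token.
        token_ids = token_ids[: context_length - 1] + [token_ids[-1]]
    return token_ids + [PAD_ID] * (context_length - len(token_ids))


# Hypothetical IDs: 1 = start token, 2 = end token, 5 = any word piece.
long_ids = [1] + [5] * 178 + [2]   # 180 tokens, like the failing sample
short_ids = [1] + [5] * 10 + [2]   # 12 tokens, fits comfortably

padded = fit_to_context(short_ids)                  # padded up to 77
clipped = fit_to_context(long_ids, truncate=True)   # cut down to 77
```

Truncation silences the error but discards any text past the limit, so shortening the input segments is usually preferable.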

My data processing steps are as follows:

  1. run /data_libs/preprocess_0.py
  2. run /data_libs/preprocess_1.py
  3. run test_LivelySpeaker_beat.py, and generate the test cache without aux_info
  4. run /data_libs/process_cache.py to generate the aux_info

Then I run test_LivelySpeaker_beat.py again, and the error above occurs.

Do you have any idea about this? Looking forward to your quick reply, thank you!

zyhbili commented 6 months ago

Sorry for the inconvenience. I processed the data and divided it into 34-frame segments with some hard-coded logic, since that step only runs once. As mentioned in data_libs/README.md, I built a new cache named final_test for computing the metrics. So you need to run the self.cache_generation function with is_test = False on the test dataset to divide it into 34-frame segments.
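To make the idea of 34-frame segments concrete, here is a rough sketch of fixed-length windowing. It is not the repo's `cache_generation` code: the function names, the non-overlapping stride, the dropping of a trailing partial window, and the (word, frame index) input format are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_into_windows(
    num_frames: int, window: int = 34, stride: int = 34
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges; a trailing partial window is dropped."""
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def words_per_window(
    words: Sequence[Tuple[str, int]], ranges: Sequence[Tuple[int, int]]
) -> List[str]:
    """Join the words whose frame index falls inside each window."""
    return [
        " ".join(w for w, frame in words if start <= frame < end)
        for start, end in ranges
    ]


# Hypothetical 100-frame clip with four time-stamped words.
ranges = split_into_windows(100)
words = [("hello", 3), ("world", 40), ("again", 50), ("tail", 90)]
texts = words_per_window(words, ranges)
```

Each window then carries only the few words spoken within it, which keeps the per-sample text far below the tokenizer's limit.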

fcchit commented 6 months ago

Thanks for your quick reply! I used is_test = False to generate the test data cache, and now everything works fine. Using is_test = False for the test data is a little unexpected, but it solves the problem :)

fcchit commented 6 months ago

Hello @zyhbili, I retrained LivelySpeaker, but the evaluation metrics were poor. So I reprocessed the training data and found that the computation of the mean and standard deviation of the BVH data is missing.

zyhbili commented 6 months ago

We use the rot6d representation without normalization (i.e., without (x - mu) / std). So the mean and standard deviation are not needed when processing the data; you can simply set mu = 0 and std = 1.
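A quick way to see why mu = 0 and std = 1 is safe: with those values, (x - mu) / std leaves every feature unchanged. The sketch below uses a hypothetical `normalize` helper and made-up feature values; it is not taken from the repo.

```python
def normalize(x, mu, std):
    """Element-wise (x - mu) / std over equal-length lists."""
    return [(v - m) / s for v, m, s in zip(x, mu, std)]


# Made-up 6D rotation features for a single joint.
features = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
mu = [0.0] * len(features)   # placeholder mean
std = [1.0] * len(features)  # placeholder standard deviation

unchanged = normalize(features, mu, std)
```

So any code path that still expects mean and std arrays can be fed these placeholders without altering the data.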

fcchit commented 6 months ago

I see. Thanks for your quick replies.