simonalexanderson / StyleGestures


Questions about the sampled data. #17

Open Meteor-Stars opened 3 years ago

Meteor-Stars commented 3 years ago

Hello, I hope I'm not disturbing you. My questions are as follows.

First, I notice that when the global step reaches 'plot_gap', the model samples some '.bvh' files. How can I find the corresponding audio files? I assumed they would be in the 'visualization_dev' or 'visualization_test' folder, but the number of audio files in each of those folders differs from the number of output '.bvh' files. How can I find the audio clips that correspond to the output '.bvh' files?

Second, I find that in 'trinity.py' the 'test_input_20fps.npz' and 'test_output_20fps.npz' files, which are produced by 'prepare_datasets.py', are not used, and I haven't found them being used anywhere else either. They must be needed somewhere that I have overlooked. Could you give me some guidance to resolve this confusion?

I would be grateful if you could give me some help. I am looking forward to hearing from you!

Best wishes!

ghenter commented 3 years ago

I think only @simonalexanderson knows how to answer these questions, and I hope he can find the time to help.

Meteor-Stars commented 3 years ago

Thank you. I might add, taking the synthesized Obama gestures you provided as an example: the Obama audio file was divided into many clips, and the model sampled many output '.bvh' files. One must first find the audio clip that corresponds to each output '.bvh' file, so that the clip can be synchronized with the gesture; only then can the final video, which presents the audio clip and the corresponding gesture motion together, be produced. I am not sure how to find the audio clips corresponding to the output '.bvh' files, and I am looking forward to any advice and guidance.
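To illustrate what I mean by the pairing step, here is a minimal sketch. It assumes the sampled '.bvh' files and the audio clips share a common filename stem, which may not be the actual naming scheme, and the folder names are only placeholders:

```python
# Minimal sketch (not from the repository) of pairing sampled .bvh files
# with audio clips. It assumes both share a filename stem, e.g.
# 'sample_003.bvh' <-> 'sample_003.wav'; the folder names below are
# placeholders for wherever the files happen to be on disk.
from pathlib import Path

BVH_DIR = Path("sampled_bvh")            # placeholder: folder with the sampled .bvh files
AUDIO_DIR = Path("visualization_test")   # placeholder: folder with the audio clips

# index the audio clips by filename stem
audio_by_stem = {p.stem: p for p in AUDIO_DIR.glob("*.wav")}

for bvh_path in sorted(BVH_DIR.glob("*.bvh")):
    audio_path = audio_by_stem.get(bvh_path.stem)
    if audio_path is None:
        print(f"no matching audio clip for {bvh_path.name}")
    else:
        print(f"{bvh_path.name}  <->  {audio_path.name}")
```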

simonalexanderson commented 3 years ago

Hi @Meteor-Stars, I have now restructured the code and added a script called 'prepare_gesture_testdata.py' to facilitate synthesis from arbitrary wav sources. The process is:

1. Resample the wav files to 48 kHz and place them in the data/GENEA/source/test_audio folder (a rough sketch of this step follows below).
2. Run 'python prepare_gesture_testdata.py'.
3. Modify hparams/.json to point at the data/GENEA/processed/test file and add the pretrained model.
4. Run 'python train_moglow.py hparams/.json trinity'.
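For step 1, a minimal sketch of the resampling could look like the following. It assumes librosa and soundfile are installed (any other resampling tool works just as well), and the input folder 'my_wavs' is only a placeholder:

```python
# Minimal sketch of step 1: resample arbitrary wav files to 48 kHz and
# place them in the folder expected by prepare_gesture_testdata.py.
# 'my_wavs' is a placeholder for wherever your source audio lives.
from pathlib import Path

import librosa
import soundfile as sf

SRC_DIR = Path("my_wavs")                        # placeholder: your source wav files
DST_DIR = Path("data/GENEA/source/test_audio")   # folder read by prepare_gesture_testdata.py
DST_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(SRC_DIR.glob("*.wav")):
    # load and resample to 48 kHz mono
    audio, _ = librosa.load(wav_path, sr=48000, mono=True)
    sf.write(DST_DIR / wav_path.name, audio, 48000)
    print(f"resampled {wav_path.name} -> {DST_DIR / wav_path.name}")
```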

Hope this helps.