v-iashin SpecVQGAN issues

v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

https://v-iashin.github.io/SpecVQGAN

MIT License

347 stars, 40 forks

issues


evaluate issue

#46 JokerHYX-719 closed 2 months ago
6
Colab demo fails at importing video

#45 DCooper-nz closed 4 months ago
1
Overfitting occurs when training transformer

#44 Ivvvvvvvvvvy closed 8 months ago
2
TypeError: __init__() got an unexpected keyword argument 'checkpoint_callback'

#43 a897456 opened 8 months ago
0
AttributeError: module 'signal' has no attribute 'SIGUSR1'

#42 a897456 closed 8 months ago
1
How to replace the new dataset

#41 a897456 opened 8 months ago
7
How is the diffusion model represented in this chapter?

#40 a897456 closed 8 months ago
5
The bitrate should be scaled by 1000, not 1024

#39 v-iashin opened 8 months ago
0
Spectrogram VQGAN as a Neural Audio Codec

#38 a897456 closed 8 months ago
27
Query: Training Result Interpretation, and Performance Indicators

#37 SmashSoar closed 11 months ago
2
Seeking Advice on MelGAN Model Training

#36 Ivvvvvvvvvvy closed 11 months ago
2
Evaluation Results

#35 aselimc opened 1 year ago
0
*** TypeError: __init__() got an unexpected keyword argument 'rgb_feats_dir_path'

#34 YingqingHe closed 1 year ago
4
Dev

#33 artificalaudio closed 1 year ago
1
Evaluation questions

#32 Ivvvvvvvvvvy closed 1 year ago
2
Number of different features

#31 Ivvvvvvvvvvy closed 1 year ago
2
VQModel1d

#29 yangdongchao closed 1 year ago
1
sample error: KeyError: 'test'

#28 Ivvvvvvvvvvy closed 1 year ago
4
netG.pt, optG.pt, netD.pt, optD.pt files of the MelGAN model

#27 Ivvvvvvvvvvy closed 1 year ago
3
Feed the 'mel.npy file' into the melgan vocoder

#26 Ivvvvvvvvvvy closed 1 year ago
2
Issues with training transformer on the VAS dataset

#25 mayqinxu closed 1 year ago
3
Question about generating audio (longer than 10s)

#23 albertwy closed 1 year ago
4
new environment.yml if it is possible?

#22 Allencheng97 opened 2 years ago
1
Issues with the sampling script

#21 roudimit opened 2 years ago
3
Loss becoming "nan" during codebook training?

#20 jhyau closed 2 years ago
2
Training conditional transformer

#16 radiradev closed 2 years ago
1
about training vocoder

#15 yangdongchao closed 2 years ago
3
Obout train Mel-GAN

#14 yangdongchao opened 2 years ago
2
Issue with vggish checkpoint

#13 luc-leonard opened 2 years ago
9
Cannot evaluation

#12 yangdongchao closed 2 years ago
2
Reconstruct mel spectrogram from librosa

#11 clairerity closed 2 years ago
3
report error when I use multiple GPUs

#10 yangdongchao opened 2 years ago
4
Is the generated sound visually aligned?

#7 sukun1045 closed 3 years ago
2
Issue running example with load_model()

#5 luisarandas closed 3 years ago
6
bending the re/de-constructed melspectrogram to create new sounds.

#4 johndpope closed 3 years ago
3
cpu inference colab

#3 AK391 closed 3 years ago
4
Huggingface spaces demo

#2 AK391 closed 3 years ago
0
