jishengpeng WavTokenizer issues

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

MIT License

800 stars, 44 forks

Issues, sorted by newest

Semantic Representation

#55 Uneasy-Z opened 1 week ago
2 comments
Worse performance of the large model compared to the small model?

#54 XiaoshanHsj opened 1 week ago
4 comments
Training for a WAV-to-MIDI transcriber

#53 mito0o852 opened 1 week ago
1 comment
Model cannot converge

#52 VJJJJJJ1 closed 1 week ago
2 comments
What is the expected behaviour when changing the bandwidth parameter?

#51 tanmaylaud opened 2 weeks ago
0 comments
Questions for Creating a Better Model

#50 ootsuka-repos opened 2 weeks ago
2 comments
Some questions about the model

#49 VJJJJJJ1 closed 1 week ago
2 comments
Installable Package

#48 poonehmousavi opened 2 weeks ago
2 comments
Config/Model Checkpoint Pairing

#47 MorenoLaQuatra opened 1 month ago
5 comments
Question about Audio Preprocessing

#46 xjf-303 opened 1 month ago
1 comment
We updated the WavTokenizer paper on arXiv and released the WavTokenizer-Large checkpoint on Hugging Face on 2024.10.22

#45 jishengpeng opened 1 month ago
4 comments
Streaming inference

#44 wntg opened 1 month ago
3 comments
How many training steps are needed to train WavTokenizer?

#43 sphmel closed 1 month ago
1 comment
When will the large unified model (speech, music, audio) be released?

#42 MrPig opened 1 month ago
1 comment
Probability density for each index in the codebook

#41 goforher opened 1 month ago
1 comment
Performance in LLM-based TTS

#40 Liujingxiu23 opened 2 months ago
2 comments
Files missing?

#39 goforher opened 2 months ago
6 comments
Why is the grad norm so high?

#38 necrophagists opened 2 months ago
1 comment
Speech medium v2

#37 theodorblackbird closed 2 months ago
1 comment
Why is the commit loss weight so large?

#36 Ming-er opened 2 months ago
4 comments
How to train the model at about 23 tokens/s, i.e. hopsize=1024?

#35 Liujingxiu23 opened 2 months ago
5 comments
CER Performance of Reconstructed Audio

#34 howitry opened 2 months ago
6 comments
Using EMA on the generator markedly improves the validation loss

#33 erogol opened 2 months ago
2 comments
Question about training

#32 handsomelys opened 2 months ago
2 comments
Maximum duration supported during inference?

#31 LiuShixing opened 2 months ago
1 comment
How many hours of Chinese data are there?

#30 LiuShixing closed 2 months ago
1 comment
Usage for speech separation and temporal audio features

#29 saveriyo opened 2 months ago
4 comments
Training on WenetSpeech couldn't converge

#28 dyyoungg opened 2 months ago
3 comments
Comparison with Whisper

#27 isruihu opened 2 months ago
1 comment
Support Lightning 2.x or above

#26 nukes opened 2 months ago
0 comments
What is the difference between the config for training WavTokenizer-small and WavTokenizer-large?

#25 handsomelys opened 2 months ago
2 comments
Fix DAC training

#24 erogol opened 2 months ago
0 comments
WavTokenizer-medium was released on 2024.09.09

#23 jishengpeng opened 2 months ago
4 comments
Aligning language vocabulary and speech space

#22 varfolomeeff opened 2 months ago
3 comments
Future 48kHz model

#21 Ronsor opened 2 months ago
1 comment
The loss value when the model converges

#20 yangyyt opened 2 months ago
10 comments
Encountered a shape inconsistency when training at 16kHz

#19 dyyoungg closed 2 months ago
3 comments
Mel or wav?

#18 howitry opened 2 months ago
1 comment
Purpose of os.environ['CUDA_LAUNCH_BLOCKING'] = '1' in train.py

#17 seastar105 closed 2 months ago
1 comment
Model weights

#16 JoyceMind opened 2 months ago
1 comment
Please consider a 16kHz model?

#15 ywh-my opened 2 months ago
1 comment
Upgrade to PyTorch Lightning 2.0+ and make pip installable

#14 saveriyo opened 2 months ago
2 comments
About inference on GPU

#13 JohnFengNeumann opened 2 months ago
1 comment
Failed to install

#12 JoyceMind opened 2 months ago
2 comments
Installable package

#11 Tomiinek opened 2 months ago
0 comments
MRD vs MS-STFTD

#10 Yagelmx opened 2 months ago
4 comments
Convert to a package and add a LibriTTS data prep script

#9 saveriyo closed 2 months ago
2 comments
Encode and decode for "16k sample"

#8 sunnnnnnnny closed 2 months ago
1 comment
Some notes on HF integration

#7 NielsRogge opened 2 months ago
1 comment
About ASR

#6 wntg opened 2 months ago
4 comments