jishengpeng
/
WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License
800
stars
44
forks
issues
Semantic Representation
#55
Uneasy-Z
opened
1 week ago
2
worse performance of large model compared to small model?
#54
XiaoshanHsj
opened
1 week ago
4
Training for a WAV-to-MIDI transcriber
#53
mito0o852
opened
1 week ago
1
Model can not converge
#52
VJJJJJJ1
closed
1 week ago
2
What is the expected behaviour with changing the bandwidth parameter
#51
tanmaylaud
opened
2 weeks ago
0
Questions for Creating a Better Model
#50
ootsuka-repos
opened
2 weeks ago
2
some questions about model
#49
VJJJJJJ1
closed
1 week ago
2
Installable Package
#48
poonehmousavi
opened
2 weeks ago
2
Config/Model Checkpoint Pairing
#47
MorenoLaQuatra
opened
1 month ago
5
Question about Audio Preprocessing
#46
xjf-303
opened
1 month ago
1
We updated the WavTokenizer paper on arXiv and released the WavTokenizer-Large checkpoint on Hugging Face on 2024.10.22
#45
jishengpeng
opened
1 month ago
4
Streaming inference
#44
wntg
opened
1 month ago
3
How many training steps to train wavtokenizer?
#43
sphmel
closed
1 month ago
1
When will the large unified model (speech, music, audio) be released?
#42
MrPig
opened
1 month ago
1
probability density for each index in the codebook
#41
goforher
opened
1 month ago
1
Performance in LLM-based-TTS
#40
Liujingxiu23
opened
2 months ago
2
Files Missing?
#39
goforher
opened
2 months ago
6
Why is the grad norm so high?
#38
necrophagists
opened
2 months ago
1
speech medium v2
#37
theodorblackbird
closed
2 months ago
1
Why such a large commit loss weight?
#36
Ming-er
opened
2 months ago
4
How to train the model at about 23 tokens/s, that is, hopsize=1024
#35
Liujingxiu23
opened
2 months ago
5
CER Performance of Reconstructed Audio
#34
howitry
opened
2 months ago
6
Using EMA on the generator markedly improves the validation loss
#33
erogol
opened
2 months ago
2
Question about training
#32
handsomelys
opened
2 months ago
2
Maximum duration supported during inference?
#31
LiuShixing
opened
2 months ago
1
How many hours of Chinese data are there?
#30
LiuShixing
closed
2 months ago
1
Usage for speech separation and temporal audio features
#29
saveriyo
opened
2 months ago
4
Training on WenetSpeech couldn't converge
#28
dyyoungg
opened
2 months ago
3
Comparison with Whisper
#27
isruihu
opened
2 months ago
1
support lightning 2.x or above
#26
nukes
opened
2 months ago
0
What is the difference between the config for training WavTokenizer-small and WavTokenizer-large?
#25
handsomelys
opened
2 months ago
2
Fix DAC training
#24
erogol
opened
2 months ago
0
WavTokenizer-medium was released on 2024.09.09
#23
jishengpeng
opened
2 months ago
4
Alignment language vocabulary and speech space
#22
varfolomeeff
opened
2 months ago
3
Future 48kHz model
#21
Ronsor
opened
2 months ago
1
The loss value when the model converges
#20
yangyyt
opened
2 months ago
10
Shape inconsistency encountered when training at 16kHz
#19
dyyoungg
closed
2 months ago
3
Mel or wav?
#18
howitry
opened
2 months ago
1
Purpose of os.environ['CUDA_LAUNCH_BLOCKING'] = '1' in train.py
#17
seastar105
closed
2 months ago
1
Weight of model
#16
JoyceMind
opened
2 months ago
1
Please consider a 16K model?
#15
ywh-my
opened
2 months ago
1
Upgrade to Pytorch Lightning 2.0+ and make pip installable
#14
saveriyo
opened
2 months ago
2
About inference on GPU
#13
JohnFengNeumann
opened
2 months ago
1
fail to install
#12
JoyceMind
opened
2 months ago
2
Installable package
#11
Tomiinek
opened
2 months ago
0
MRD vs MS-STFTD
#10
Yagelmx
opened
2 months ago
4
Convert to package and add libritts data prep script
#9
saveriyo
closed
2 months ago
2
encode and decode for "16k sample"
#8
sunnnnnnnny
closed
2 months ago
1
Some notes on HF integration
#7
NielsRogge
opened
2 months ago
1
About ASR
#6
wntg
opened
2 months ago
4