karpathy / build-nanogpt

Video+code lecture on building nanoGPT from scratch

TTS #67

Open yukiarimo opened 3 months ago

yukiarimo commented 3 months ago

Hello. Do you know how to turn https://github.com/nivibilla/build-nanogpt into a TTS model instead of audio-to-audio?

Momnadar1 commented 3 months ago

Hey @yukiarimo, I am trying to do that too. Is there any progress on your side? I made some progress on audio-to-audio.

If you are interested in working on it with me, let me know.

Thanks

Momnadar1 commented 3 months ago

I also found two things:

  1. STT https://keras.io/examples/audio/transformer_asr/
  2. TTS https://github.com/tttzof351/SimpleTransfromerTTS

Enjoy!! :)

yukiarimo commented 3 months ago

Gonna try it out! But how does that work “without a tokenizer”?

Momnadar1 commented 3 months ago

I think you are talking about audio-to-audio; for that I built my own tokenizer, hehe :'D

Momnadar1 commented 3 months ago

The idea behind the tokenizer is to process the data in batches: convert the combined audio (say 50 MB for now) to a mel spectrogram, encode the mel spectrogram into a sequence of integers, and decode the sequence of integers back into a mel spectrogram. The mel spectrogram values are scaled and quantized to a range of integers, and encoding and decoding map back and forth between those integers and the mel spectrogram values.
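The scale-and-quantize scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tokenizer from this thread: the number of levels (`N_LEVELS`), the dB range (`MIN_DB`, `MAX_DB`), and the function names `encode_mel`/`decode_mel` are all hypothetical, and a random array stands in for a real log-mel spectrogram (which you would normally compute with an audio library).

```python
import numpy as np

# Hypothetical settings: the thread does not give the quantization range or vocab size.
N_LEVELS = 256                # vocabulary size: integer tokens 0..255
MIN_DB, MAX_DB = -80.0, 0.0   # assumed dynamic range of a log-mel spectrogram, in dB


def encode_mel(mel_db: np.ndarray) -> np.ndarray:
    """Scale log-mel values into [0, 1], then quantize to integers in [0, N_LEVELS - 1]."""
    clipped = np.clip(mel_db, MIN_DB, MAX_DB)                  # keep values inside the assumed range
    scaled = (clipped - MIN_DB) / (MAX_DB - MIN_DB)            # -> [0, 1]
    return np.round(scaled * (N_LEVELS - 1)).astype(np.int64)  # -> {0, ..., N_LEVELS - 1}


def decode_mel(tokens: np.ndarray) -> np.ndarray:
    """Map integer tokens back to approximate log-mel values in dB."""
    scaled = tokens.astype(np.float64) / (N_LEVELS - 1)
    return scaled * (MAX_DB - MIN_DB) + MIN_DB


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a real log-mel spectrogram of shape (n_mels, n_frames).
    mel = rng.uniform(MIN_DB, MAX_DB, size=(80, 100))
    tokens = encode_mel(mel)
    # Flatten frame by frame (time-major) into one integer sequence for a GPT-style model.
    sequence = tokens.T.reshape(-1)
    restored = decode_mel(sequence.reshape(100, 80).T)
    step = (MAX_DB - MIN_DB) / (N_LEVELS - 1)
    print(sequence[:8], np.abs(restored - mel).max() <= step / 2 + 1e-9)
```

Rounding to the nearest level makes the round trip lossy but bounded: each value comes back within half a quantization step (about 0.16 dB with these assumed settings).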

In more general terms: for, say, second 1 of audio we get some mel spectrogram data encoded as integers, just like we had for text:

```python
# encode/decode: character-level tokenizer mapping characters <-> integer ids
print(encode("hii there"))
# output: [46, 47, 47, 1, 58, 46, 43, 56, 43]
print(decode(encode("hii there")))
# output: hii there
```
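The text `encode`/`decode` pair referred to here is a character-level tokenizer. A minimal sketch, assuming a small placeholder corpus (the ids quoted in the thread come from a different, larger training text, so the exact numbers printed here will differ):

```python
# Placeholder corpus (assumption): the real vocabulary comes from the training text.
text = "hii there, this is a tiny stand-in corpus"

# Every distinct character gets an integer id, in sorted order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> string


def encode(s: str) -> list[int]:
    """Map each character to its integer id."""
    return [stoi[c] for c in s]


def decode(ids: list[int]) -> str:
    """Map integer ids back to characters and join them."""
    return "".join(itos[i] for i in ids)


print(encode("hii there"))
print(decode(encode("hii there")))  # hii there
```

The mel tokenizer plays the same role for audio: instead of looking characters up in a table, it quantizes continuous spectrogram values into a fixed set of integer ids.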

Let me know if you can build on top of this. Thanks.

yukiarimo commented 3 months ago

@Momnadar1 https://github.com/tttzof351/SimpleTransfromerTTS doesn't work

Momnadar1 commented 3 months ago

I will send you the Colab link where it’s working for me. Thanks

Momnadar1 commented 3 months ago

Hi @yukiarimo, here is the link: https://colab.research.google.com/drive/1NHFi8y1GCIUR4Nv0yguGVwOk2q0-JOEu?usp=sharing

But take a look at the attached images of the train and test loss (and other plots) in https://github.com/tttzof351/SimpleTransfromerTTS. They show that it takes nearly 400K iterations to generate good results.

If you still have issues, just let me know.

Thanks,