yonatankarimish / YonaVox

Machine learning speech recognition for air conditioner voice activation

[Improvement] chrome-music-lab's spectrogram is very impressive, maybe we can migrate it into this project #1

Closed: diyism closed this issue 1 year ago

diyism commented 1 year ago

I'm very interested in phoneme/syllable recognition, and I'm trying out YonaVox's preprocessing and training Jupyter notebooks.

I find chrome-music-lab's voice spectrogram very impressive. For example, when I speak the 4 syllables /gə tə kə hə/ into the web page on an Android phone (https://musiclab.chromeexperiments.com/spectrogram/), I can clearly see the 8 phonemes (/g ə t ə k ə h ə/) in the resulting spectrogram.

So I guess that if we migrate the code (https://github.com/googlecreativelab/chrome-music-lab/blob/master/spectrogram/src/javascripts/UI/spectrogram.js) into this project, it might improve the phoneme recognition accuracy.

Also, judging from the music-lab spectrogram, I guess we could find a way to recognize the beginning of a syllable from just its first 0.1 seconds. Together with the following 0.2 seconds of the audio stream, every syllable could then be force-aligned to a fixed length of 0.3 seconds. I found a Jupyter notebook that infers syllable and phoneme onsets (https://www.isca-speech.org/archive_v0/Interspeech_2018/pdfs/1224.pdf), but it does not generalize very well.
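To make the fixed-length idea concrete, here is a minimal sketch of cutting one 0.3-second segment per syllable. It assumes the onset times are already known from some separate detector; the function name `fixed_length_segments` and the onset values are hypothetical and are not part of YonaVox or chrome-music-lab.

```python
def fixed_length_segments(samples, sample_rate, onset_times, segment_seconds=0.3):
    """Cut one fixed-length segment per onset, zero-padding at the end of the audio."""
    seg_len = int(round(segment_seconds * sample_rate))
    segments = []
    for onset in onset_times:
        start = int(round(onset * sample_rate))
        chunk = list(samples[start:start + seg_len])
        # Pad with silence if the recording ends before the segment does.
        chunk += [0.0] * (seg_len - len(chunk))
        segments.append(chunk)
    return segments


sample_rate = 16000
samples = [1.0] * sample_rate   # one second of placeholder audio
onsets = [0.10, 0.45, 0.80]     # hypothetical onset times in seconds
segments = fixed_length_segments(samples, sample_rate, onsets)
```

Every segment has the same length (4800 samples at 16 kHz), so the segments can be stacked into one batch. The hard part, finding the onsets reliably, is not addressed here.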

yonatankarimish commented 1 year ago

@diyism From a quick glance at the project, it seems to me that this is a 3D visualization library for spectrograms.

The standard visualization used in python libraries (e.g. librosa) maps the time domain to the x-axis and the frequency domain to the y-axis. Each point/matrix cell is then color-coded according to the magnitude of the given frequency at that time interval.

The Chrome Music Lab simply adds a z-axis showing the same magnitude to create a 3D chart. Unless I am mistaken, it does not use any additional information to make the visualizations.
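As a rough illustration of that point, the sketch below computes a magnitude spectrogram with plain NumPy. The function name `magnitude_spectrogram` and the frame/hop sizes are assumptions chosen for the example; this is not the code used by librosa, YonaVox, or the Chrome Music Lab.

```python
import numpy as np


def magnitude_spectrogram(signal, sample_rate, frame_len=512, hop=128):
    """Return (freqs, times, mag), where mag[f, t] is the magnitude of bin f in frame t."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames: shape (n_frames, frame_len).
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # FFT of each frame, then transpose so rows are frequency and columns are time.
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, mag


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)   # one-second 1 kHz test tone
freqs, times, mag = magnitude_spectrogram(tone, sample_rate)
peak_hz = freqs[mag.mean(axis=1).argmax()]
```

A 2D plot colors each cell of `mag`, while a 3D surface uses the same values as heights over the same (time, frequency) grid. Both views are driven by one matrix, so a model trained on it sees the same input either way.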

diyism commented 1 year ago

Thanks. It seems the Chrome Music Lab doesn't add a new dimension to the voice spectrogram, so I'm closing this issue.