@diyism From a quick glance at the project, it seems to me this is a 3D visualization library for spectrograms.
The standard visualization used in Python libraries (e.g. librosa) maps the time domain to the x-axis and the frequency domain to the y-axis. Each point (matrix cell) is then color-coded according to the magnitude of the given frequency in that time interval.
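(For reference, a minimal sketch of that standard 2D view with librosa and matplotlib; the file name and STFT parameters are just placeholders:)

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# "voice.wav" is a placeholder recording
y, sr = librosa.load("voice.wav", sr=None)

# Short-time Fourier transform: rows = frequency bins, columns = time frames
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
S_db = librosa.amplitude_to_db(S, ref=np.max)

# Time on the x-axis, frequency on the y-axis, magnitude as color
librosa.display.specshow(S_db, sr=sr, hop_length=256, x_axis="time", y_axis="hz")
plt.colorbar(format="%+2.0f dB")
plt.title("Standard 2D spectrogram")
plt.show()
```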
The Chrome Music Lab simply adds a z-axis showing the same magnitude to create a 3D chart. Unless I am mistaken, it does not use any additional information to make the visualization.
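(And a rough sketch of that extra z-axis, plotting the very same magnitude matrix as a surface; again, the file name and parameters are placeholders, not the Music Lab's actual code:)

```python
import numpy as np
import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load("voice.wav", sr=None)  # placeholder recording
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
S_db = librosa.amplitude_to_db(S, ref=np.max)

# Same matrix as the 2D view; the z-axis just repeats the magnitude
times = librosa.frames_to_time(np.arange(S_db.shape[1]), sr=sr, hop_length=256)
freqs = librosa.fft_frequencies(sr=sr, n_fft=1024)
T, F = np.meshgrid(times, freqs)

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(T, F, S_db, cmap="magma")
ax.set_xlabel("time (s)")
ax.set_ylabel("frequency (Hz)")
ax.set_zlabel("magnitude (dB)")
plt.show()
```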
Thanks, it seems the Chrome Music Lab doesn't create a new dimension of the voice spectrogram, so I'm closing this issue.
I'm very interested in phoneme/syllable recognition, and I'm trying out YonaVox's preprocessing and training Jupyter notebooks.
I find the Chrome Music Lab's voice spectrogram very impressive. For example, when I speak the 4 syllables /gə tə kə hə/ into the Android phone web page (https://musiclab.chromeexperiments.com/spectrogram/), the spectrogram looks like the image below: I can clearly see the 8 phonemes (/g ə t ə k ə h ə/) in it.
So I guess that if we could migrate that code (https://github.com/googlecreativelab/chrome-music-lab/blob/master/spectrogram/src/javascripts/UI/spectrogram.js) into this project, it might improve the phoneme recognition accuracy.
And from the Music Lab's spectrogram, I guess we might be able to recognize a syllable's beginning from just its first 0.1 seconds; together with the following 0.2 seconds of the audio stream, every syllable could then be force-aligned to 0.3 seconds. I found a Jupyter notebook that does inference of syllable and phoneme onsets (https://www.isca-speech.org/archive_v0/Interspeech_2018/pdfs/1224.pdf), but it's not very generalized.
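(Just to make the idea concrete, a rough sketch under the assumption that a generic onset detector, e.g. librosa's, is good enough to mark syllable beginnings; the 0.3 s window and the file name are placeholders:)

```python
import librosa

y, sr = librosa.load("voice.wav", sr=None)  # placeholder recording

# Estimate candidate syllable onsets, in seconds
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time", backtrack=True)

# Force-align every syllable to a fixed 0.3 s window:
# the first 0.1 s should cover the consonant, the next 0.2 s the vowel
WINDOW = 0.3
segments = []
for t in onset_times:
    start = int(t * sr)
    end = start + int(WINDOW * sr)
    if end <= len(y):
        segments.append(y[start:end])

print(f"{len(segments)} fixed-length syllable candidates of {WINDOW} s each")
```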