ptwz / python_wizard

Command line LPC analysis tool to generate bitstreams for the Texas Instruments TMS5220 chip
MIT License

RuntimeWarning: invalid value encountered in true_divide possibly triggered by silence #9

Closed kevinjwalters closed 4 years ago

kevinjwalters commented 4 years ago

I was struggling to get python_wizard to work: it kept printing RuntimeWarning: invalid value encountered in true_divide for the 8 kHz, 16-bit signed wav file I was trying to read.

I eventually trimmed it heavily so there was no trailing quiet or silent part. That seems to have fixed it and produced output that (with the -S option) works in Talkie-equivalent code. The results were almost too good; I need a more robotic voice!

I'll put more details here as I discover them. Either -d for debug or some fiddling around with pdb showed a suspicious array full of zeros and maybe a few NaNs.

It would be great if there were sample wavs together with their expected output in the repository. Those could even be part of some simple tests.

ptwz commented 4 years ago

Hmm, could you provide a test case file? Maybe this way the bug can be tracked down and a regression test made.

I created some synthetic test tone files for my own development use but thought this would make the repo unnecessarily big. Will add them soon.

kevinjwalters commented 4 years ago

I started looking at the code and discovered the useful, voluminous -d output, but didn't get very far with tracking this down myself. I need to check my notes, but it might happen when it fails to detect a pitch.

On another sample I found that specifying -r 150,440 helped. The sample featured a fairly high-pitched voice. I'll find a good, really short sample that does it...
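
For orientation, here is a rough sketch of how a pitch range in Hz maps to a window of pitch periods measured in samples. It assumes that -r takes a minimum and maximum pitch in Hz and that the analysis runs at 8 kHz. The helper name `pitch_range_to_periods` is hypothetical and this is not python_wizard's own code, which may round or pad the limits differently.

```python
def pitch_range_to_periods(min_hz, max_hz, sample_rate=8000):
    """Return (shortest, longest) pitch period in samples for a pitch range in Hz."""
    if not 0 < min_hz <= max_hz:
        raise ValueError("expected 0 < min_hz <= max_hz")
    shortest = sample_rate / max_hz  # highest pitch gives the shortest period
    longest = sample_rate / min_hz   # lowest pitch gives the longest period
    return shortest, longest


# Assumed example: the range from the comment above, at an assumed 8 kHz rate.
shortest, longest = pitch_range_to_periods(150, 440)
print(round(shortest, 1), round(longest, 1))
```

Under these assumptions a narrower range excludes the long periods, which is one plausible reason it helps with a high-pitched voice: the search can no longer lock onto a multiple of the true period.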

BTW, I've not noticed any unvoiced frames output. Do you get any?

Unrelated, do you know if the classic artificial robotic sound (like Speak and Spell) is related to use of the repeat frame? My clips don't sound as I hoped, they seem too natural and have some aurally distasteful artefacts.

kevinjwalters commented 4 years ago

Here's a short 8k file that does it. The 44.1 version also does it. I also did some aggressive filtering to get rid of stuff below 170Hz and that didn't help.

python_wizard-issue8repro-samples-ii.zip

$ python_wizard -d -f hex city-80.mono.wav 2>&1 > city-80.mono.defaults.hex | fgrep -10 -i nan
DEBUG:root:estimate=109.5876296796643 maximumMultiple=3.333333333333333
DEBUG:root:estimate=109.5876296796643 maximumMultiple=3.333333333333333
DEBUG:root:estimate=109.5876296796643 maximumMultiple=2.333333333333333
DEBUG:root:estimate=109.5876296796643 maximumMultiple=2.333333333333333
DEBUG:root:estimate=109.5876296796643 maximumMultiple=1.333333333333333
DEBUG:root:getCoefficientsFor minimumPeriod=14 maximumPeriod=162
site-packages/numpy/lib/function_base.py:2530: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
site-packages/numpy/lib/function_base.py:2531: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[None, :]
DEBUG:root:_normalizedCoefficients = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5170102000657032, 0.45867486390914697, 0.3239796629312128, 0.13402921049365604, 0.07090324335199863, 0.23982368278753177, 0.3334899166607278, 0.3380201657991088, 0.26094889571774493, 0.12390205795063852, 0.04051809127513235, 0.19340465902073098, 0.30009579398493647, 0.3401482071433766, 0.30936744026667856, 0.21663633009823086, 0.08180897185503819, 0.06519047430323419, 0.1892897766683211, 0.2609827843546449, 0.26593262842535303, 0.20691962489836163, 0.10043243032206493, 0.027440633397322212, 0.14577478586798936, 0.22581794836218724, 0.24837452867979976, 0.20777331396607066, 0.1117668041268348, 0.02047085512504535, 0.16147805837193466, 0.28099489405735156, 0.3535109724848864, 0.36461170961584355, 0.3134471825060895, 0.211942901003322, 0.08235474861659493, 0.04684443252042178, 0.14708346254055404, 0.19618054674697513, 0.18265858762334639, 0.10723945661224175, 0.017422235468691365, 0.16850714530439934, 0.3172892339697641, 0.43568999222931326, 0.502534909524262, 0.5070810260415495, 0.4499615773286374, 0.3428225190579967, 0.20657713760420202, 0.06732292373789128, 0.04946592861890344, 0.12406697873128486, 0.14543150471441466, 0.11248052581703982, 0.03441841342417997, 0.07104857533139078, 0.18269930759454114, 0.2820699045943071, 0.3573216973698656, 0.40337420195249957, 0.41990978832352427, 0.40921527858771933, 0.37468716853143935, 0.31989046400666854, 0.2474254416365699, 0.1536748033783418, 0.17555579693253098, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
DEBUG:root:bestPeriod=14 minimumPeriod=15 maximumPeriod=161
DEBUG:root:estimate=13.736047152610666 maximumMultiple=1.0
DEBUG:root:HammingWindow: Generate window for len 400
DEBUG:root:getCoefficientsFor max(self.samples)=0.0040583170488474075
DEBUG:root:getCoefficientsFor max(self.samples)=0.0026044664641055376
DEBUG:root:getCoefficientsFor max(self.samples)=0.0024108542957671907
DEBUG:root:getCoefficientsFor max(self.samples)=0.020008754025375684
DEBUG:root:getCoefficientsFor max(self.samples)=0.12661870665699593
DEBUG:root:getCoefficientsFor max(self.samples)=0.15316428303184407
DEBUG:root:getCoefficientsFor max(self.samples)=0.1190564398025632
$ python_wizard -d -f hex city-441.mono.170hp.compressed100ms.wav 2>&1 > /dev/null | fgrep -10 -i nan
DEBUG:root:estimate=109.5739451852529 maximumMultiple=3.333333333333333
DEBUG:root:estimate=109.5739451852529 maximumMultiple=3.333333333333333
DEBUG:root:estimate=109.5739451852529 maximumMultiple=2.333333333333333
DEBUG:root:estimate=109.5739451852529 maximumMultiple=2.333333333333333
DEBUG:root:estimate=109.5739451852529 maximumMultiple=1.333333333333333
DEBUG:root:getCoefficientsFor minimumPeriod=14 maximumPeriod=162
site-packages/numpy/lib/function_base.py:2530: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
site-packages/numpy/lib/function_base.py:2531: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[None, :]
DEBUG:root:_normalizedCoefficients = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5139113529688061, 0.4469656892843121, 0.29366830620144174, 0.08323651940394898, 0.13168925275232252, 0.2973270422677736, 0.3808161514594551, 0.37251318062269834, 0.2799568002811277, 0.12621042685898018, 0.05139045823096631, 0.2104616153295089, 0.3170860268271153, 0.3526550204506315, 0.3129819718726886, 0.2070216854172368, 0.05806714389245602, 0.0979615102519506, 0.22245630609332204, 0.28711109461712353, 0.2803425300981981, 0.20667283375996331, 0.08444071191690394, 0.057068551898760145, 0.18376886246928223, 0.2657509977610142, 0.28441067091966, 0.23510127220640656, 0.12709520017477105, 0.017611312894483856, 0.16807428193113122, 0.2918196818983245, 0.3631373396556181, 0.36836189878750464, 0.30757980910433336, 0.19436074675469894, 0.053592544776744126, 0.0837062819663987, 0.18768456935706207, 0.2359650286365577, 0.21720561686965573, 0.13274796616039278, 0.002970690960020381, 0.16452724897806414, 0.32118355462878007, 0.4441536658384961, 0.5121726863904789, 0.5144274312123001, 0.4520261271342799, 0.338203004984155, 0.19584316588527023, 0.052074261710987564, 0.06770426149793378, 0.14399647618970055, 0.16575243574818327, 0.13242670348694915, 0.05467580563445154, 0.04868720065233166, 0.15713707339108607, 0.2542498050800692, 0.3298466312963682, 0.37907048973745994, 0.40045852875314086, 0.3942563084305014, 0.36107980793640276, 0.2999170128351122, 0.19853184684796515, 0.23160235630560436, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
DEBUG:root:bestPeriod=61 minimumPeriod=15 maximumPeriod=161
DEBUG:root:estimate=60.534872913829204 maximumMultiple=4.066666666666666
DEBUG:root:estimate=60.534872913829204 maximumMultiple=4.066666666666666
DEBUG:root:estimate=60.534872913829204 maximumMultiple=4.066666666666666
DEBUG:root:estimate=60.534872913829204 maximumMultiple=4.066666666666666
DEBUG:root:estimate=60.534872913829204 maximumMultiple=3.0666666666666664
DEBUG:root:estimate=60.534872913829204 maximumMultiple=3.0666666666666664
DEBUG:root:estimate=60.534872913829204 maximumMultiple=3.0666666666666664
DEBUG:root:estimate=60.534872913829204 maximumMultiple=2.0666666666666664
DEBUG:root:estimate=60.534872913829204 maximumMultiple=2.0666666666666664

ptwz commented 4 years ago

Thanks for the wave files. I will try to build a testing rig for these and some other files I used along the way.

It seems the math behind the numpy corrcoef function is not the same as in the Objective-C implementation of the original code. numpy normalizes by the standard deviation, so silent periods (where the standard deviation is zero) led to a division by zero and NaN coefficients. This should now be fixed.
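
A minimal sketch of the failure mode, not the actual fix that went into python_wizard: np.corrcoef divides the covariance by the standard deviations of its inputs, so an all-zero (silent) segment gives 0/0 and the result is NaN. The helper `safe_correlation` is a hypothetical name used only for illustration; it returns 0.0 when either segment has no variation.

```python
import warnings

import numpy as np


def safe_correlation(a, b):
    """Pearson correlation of two equal-length segments; 0.0 if either is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        # A silent or constant segment has no defined correlation; report none.
        return 0.0
    return float(np.dot(a, b) / denom)


silence = np.zeros(8)
tone = np.sin(np.arange(8))

# The unguarded call reproduces the NaN (the RuntimeWarning is silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    naive = np.corrcoef(silence, tone)[0, 1]

print(np.isnan(naive))
print(safe_correlation(silence, tone))
print(safe_correlation(tone, tone))
```

Treating a silent segment as "no correlation" is a design choice made for this sketch; the real code may handle silence differently.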

kevinjwalters commented 4 years ago

Thanks.

BTW, I played around with some other encoding s/w at the weekend and discovered that fixing the pitch on playback seems to be the main ingredient for creating a robotic voice. Duplicating the output bytes to replicate the zero-order hold reconstruction also gives a distinctive "crisp" sound. And having a very clean recording of the voice helps massively.

deladriere commented 4 years ago

Hi @kevinjwalters, thanks for this clarification. I also force the pitch to a fixed value to get a robotic voice on my speech synth. Can you explain what you mean by, and how you do, "Duplicating the output bytes to replicate the zero-order hold reconstruction also gives a distinctive "crisp" sound"? Thanks!

kevinjwalters commented 4 years ago

@deladriere I only need to do this because I've converted the speech back to wav files. My messy workflow is

  1. original speech
  2. convert to 8k 16bit mono and use qboxpro or convert to mono and use python_wizard
  3. play back the TMS5220 output with a modified Talkie library to produce raw samples (I actually used my own Python version of the Adafruit fork of the C Talkie library). I've added an option to my code to fix the pitch for voiced frames.
  4. duplicate those raw samples x4 to get 32k (this is a form of interpolation-less upsampling)
  5. use sox to convert this 32k raw 8bit to wav.

The 4th step is what I was referring to. The original devices that played these samples would just send them at 8kHz through a DAC to a speaker, often with no or very limited filtering of high frequencies. You can see on https://en.wikipedia.org/wiki/Zero-order_hold that this gives them a visually blocky appearance. It creates a lot of high frequency artefacts which should not be there but produce the sound we are more familiar with. You only need to do this if you are creating wavs or using software that tries to reproduce the sound more correctly. In my case, if I didn't do that, the 8k wavs would play back on a PC without high frequencies - they would sound like they were passing over a traditional (landline) telephone where the underlying 8k sample rate caps the maximum audible frequency at 4kHz.
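
Step 4 can be sketched in a few lines. This is an illustration only, assuming the raw samples are unsigned 8-bit values held in a `bytes` object. `zero_order_hold` is a hypothetical helper name, not part of Talkie or python_wizard.

```python
def zero_order_hold(samples: bytes, factor: int = 4) -> bytes:
    """Repeat every sample `factor` times (upsampling with no interpolation)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return bytes(s for s in samples for _ in range(factor))


# Three made-up 8 kHz samples become twelve 32 kHz samples.
raw_8k = bytes([128, 200, 90])
raw_32k = zero_order_hold(raw_8k)
print(list(raw_32k))
# prints [128, 128, 128, 128, 200, 200, 200, 200, 90, 90, 90, 90]
```

Because each value is simply held rather than smoothed, the staircase shape is kept in the 32k data, so the artefacts above 4kHz are still there when the file is played on a PC.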

deladriere commented 4 years ago

Ah, thanks a lot for this clarification! I am playing the files on a modified Talkie library (to add volume, bend, pitch, stretch) on my speech synth (SAMD21, Arduino compatible) (see https://www.polaxis.be/). Do you plan to share your Python port of the Talkie library? I'd like to hear how it sounds on a Raspberry Pi Zero.

kevinjwalters commented 4 years ago

@deladriere What I have at the moment is more likely to frustrate other users but I'll try and get back to this in a few weeks to tidy it up and release it in some form. The SAMD21 is a nice little chip but I don't think CircuitPython and the like would be capable of realtime playback. You can see (hear) it running on https://www.youtube.com/watch?v=1mcJz0RG5Jw with some of the standard Talkie library examples. It generates the samples and glues some of them together for playback then plays the sample, hence the long pauses. I think I didn't have enough memory to glue together "X" and "Press" to properly make an "Ex-press"!