jasperproject / jasper-client

Client code for Jasper voice computing platform
MIT License

fixed exception error: not a whole number of frames in mic.py #636

Open vdm97 opened 7 years ago

vdm97 commented 7 years ago

My profile.yml:

sphinx
pocketsphinx:
  fst_model: '/home/debian/Desktop/phonetisaurus/g014b2b.fst'
  hmm_dir: '/usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/'

tts_engine: flite-tts
flite-tts:
  voice: 'slt'

keyword: 'Jasmin'
language: 'en-US'

audio_engine: 'alsa'
output_device: 'default-card-s2'
input_device: 'default-card-s2'

audio:
  input_samplerate: 48000
  input_samplewidth: 16
  input_channels: 1
  input_chunksize: 1024
  output_chunksize: 1024
  output_padding: 'no' # true, yes

Results in:

debian@beaglebone:~$ python /home/debian/Desktop/jasper-dev/Jasper.py


As one can see, on the BeagleBone Black the audio stream quite often closes at random (as mentioned in #404 by G10DRAS and in #529 by dev0v36). I fixed this exception (error: not a whole number of frames) in wait_for_keyword in mic.py by catching it and deleting the offending frame from frames. I also tried restarting the whole method instead, but setting up the two threads took too long, and each restart creates a new frame_queue, which results in really bad hotword detection (basically none, because the exception gets raised every 3 seconds or so).
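A minimal sketch of the catch-and-drop idea described above, not the actual patch. The rms helper is a hypothetical stand-in for audioop.rms, built on the standard array module, which likewise rejects a buffer that is not a whole number of 16-bit samples (it raises ValueError rather than audioop.error). The name update_threshold is illustrative, and the sketch assumes the newest frame is the truncated one.

```python
import array
import collections
import math


def rms(data):
    """Root mean square of 16-bit samples (stand-in for audioop.rms)."""
    samples = array.array("h")
    # frombytes raises ValueError when len(data) is not a multiple of the
    # 2-byte item size, analogous to "not a whole number of frames".
    samples.frombytes(data)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def update_threshold(frames, old_threshold):
    """Recompute the noise threshold; drop the newest frame if it is cut off."""
    try:
        return rms(b"".join(frames))
    except ValueError:
        # Assumption: the frame that just arrived is the incomplete one.
        frames.pop()
        return old_threshold


frames = collections.deque([], 30)
frames.append(b"\x01\x01" * 4)   # complete frame: four 16-bit samples
frames.append(b"\x01\x01\x01")   # truncated frame: 3 bytes
threshold = update_threshold(frames, 0.0)  # truncated frame is discarded
```

Because the worker threads and the frame_queue are left untouched, this kind of recovery avoids the restart cost mentioned above.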

I could not open a pull request, so I pushed it here: https://github.com/vdm97/jasmin/blob/master/jasper/mic.py

vdm97 commented 7 years ago

I also improved the calculation of frames in active_listening and added some more exception handling in mic.py.

jantos commented 7 years ago

Doesn't look like your mic.py is available anymore?!

blackde5ert commented 7 years ago

Hi, I just had the same problem. After hours of trial and error I came to this solution: I added the following line to mic.py (line 161): if ((len(frame) % self._input_chunksize) == 0): The line calculates the length of the frame modulo the chunksize and performs the rest of the loop only if the frame length is an exact multiple of the chunksize (only then is it a complete frame).

Then I changed self._threshold = float(audioop.rms("".join(frames), 2)) in lines 178 and 196 to self._threshold = float(audioop.rms(b''.join(frames), int(self._input_bits/8))). The second change doesn't have to be made, but IMO it's the nicer code.
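The check can be tried in isolation with a small sketch. All function names here are hypothetical and only the modulo test itself comes from the comment above. One thing to keep in mind is that len(frame) counts bytes while the chunksize counts frames, so with 16-bit mono audio a full chunk of 1024 frames is 2048 bytes. The stricter is_full_chunk variant is an assumption about what "complete" could mean, not part of the fix, and this thread does not establish which buffer sizes the ALSA engine actually returns.

```python
def sample_width(bits):
    """Bytes per sample, e.g. 16 bits -> 2 (the width passed to audioop.rms)."""
    return bits // 8


def passes_modulo_check(frame, chunksize):
    """The check from the comment: byte length is a multiple of chunksize."""
    return len(frame) % chunksize == 0


def is_full_chunk(frame, chunksize, bits=16, channels=1):
    """Stricter, hypothetical variant: exactly `chunksize` frames were read."""
    return len(frame) == chunksize * sample_width(bits) * channels


full = b"\x00" * 2048   # 1024 frames of 16-bit mono audio
cut = b"\x00" * 1001    # stream closed mid-chunk
half = b"\x00" * 1024   # only 512 frames

for name, buf in (("full", full), ("cut", cut), ("half", half)):
    print(name, passes_modulo_check(buf, 1024), is_full_chunk(buf, 1024))
```

The full buffer passes both checks and the cut buffer fails both, while the half buffer passes the modulo check but is not a full chunk.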

Here is the whole function wait_for_keyword in mic.py:

    def wait_for_keyword(self, keyword=None):
        if not keyword:
            keyword = self._keyword
        frame_queue = queue.Queue()
        keyword_uttered = threading.Event()

        # FIXME: not configurable yet
        num_worker_threads = 2

        for i in range(num_worker_threads):
            t = threading.Thread(target=self.check_for_keyword,
                                 args=(frame_queue, keyword_uttered, keyword))
            t.daemon = True
            t.start()

        frames = collections.deque([], 30)
        recording = False
        recording_frames = []
        self._logger.info("Waiting for keyword '%s'...", keyword)
        for frame in self._input_device.record(self._input_chunksize,
                                               self._input_bits,
                                               self._input_channels,
                                               self._input_rate):
            if ((len(frame) % self._input_chunksize) == 0):
                if keyword_uttered.is_set():
                    self._logger.info("Keyword %s has been uttered", keyword)
                    return
                frames.append(frame)
                if not recording:
                    snr = self._snr([frame])
                    if snr >= 10:  # 10dB
                        # Loudness is higher than normal, start recording and
                        # use the last 10 frames to start
                        self._logger.debug("Started recording on device '%s'",
                                           self._input_device.slug)
                        self._logger.debug("Triggered on SNR of %sdB", snr)
                        recording = True
                        recording_frames = list(frames)[-10:]
                    elif len(frames) >= frames.maxlen:
                        # Threshold SNR not reached. Update threshold with
                        # background noise.
                        self._threshold = float(audioop.rms(
                            b''.join(frames), int(self._input_bits/8)))
                else:
                    # We're recording
                    recording_frames.append(frame)
                    if len(recording_frames) > 20:
                        # If we recorded at least 20 frames, check if we're
                        # below threshold again
                        last_snr = self._snr(recording_frames[-10:])
                        self._logger.debug(
                            "Recording's SNR dB: %f", last_snr)
                        if last_snr <= 3 or len(recording_frames) >= 60:
                            # The loudness of the sound is not at least as
                            # high as the threshold, or we've been waiting
                            # too long; we'll stop recording now
                            recording = False
                            self._logger.debug("Recorded %d frames",
                                               len(recording_frames))
                            frame_queue.put(tuple(recording_frames))
                            self._threshold = float(audioop.rms(
                                b''.join(frames), int(self._input_bits/8)))

danielhyt2 commented 6 years ago

I got the same error too. After applying the changes, the error disappeared. However, Jasper has now become deaf: it doesn't hear anything at all. It just gets stuck at this line:

INFO:jasper.mic:Waiting for keyword 'TIGER'...

With --debug, it gets stuck at these lines:

INFO:jasper.mic:Waiting for keyword 'TIGER'...
DEBUG:alsa_1_0_0.alsaaudioengine:input stream opened on device 'plughw-card-device-dev-0' (16000 Hz, 1 channel, 16 bit)

blackde5ert commented 6 years ago

Sorry for the late answer. You can try adding debug logs to the code (before if ((len(frame) % self._input_chunksize) == 0):) to verify where the problem comes from. Add something like

self._logger.debug("Chunksize: %s", self._input_chunksize)

for chunksize and len(frame).
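A rough, self-contained sketch of that logging, runnable outside Jasper. The helper name log_frame_size is made up for illustration and the buffers are dummy data. In mic.py the two debug calls would use self._logger and sit inside the record loop, just before the modulo check.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("jasper.mic")


def log_frame_size(logger, frame, chunksize):
    """Log both values and return the remainder used by the modulo check."""
    logger.debug("Chunksize: %s", chunksize)
    logger.debug("len(frame): %s", len(frame))
    return len(frame) % chunksize


# Dummy buffers standing in for what the input device yields.
remainder_full = log_frame_size(logger, b"\x00" * 2048, 1024)
remainder_cut = log_frame_size(logger, b"\x00" * 1001, 1024)
```

A non-zero remainder means the check would skip that frame. If every frame has a non-zero remainder, the loop body never runs, which would also look like Jasper not hearing anything.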

Then start with --debug again

danielhyt2 commented 6 years ago

Hi blackde5ert,

No worries, thank you for your reply. Since I've completely removed Jasper (I'm trying Lucida now), I'll try your advice after I've reinstalled Jasper from scratch.

danielhyt2 commented 6 years ago

Hi blackde5ert,

I'm sorry for the late reply. I just got a chance to reinstall jasper-dev and make the change. After adding the debug log, here is what shows up:

DEBUG:jasper.mic:Chunksize: 1024
DEBUG:jasper.mic:Chunksize: 1024
...

Actually, I didn't get the "not a whole number of frames" error. It's more like Jasper isn't waiting for the keyword to be uttered (it does so only once) and instead keeps showing:

INFO:sphinx_1_0_0.sphinxplugin:Transcribed: []
INFO:sphinx_1_0_0.sphinxplugin:Transcribed: []
INFO:sphinx_1_0_0.sphinxplugin:Transcribed: []
INFO:sphinx_1_0_0.sphinxplugin:Transcribed: []
INFO:sphinx_1_0_0.sphinxplugin:Transcribed: []

blackde5ert commented 6 years ago

Could you please post the whole output from starting Jasper with the --debug option? If Jasper already says "...Transcribed: []", it looks like the wait_for_keyword function works so far.

Are you sure your microphone is working correctly?