jasperproject / jasper-client

Client code for Jasper voice computing platform
MIT License

__init__() got an unexpected keyword argument 'hmm' and path issues with pocketsphinx #712

Open BruceJohnJennerLawso opened 5 years ago

BruceJohnJennerLawso commented 5 years ago

So I am trying to get jasper working with pocketsphinx and espeak on an Orange Pi board running Ubuntu 16.04. For all intents and purposes the board has run like any ordinary Ubuntu install, save with ARM packages.

Because I'm on Ubuntu, I'm able to skip the whole build process for pocketsphinx and install pocketsphinx and python-pocketsphinx through apt. I then installed the rest of the dependencies as needed, but I'm still having issues getting jasper to work.

The first issue seems to be with parsing profile.yml. When I try to manually specify options for CMUSphinx hmm_dir and fst_model as described at

http://jasperproject.github.io/documentation/configuration/

I get

ScannerError: mapping values are not allowed here

Which seems to happen whenever I indent a line in profile.yml. If I don't indent the lines for hmm_dir and fst_model, they seem to just be ignored.

I tried to continue on leaving those options blank, but the hmm_dir that apt installs pocketsphinx with seems to have changed. Jasper seems to think it's located at

/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k

but as far as I can tell it's actually located at

/usr/share/pocketsphinx/model/en-us/en-us

Jasper got past that issue once I manually edited the hmm path in client/stt.py, but it now crashes somewhere in stt.py with

  File "./jasper.py", line 146, in <module>
    app = Jasper()
  File "./jasper.py", line 109, in __init__
    stt_passive_engine_class.get_passive_instance(),
  File "/home/john/dev/jasper/client/stt.py", line 48, in get_passive_instance
    return cls.get_instance('keyword', phrases)
  File "/home/john/dev/jasper/client/stt.py", line 42, in get_instance
    instance = cls(config)
  File "/home/john/dev/jasper/client/stt.py", line 129, in __init__
    vocabulary.decoder_kwargs)
TypeError: __init__() got an unexpected keyword argument 'hmm'

G10DRAS commented 5 years ago

Do not use a TAB to indent lines in profile.yml; use two SPACEs instead and see if that solves your issue.

BruceJohnJennerLawso commented 5 years ago

I tried exactly two spaces for indenting, but that caused the crash as before. My profile.yml:

carrier: ''
first_name: John
gmail_address: ##################
gmail_password: ##################
last_name: Lawson
phone_number: ''
prefers_email: true
stt_engine: sphinx
hmm_dir: '/usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k' #optional
timezone: America/Toronto

and it still crashes with


* JASPER - THE TALKING COMPUTER *
* (c) 2015 Shubhro Saha, Charlie Marsh & Jan Holthuis *

ERROR:root:Error occured!
Traceback (most recent call last):
  File "jasper.py", line 146, in <module>
    app = Jasper()
  File "jasper.py", line 80, in __init__
    self.config = yaml.safe_load(f)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/local/lib/python2.7/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python2.7/dist-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/local/lib/python2.7/dist-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/local/lib/python2.7/dist-packages/yaml/scanner.py", line 220, in fetch_more_tokens
    return self.fetch_value()
  File "/usr/local/lib/python2.7/dist-packages/yaml/scanner.py", line 580, in fetch_value
    self.get_mark())
ScannerError: mapping values are not allowed here
  in "/home/john/.jasper/profile.yml", line 9, column 10

BruceJohnJennerLawso commented 5 years ago

(I used two spaces to start the hmm_dir line; GitHub's formatting just isn't showing it.)
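The reported position fits that description: the error points at line 9, column 10, and with two leading spaces the colon after hmm_dir is exactly the tenth character of that line. An indented line directly under a key that already has a value (here stt_engine: sphinx) is read as a continuation of that value, so the parser then trips over the second colon. A rough, stdlib-only heuristic for spotting such lines might look like the sketch below. find_suspicious_indents is a hypothetical helper, not part of Jasper or PyYAML, and it is no substitute for a real YAML parser.

```python
import re

# Matches "key: value" lines where a scalar value follows the colon.
KEY_WITH_VALUE = re.compile(r"^\s*[^#\s][^:]*:\s+\S")


def find_suspicious_indents(text):
    """Return 1-based numbers of lines indented deeper than a preceding
    "key: value" line (a heuristic only, not a YAML parser)."""
    suspicious = []
    prev = None  # previous non-blank, non-comment line
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if prev is not None:
            prev_indent = len(prev) - len(prev.lstrip())
            indent = len(line) - len(line.lstrip())
            if indent > prev_indent and KEY_WITH_VALUE.match(prev):
                suspicious.append(number)
        prev = line
    return suspicious


# Illustrative snippets; '/some/path' is a placeholder.
broken = "stt_engine: sphinx\n  hmm_dir: '/some/path'\ntimezone: America/Toronto\n"
fixed = "stt_engine: sphinx\npocketsphinx:\n  hmm_dir: '/some/path'\n"

print(find_suspicious_indents(broken))  # [2]
print(find_suspicious_indents(fixed))   # []
```

In the second snippet the indented line sits under a key with no value of its own, which is the layout the configuration page uses.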

G10DRAS commented 5 years ago

See configuration here

https://jasperproject.github.io/documentation/configuration/#pocketsphinx-stt

And validate your yaml here

http://www.yamllint.com/

BruceJohnJennerLawso commented 5 years ago

Oh shoot, OK, I see the issue with the indenting. That's fixed now.

BruceJohnJennerLawso commented 5 years ago

I reverted the hmm path that I had hardcoded in client/stt.py, but jasper still crashes:


I'm not clear on what the issue is, but I wonder if the fact that the new path to the hmm dir does not contain a folder named hmm might be throwing something off? FWIW, when I ls the directory that I'm setting hmm_dir to, the output is

usr@server:/usr/share/pocketsphinx/model/en-us/en-us$ ls
feat.params mdef means noisedict README sendump transition_matrices variances

Which is what should be in the hmm dir afaik
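A quick way to double-check a candidate directory is to compare its contents against the files that the sanity check in Jasper's client/stt.py looks for. The sketch below assumes that file list (mdef, feat.params, means, noisedict, transition_matrices, variances, plus either mixture_weights or sendump). missing_hmm_files is a hypothetical helper name.

```python
import os

REQUIRED = ("mdef", "feat.params", "means", "noisedict",
            "transition_matrices", "variances")
# Only one of these two needs to be present.
ONE_OF = ("mixture_weights", "sendump")


def missing_hmm_files(hmm_dir):
    """Return the names of expected acoustic-model files absent from hmm_dir."""
    present = set(os.listdir(hmm_dir))
    missing = [name for name in REQUIRED if name not in present]
    if not any(name in present for name in ONE_OF):
        missing.append(" or ".join(ONE_OF))
    return missing
```

Calling missing_hmm_files('/usr/share/pocketsphinx/model/en-us/en-us') on a directory with the listing above should return an empty list, which suggests the directory contents are not what is causing the crash.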

BruceJohnJennerLawso commented 5 years ago

I double checked with a fresh clone of jasper and the crash still happens. The issue seems to be between Jasper and possibly the new directory location for hmm_dir

G10DRAS commented 5 years ago

I think you missed the pocketsphinx: key. See the config below:

stt_engine: sphinx
pocketsphinx:
  hmm_dir: '/usr/share/pocketsphinx/model/en-us/en-us'
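The nesting matters because the parsed profile is a dictionary of dictionaries. A minimal sketch, assuming Jasper reads the value from under the pocketsphinx key as this layout implies (get_hmm_dir and DEFAULT_HMM_DIR are illustrative names, and the default is the path Jasper was reported to fall back to earlier in this thread):

```python
DEFAULT_HMM_DIR = "/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k"


def get_hmm_dir(profile, default=DEFAULT_HMM_DIR):
    """Look up hmm_dir under the 'pocketsphinx' key, falling back to default."""
    section = profile.get("pocketsphinx") or {}
    return section.get("hmm_dir", default)


# What a YAML loader would produce for the nested layout:
nested = {"stt_engine": "sphinx",
          "pocketsphinx": {"hmm_dir": "/usr/share/pocketsphinx/model/en-us/en-us"}}
# What it would produce if hmm_dir were left unindented at the top level:
flat = {"stt_engine": "sphinx",
        "hmm_dir": "/usr/share/pocketsphinx/model/en-us/en-us"}

print(get_hmm_dir(nested))  # the configured path
print(get_hmm_dir(flat))    # the default: a top-level hmm_dir is never consulted
```

Under that assumption, an unindented hmm_dir parses without error but is silently ignored, which matches the behaviour described at the top of the thread.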

BruceJohnJennerLawso commented 5 years ago

I don't think that's the issue. My current profile.yml is

carrier: ''
first_name: John
gmail_address: ########
gmail_password: ########
last_name: Lawson
phone_number: ''
prefers_email: true
stt_engine: sphinx
pocketsphinx:
  hmm_dir: '/usr/share/pocketsphinx/model/en-us/en-us' #optional
timezone: America/Toronto

mecparts commented 5 years ago

You're likely using pocketsphinx8-5prealpha. There are some API changes, as detailed in this message from the support forum: "ERROR:root:Error occured! init() got an unexpected keywork argument 'hmm'". For reference, here are the changes I have in my stt.py (working with 5prealpha):

--- stt.py.orig 2019-05-11 21:22:41.494476257 -0600
+++ stt.py  2019-05-03 19:32:06.394275693 -0600
@@ -122,8 +122,14 @@
                                  "hmm_dir in your profile.",
                                  hmm_dir, ', '.join(missing_hmm_files))

-        self._decoder = ps.Decoder(hmm=hmm_dir, logfn=self._logfile,
-                                   **vocabulary.decoder_kwargs)
+        #self._decoder = ps.Decoder(hmm=hmm_dir, logfn=self._logfile,
+        #                           **vocabulary.decoder_kwargs)
+        psConfig = ps.Decoder.default_config()
+        psConfig.set_string('-hmm', hmm_dir)
+
+        psConfig.set_string('-lm', vocabulary.decoder_kwargs['lm'])
+        psConfig.set_string('-dict', vocabulary.decoder_kwargs['dict'])
+        self._decoder = ps.Decoder(psConfig)

     def __del__(self):
         os.remove(self._logfile)
@@ -163,13 +169,19 @@
         self._decoder.process_raw(data, False, True)
         self._decoder.end_utt()

-        result = self._decoder.get_hyp()
+        #result = self._decoder.get_hyp()
+        result = self._decoder.hyp()
         with open(self._logfile, 'r+') as f:
-            for line in f:
-                self._logger.debug(line.strip())
+        #    for line in f:
+        #        self._logger.debug(line.strip())
             f.truncate()

-        transcribed = [result[0]]
+        #transcribed = [result[0]]
+        if result is None:
+            transcribed = ''
+        else:
+            transcribed = result.hypstr.split()
+
         self._logger.info('Transcribed: %r', transcribed)
         return transcribed

BruceJohnJennerLawso commented 5 years ago

Hey mecparts, I gave the changes you described a try but I'm still crashing. I'm having a little trouble following the changes you made: is the last part, which generates the transcribed variable and returns it, supposed to go in the __del__ function?

BruceJohnJennerLawso commented 5 years ago

When I ran jasper with the changes you described, it crashed with

`***

mecparts commented 5 years ago

No, the last changes are in the transcribe function. Look at the line numbers in the @@ lines of the diff.
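Each @@ line is a hunk header of the form @@ -old_start,old_count +new_start,new_count @@, giving where the hunk begins in the original file and in the patched file. As an illustration, here is a small parser for those headers (parse_hunk_header is a hypothetical helper, not part of any diff tool):

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a unified-diff
    hunk header. An omitted count means 1 in the unified format."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


print(parse_hunk_header("@@ -122,8 +122,14 @@"))   # (122, 8, 122, 14)
print(parse_hunk_header("@@ -163,13 +169,19 @@"))  # (163, 13, 169, 19)
```

The second hunk starts at line 163 of the original stt.py, which is inside transcribe. The def __del__ lines at the end of the first hunk are only unchanged context.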

mecparts commented 5 years ago

Replace the __init__ function in the PocketSphinxSTT class with this code:

    def __init__(self, vocabulary, hmm_dir="/usr/local/share/" +
                 "pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k"):

        """
        Initiates the pocketsphinx instance.

        Arguments:
            vocabulary -- a PocketsphinxVocabulary instance
            hmm_dir -- the path of the Hidden Markov Model (HMM)
        """

        self._logger = logging.getLogger(__name__)

        # quirky bug where first import doesn't work
        try:
            import pocketsphinx as ps
        except:
            import pocketsphinx as ps

        with tempfile.NamedTemporaryFile(prefix='psdecoder_',
                                         suffix='.log', delete=False) as f:
            self._logfile = f.name

        self._logger.debug("Initializing PocketSphinx Decoder with hmm_dir " +
                           "'%s'", hmm_dir)

        # Perform some checks on the hmm_dir so that we can display more
        # meaningful error messages if necessary
        if not os.path.exists(hmm_dir):
            msg = ("hmm_dir '%s' does not exist! Please make sure that you " +
                   "have set the correct hmm_dir in your profile.") % hmm_dir
            self._logger.error(msg)
            raise RuntimeError(msg)
        # Let's check if all required files are there. Refer to:
        # http://cmusphinx.sourceforge.net/wiki/acousticmodelformat
        # for details
        missing_hmm_files = []
        for fname in ('mdef', 'feat.params', 'means', 'noisedict',
                      'transition_matrices', 'variances'):
            if not os.path.exists(os.path.join(hmm_dir, fname)):
                missing_hmm_files.append(fname)
        mixweights = os.path.exists(os.path.join(hmm_dir, 'mixture_weights'))
        sendump = os.path.exists(os.path.join(hmm_dir, 'sendump'))
        if not mixweights and not sendump:
            # We only need mixture_weights OR sendump
            missing_hmm_files.append('mixture_weights or sendump')
        if missing_hmm_files:
            self._logger.warning("hmm_dir '%s' is missing files: %s. Please " +
                                 "make sure that you have set the correct " +
                                 "hmm_dir in your profile.",
                                 hmm_dir, ', '.join(missing_hmm_files))

        #self._decoder = ps.Decoder(hmm=hmm_dir, logfn=self._logfile,
        #                           **vocabulary.decoder_kwargs)
        psConfig = ps.Decoder.default_config()
        psConfig.set_string('-hmm', hmm_dir)

        psConfig.set_string('-lm', vocabulary.decoder_kwargs['lm'])
        psConfig.set_string('-dict', vocabulary.decoder_kwargs['dict'])
        self._decoder = ps.Decoder(psConfig)

And the transcribe function in the same class with this:

    def transcribe(self, fp):
        """
        Performs STT, transcribing an audio file and returning the result.

        Arguments:
            fp -- a file object containing audio data
        """

        fp.seek(44)

        # FIXME: Can't use the Decoder.decode_raw() here, because
        # pocketsphinx segfaults with tempfile.SpooledTemporaryFile()
        data = fp.read()
        self._decoder.start_utt()
        self._decoder.process_raw(data, False, True)
        self._decoder.end_utt()

        #result = self._decoder.get_hyp()
        result = self._decoder.hyp()
        with open(self._logfile, 'r+') as f:
        #    for line in f:
        #        self._logger.debug(line.strip())
            f.truncate()

        #transcribed = [result[0]]
        # hyp() returns None or a Hypothesis object; its text is in .hypstr
        if result is None:
            transcribed = []
        else:
            transcribed = [result.hypstr]
        self._logger.info('Transcribed: %r', transcribed)
        return transcribed

and see what that gets you. I can't guarantee it will work error free first time; I've modified the code I'm working with to return multiple hypotheses from PocketSphinx and to work with Mycroft's adapt.intent parser, among other things, so I can't really test it easily anymore. I like the fact that the adapt parser can assign probabilities to each hypothesis from PocketSphinx and that I can use those probabilities in the brain code to pick the best hypothesis to select a module (rather than Jasper's "use the first module that matches" approach).
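The two decoder APIs also hand back results in different shapes, which is what the transcribe changes deal with. Below is a hedged sketch of a helper that accepts either shape. It assumes the older get_hyp() returns a tuple whose first item is the text (as the original result[0] suggests) and the newer hyp() returns None or an object with a hypstr attribute (as in the diff above). hypothesis_text and to_transcribed are hypothetical names, and FakeHypothesis is only a stand-in for the real object.

```python
def hypothesis_text(result):
    """Return the recognized text from either style of result, or None."""
    if result is None:
        return None
    # Newer bindings: Hypothesis-like object exposing .hypstr
    text = getattr(result, "hypstr", None)
    if text is not None:
        return text
    # Older bindings: tuple-like result with the text first
    return result[0]


def to_transcribed(result):
    """Wrap the text in a list, matching what transcribe() returns."""
    text = hypothesis_text(result)
    return [] if text is None else [text]


class FakeHypothesis(object):
    """Stand-in for the object returned by hyp() (assumption)."""
    hypstr = "what time is it"


print(to_transcribed(None))              # []
print(to_transcribed(FakeHypothesis()))  # ['what time is it']
print(to_transcribed(("hello", None)))   # ['hello']
```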

azban commented 4 years ago

This looks to be an issue caused by using an updated pocketsphinx version (from pip), rather than the outdated version that the Jasper docs and code rely on. Is there any plan to make the changes required to support the new version?
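One possible shape for supporting both versions is to try the older keyword-argument constructor and fall back to a config object when it raises TypeError. This is only a sketch built from the calls quoted earlier in this thread (ps.Decoder(hmm=...), Decoder.default_config(), set_string). make_decoder is a hypothetical helper, ps is the imported pocketsphinx module passed in by the caller, and the code has not been tested against either release.

```python
def make_decoder(ps, hmm_dir, lm_path, dict_path):
    """Build a decoder with whichever constructor style `ps` accepts."""
    try:
        # Older bindings: options passed as keyword arguments.
        return ps.Decoder(hmm=hmm_dir, lm=lm_path, dict=dict_path)
    except TypeError:
        # Newer (5prealpha-style) bindings: options go through a config object.
        config = ps.Decoder.default_config()
        config.set_string('-hmm', hmm_dir)
        config.set_string('-lm', lm_path)
        config.set_string('-dict', dict_path)
        return ps.Decoder(config)
```

Passing the module in as an argument keeps the helper free of import-time side effects and makes it easy to exercise with a stub in place of pocketsphinx.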

appeacock commented 4 years ago

The work on Jasper, specifically making it work as-is and refactoring to Python 3, is being conducted at https://github.com/aplawson/jasper-client, including a tutorial on how to build it and/or deploy it with a custom Raspbian ISO image. //adam