prosodylab / Prosodylab-Aligner

Python interface for forced audio alignment using HTK and SoX
http://prosodylab.org/tools/aligner/
MIT License

Allow '@' as a possible phone (it should work with HTK, or so we think) #37

Closed mchlwgnr closed 9 years ago

kylebgorman commented 9 years ago

Will test, then can do.

If you have some time, perhaps you could consult the HTK book (freely available as a PDF) to see what the spec says about valid spellings for phones. I think any non-numeric, non-control, non-whitespace character is valid in initial position, and digits are also valid in final (or non-initial, I'm not sure) position, but I've just been enabling what I know works.

K


ghost commented 9 years ago

This is the error (rather a lot of it) that we got when '@' was included in the phone set. It looks like YAML has issues with '@' appearing at the start of a token.

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "../Prosodylab-Aligner/aligner/__main__.py", line 110, in <module>
    archive = Archive(args.read)
  File "../Prosodylab-Aligner/aligner/archive.py", line 56, in __init__
    raise ValueError("'{}' is a bomb.".format(source))
ValueError: fr-QuEu.zip' is a bomb.
Exception ignored in: ...

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "../Prosodylab-Aligner/aligner/__main__.py", line 91, in <module>
    opts = resolve_opts(args)
  File "../Prosodylab-Aligner/aligner/utilities.py", line 65, in resolve_opts
    opts = yaml.load(source)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/constructor.py", line 35, in get_single_data
    node = self.get_single_node()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/parser.py", line 486, in parse_flow_sequence_entry
    if self.check_token(KeyToken):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/scanner.py", line 257, in fetch_more_tokens
    self.get_mark())
yaml.scanner.ScannerError: while scanning for the next token
found character '@' that cannot start any token
  in "/../Prosodylab-Aligner/fra.yaml", line 10, column 16

kylebgorman commented 9 years ago

Does it work if you put it in quotes (single or double)? I think that should help.

Kyle
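For anyone hitting the same ScannerError, here is a minimal sketch of what the quoting suggestion amounts to. It assumes PyYAML is installed. The key name `phoneset` and the three-phone list are illustrative assumptions, not the contents of fra.yaml, and `load_phones` is a hypothetical helper written only for this comparison.

```python
import yaml  # PyYAML, the library that raised the ScannerError in the traceback

# Hypothetical one-line configs; "phoneset" is an assumed key name.
UNQUOTED = "phoneset: [a, e, @]"
QUOTED = "phoneset: [a, e, '@']"


def load_phones(text):
    """Return the parsed phone list, or the name of the YAML error raised."""
    try:
        return yaml.safe_load(text)["phoneset"]
    except yaml.YAMLError as err:
        # ScannerError is a subclass of yaml.YAMLError.
        return type(err).__name__


if __name__ == "__main__":
    print(load_phones(UNQUOTED))  # the bare '@' cannot start a token
    print(load_phones(QUOTED))    # the quoted '@' loads as a plain string
```

Single or double quotes should behave the same way for a symbol like '@', since neither form needs escaping here.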


mchlwgnr commented 9 years ago

We're using @ with the old aligner for French, and it's working.


kylebgorman commented 9 years ago

In YAML, you need to put any non-alphabetic characters in quotes if you want them to be treated as strings (and you do, in this case).

YAML is a bit like Python in this regard, but unlike Python, it can also infer that alphabetic sequences are strings. To be safe, you could just put every phoneme in quotes.

Email me directly if you need a copy of my config file for French. It works with all the fancy characters from Lexique.

Kyle
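The "put every phoneme in quotes" advice can be sketched as a small helper that writes the phone list with every symbol quoted. This is only an illustration: `yaml_phone_line` and the default key name `phoneset` are assumptions made for the example, not part of the aligner. It leans on `json.dumps` because a JSON double-quoted string is, as far as I know, also a valid YAML double-quoted scalar.

```python
import json


def yaml_phone_line(phones, key="phoneset"):
    """Format phones as one YAML flow sequence with every symbol quoted.

    `key` is an assumed config key name, used here only for illustration.
    """
    # json.dumps wraps each symbol in double quotes and escapes as needed;
    # ensure_ascii=False keeps non-ASCII phone symbols readable.
    quoted = ", ".join(json.dumps(p, ensure_ascii=False) for p in phones)
    return "{}: [{}]".format(key, quoted)


if __name__ == "__main__":
    print(yaml_phone_line(["a", "e", "@", "^", "2"]))
```

Quoting everything also covers digit-only symbols: an unquoted 2 in a YAML list loads as an integer rather than as the string "2".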


kylebgorman commented 9 years ago

FWIW, it doesn't look like @ is in the French dictionary, see:

https://github.com/prosodylab/prosodylab-alignermodels/blob/master/FrenchQuEu/fr-QuEu/

There is one character that may need to be quoted, though: "^"

mchlwgnr commented 9 years ago

Great, we'll try this. (The only reason to stick with @ is that it is how French is transcribed elsewhere; the current French model posted on alignermodels uses a different symbol just to make it work.)


kylebgorman commented 9 years ago

Yea, let’s be as close as possible to the Lexique source.

Before I commit this fix, would it be possible for someone to send me or point me to data I could use to test? Specifically, I’d like to be able to test a dictionary that has “@” and “^”, so a minimally processed Lexique would work.


mchlwgnr commented 9 years ago

Thanks, this worked. We updated the French models in alignermodels with the @ phone.
