prosodylab / Prosodylab-Aligner

Python interface for forced audio alignment using HTK and SoX
http://prosodylab.org/tools/aligner/
MIT License
331 stars 77 forks source link

Implemented multiple-dictionary support #78

Closed iskunk closed 4 years ago

iskunk commented 4 years ago

First draft of multi-dictionary support. Includes logic to filter out blank lines and # comments (in addition to ; comments, which were already implemented).

Usage example:

python3 -m aligner -d eng.dict -d my.dict -a /tmp/files/
kylebgorman commented 4 years ago

On Fri, Jan 17, 2020 at 7:45 PM Daniel Richard G. notifications@github.com wrote:

@iskunk commented on this pull request.

In aligner/prondict.py https://github.com/prosodylab/Prosodylab-Aligner/pull/78#discussion_r368190457 :

@@ -42,25 +42,31 @@ class PronDict(object): @staticmethod def pronify(source): for (i, line) in enumerate(source, 1):

  • if line.startswith(";"):
  • line = line.strip()
  • if not line or line.startswith(";") or line.startswith("#"):

The # comment marker was part of the original proposal :-)

I can split this off. Given that HDMan doesn't appear to support ; comments, shall I also remove that in the new PR?

Please do.

You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/prosodylab/Prosodylab-Aligner/pull/78?email_source=notifications&email_token=AABG4ONYZDIHM3GV7ODQA5DQ6JGKHA5CNFSM4KINH7HKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCSHOJSY#discussion_r368190457, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABG4ONLHO36MIUHNPIR7G3Q6JGKHANCNFSM4KINH7HA .

kylebgorman commented 4 years ago

On Fri, Jan 17, 2020 at 7:37 PM Daniel Richard G. notifications@github.com wrote:

@iskunk commented on this pull request.

In aligner/main.py https://github.com/prosodylab/Prosodylab-Aligner/pull/78#discussion_r368189372 :

@@ -51,8 +51,8 @@ description="Prosodylab-Aligner") argparser.add_argument("-c", "--configuration", help="config file") -argparser.add_argument("-d", "--dictionary", default=DICTIONARY,

  • help="dictionary file (default: {})".format(DICTIONARY)) +argparser.add_argument("-d", "--dictionary", metavar="DICT", dest="dictionaries", action="append",

The reason why I avoided this approach is because the -d DICT1 DICT2 command-line syntax is quite unusual. This is meant to be "the -d option is taking two arguments, DICT1 and DICT2," but the more common interpretation is "the -d option is taking a single DICT1 argument, and DICT2 is being passed as a non-option argument." I don't know of any common command-line program that uses the former interpretation

You’re not wrong that it’s not very common, I am just vaguely horrified how much hackery is necessary to force this to become a list.

I’m okay with your proposed flag design after discussion (thanks for that).

I do think that the YAML file format ought to revert to passing a list though, probably?

(If you insist on this, I can do it, but my desire here is to have the

command-line interface conform to established conventions. I've already been working on a later PR that reorganizes the options to be less confusing.)

The hackish bit is actually due to a different issue, specifically this https://bugs.python.org/issue16399 Python foible. (Without that, there would be no way to get eng.dict out of the list of dictionaries being passed in.)

Not sure what you mean by "global state"; the metavar and dest args are just overriding the defaults to something more sensible...

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/prosodylab/Prosodylab-Aligner/pull/78?email_source=notifications&email_token=AABG4OODQU3YB4BZJIHXB23Q6JFM5A5CNFSM4KINH7HKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCSHN7FQ#discussion_r368189372, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABG4OMC6XLF6OYMVUQLDTTQ6JFM5ANCNFSM4KINH7HA .

iskunk commented 4 years ago

Updated the PR; let me know how this looks.

You’re not wrong that it’s not very common, I am just vaguely horrified how much hackery is necessary to force this to become a list.

Oh, it's already a list if an argument is specified, and I could even do default=[] to make it an empty list if not (otherwise it's just None). The hack is needed in any event to get around the lack of proper default-override behavior.

I do think that the YAML file format ought to revert to passing a list though, probably?

Where? From looking at resolve_opts(), it doesn't seem like a YAML file is even allowed to specify a dictionary file (it gets overridden immediately by args.dictionary).

kylebgorman commented 4 years ago

I do think that the YAML file format ought to revert to passing a list though, probably?

Where? From looking at resolve_opts(), it doesn't seem like a YAML file is even allowed to specify a dictionary file (it gets overridden immediately by args.dictionary).

Oh boy, that's a bug (or maybe feature request) for later, no need to worry about then.