ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License

Series IDs get somehow lowercased during configuration loading #522

Open jlibovicky opened 7 years ago

jlibovicky commented 7 years ago

I found no place in our code that could cause this. My guess is that the INI parser pre-processes the field names.

tomasmcz commented 7 years ago

Do you mean that s_UP=... is interpreted as s_up=...?

jindrahelcl commented 7 years ago

From Python's ConfigParser documentation:

As we can see above, the API is pretty straightforward. The only bit of magic involves the DEFAULT section which provides default values for all other sections [1]. Note also that keys in sections are case-insensitive and stored in lowercase [1].
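A minimal sketch of the behavior described in that passage, assuming the standard-library `configparser` with its default settings. The section name `dataset` and the key `s_UP` are made up for illustration and are not taken from an actual Neural Monkey config:

```python
import configparser

# Hypothetical INI fragment; the section and key names are illustrative only.
INI_TEXT = """
[dataset]
s_UP = path/to/file
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# With default settings, option names are lowercased when they are stored.
keys = list(parser["dataset"].keys())
print(keys)  # ['s_up']
```

So a series written as `s_UP` would reach the rest of the code as `s_up`, which matches what the issue reports.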

jindrahelcl commented 7 years ago

https://docs.python.org/3/library/configparser.html#quick-start

jindrahelcl commented 7 years ago

I suggest documenting this and leaving it as it is.

jindrahelcl commented 7 years ago

If we decide we want to allow case-sensitive keys, there is an option: https://docs.python.org/3/library/configparser.html#configparser.optionxform
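For reference, a short sketch of how that option works on a plain `configparser.ConfigParser`. How it would be wired into Neural Monkey's own config loading is not shown here; the section and key names are again hypothetical:

```python
import configparser

# Hypothetical INI fragment; the section and key names are illustrative only.
INI_TEXT = """
[dataset]
s_UP = path/to/file
"""

parser = configparser.ConfigParser()
# optionxform is applied to every option name; replacing the default
# (which lowercases) with str keeps names exactly as written.
# It has to be set before the text is read.
parser.optionxform = str
parser.read_string(INI_TEXT)

case_keys = list(parser["dataset"].keys())
print(case_keys)  # ['s_UP']
```

Note that this changes key handling for the whole parser, so every other option name would become case-sensitive as well.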

tomasmcz commented 7 years ago

Or we could abandon (or at least deprecate) the **kwargs hack in load_dataset_from_files and use something like series_in=[("UpperCase", "path/to/file"), ...] and series_out=[("UpperCaseOut", "path/to/file"), ...].

We could also extend the INI syntax to allow line breaks inside brackets, so this can be written in an elegant way.
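A rough sketch of the idea, assuming series are passed as explicit lists of (name, path) pairs. The helper `series_to_dict` is hypothetical and is not part of the existing `load_dataset_from_files` signature:

```python
from typing import Dict, List, Tuple

# Hypothetical type alias: a list of (series name, file path) pairs.
SeriesSpec = List[Tuple[str, str]]


def series_to_dict(series: SeriesSpec) -> Dict[str, str]:
    """Map series names to file paths, keeping each name's original case."""
    result: Dict[str, str] = {}
    for name, path in series:
        if name in result:
            raise ValueError(f"Duplicate series name: {name!r}")
        result[name] = path
    return result


series_in = [("UpperCase", "path/to/file")]
series_out = [("UpperCaseOut", "path/to/file")]

inputs = series_to_dict(series_in)
outputs = series_to_dict(series_out)
print(list(inputs), list(outputs))  # ['UpperCase'] ['UpperCaseOut']
```

Because the series names are now values rather than INI keys, the parser's key lowercasing would no longer touch them.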

jlibovicky commented 7 years ago

If you are really willing to extend the syntax in this way, I would prefer getting rid of the **kwargs like this; it would also come in handy in other situations, such as listing postprocessors. Otherwise, I would stick to the current state, since in-line lists would only make the config clumsy.

jindrahelcl commented 6 years ago

Solution to this issue: fix the error messages so that they list the defined series IDs:

Error: unknown series ID: "SoUrCe". Possible IDs are: ["source", "target"]
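One way such a check could look, as a sketch only. The function `get_series` and the plain dict of series are assumptions made for illustration, not the actual dataset code:

```python
import json
from typing import Dict


def get_series(series: Dict[str, str], name: str) -> str:
    """Return the series stored under `name`, or fail listing the known IDs."""
    if name not in series:
        # json.dumps renders the list with double quotes, as in the message above.
        known = json.dumps(sorted(series))
        raise ValueError(
            f'unknown series ID: "{name}". Possible IDs are: {known}')
    return series[name]


# Hypothetical dataset whose keys were lowercased during config loading.
dataset = {"source": "path/to/source", "target": "path/to/target"}

try:
    get_series(dataset, "SoUrCe")
except ValueError as exc:
    message = f"Error: {exc}"
    print(message)
```

With the defined IDs in the message, a user who wrote `SoUrCe` can see right away that the stored key is `source`.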