openhab / org.openhab.ui.habot

A chatbot for openHAB using machine-learning natural language processing from OpenNLP
Eclipse Public License 1.0
65 stars 31 forks source link

German translation #3

Closed stritti closed 6 years ago

stritti commented 6 years ago

I started initial German translations. It is still in progress, but wanted to be sure to be on the correct way :)

Some proposals:

ghys commented 6 years ago

Hi @stritti! That's great!

The format seems fine, the properties files (for answers) is simply a list of sentences separated by |. The program will choose one at random as an answer.

Just so we're clear, the text files in train are to be considered as training data for the machine learning (document categorizer & token name extractor), rather than simply translations, which means:

  1. they don't have to be sentence-by-sentence literal translations of what's found in the English files (or another language), maybe something different will work better in German - the goal is for the machine learning to do the job correctly when presented with some sentence;
  2. the more data the better ;) - just write down what comes to mind and don't be afraid to vary the entities inside <START:tag> <END> with what you commonly find in a smart home!

Also there needs to be a file for each intent or you'll get an exception - English is complete, French currently not. I have a test class (TrainerTest.java) to test the interpreter quickly, it doesn't have proper assertions, only dumps the output to the console, but it's useful to quickly test how well the training fares - sometimes it makes mistakes, it all depends on the training data.

Good idea about the fallback measures for invalid languages - FYI, it will use your browser settings only if you didn't a language in Paper UI (Regional Settings), the latter will be the priority. One more thing, you'll need to add the German locale here - it'll get better at some point :) https://github.com/ghys/habot/blob/e5961830b0ab4d538e3f3db0ba601be691116324/src/main/java/org/openhab/ui/habot/nlp/internal/OpenNLPInterpreter.java#L40-L41

Really appreciate you doing this! Many thanks 👍

ghys commented 6 years ago

Sorry I'm messing things up :( (your work isn't gone :))

stritti commented 6 years ago

No problem :)

ghys commented 6 years ago

I wanted to commit a fix to your branch and ended up merging your changes by accident :) Could you pull the latest changes and maybe try to open a new PR? (sorry again) Try to create a new branch, not using your master branch, maybe that's what went wrong ;) On the plus side, I made the German tests work :)

ghys commented 6 years ago

Doesn't work so bad 😁

habot-de

I encourage you to keep on optimizing the training data 😄

stritti commented 6 years ago

Cool. I will try to continue in the next days

stritti commented 6 years ago

I've updated trainings and tests. I think it could be merged now "officially" :)

ghys commented 6 years ago

Great! That's a very good start! I might add some more skills over time which would require more training data, I'll take the liberty to ping you then ;) Anyways, I think now it will continue to work without crashing; if there is training data missing for new skills, they simply won't be taken into account but the existing ones will continue to work.

I think it could be merged now "officially" :)

Nothing official yet :P Nonetheless, I would like to enforce the same contributing rules as openHAB (even though it's not mentioned anywhere yet) to make sure there's no problem if/when this thing makes it into the openHAB distro (I think @kaikreuzer would agree ;) - which means I'd like to get a sign-off line from you to include in the squashed commit; basically if you're OK with http://developercertificate.org/ just state somewhere:

Signed-off-by: Joe Smith <joe.smith@email.com> (github: github_handle)

Could be in a comment here, but the best would be in a final commit, I don't know if you've noticed but the indentation is kind of messed up in the test class, and I added some helper functions in the test class to be able to check interpretations with a one-liner like:

assertIsActivate("turn on the lights in the kitchen", "lights", "kitchen");
checkInterpretation(Skills.GET_HISTORY_HOURLY,
                 "show me a graph of the temperature in the living room for the last 3 hours", "temperature",
                 "living room");

So it would be nice to have all this tidied up.

Thanks ;)

ghys commented 6 years ago

@stritti sorry to keep you working 😁 I have now added a new skill which I think is an important one, this one is for setting values like thermostats, percentages/dimmers and colors. I'm hoping you could take care of the German support for it when you have the time, before I merge this PR :) Here's the English training data: https://github.com/ghys/habot/blob/d5ae7d7e3c6ab9e7278e9f3ec16a579affa6b5f4/src/main/resources/train/en/set-value.txt#L1-L23 (note the distinction between <START:value> and <START:color> entities)

And there are a bunch of new answers too: https://github.com/ghys/habot/blob/d5ae7d7e3c6ab9e7278e9f3ec16a579affa6b5f4/src/main/resources/answers.properties#L58-L84 Don't worry this is the only one I really wanted to add before merging, I won't add new ones for a while now... Thanks again.

stritti commented 6 years ago

I thought an 'help' intent could be a nice addition, too 😅 I will try to update the PR in the next days.

What do you think about splitting in Test-Class per language? I think it will be better maintainable for upcoming languages:

ghys commented 6 years ago

I thought an 'help' intent could be a nice addition, too

Sure, but a Help section is already sort of planned. The problem is, the more intents, the longer it will take to perform the initial training (it is an issue on raspberry pis and so on), and maybe the probability of mistakes will increase.

What do you think about splitting in Test-Class per language?

Yes I had this exact same idea but didn't get around to it yet ;)

stritti commented 6 years ago

Okay, I split the Test cases now.

Sorry, but I am not fan to extend tests by special methods. I think it is more readable and clear to check objects directly in the test method. I left the assertations in German version in "raw".

ghys commented 6 years ago

Great!

Sorry, but I am not fan to extend tests by special methods. I think it is more readable and clear to check objects directly in the test method. I left the assertations in German version in "raw".

No worries, your call; I don't mind :) It was a way of making them more concise, and try to make it easier to add more.

stritti commented 6 years ago

Please review. I think, it could be merged.

Signed-off-by: Stephan Strittmatter <stephan.strittmatter@gmail.com> (github: stritti)

ghys commented 6 years ago

You know what, it's looking real good! So once more, a big thank you!

(and if you happen to extend or optimize the training data after some real-life usage, please don't hesitate to open more PRs 😉 - I'll merge them without question)