sebischair / SimpleNLG-DE

German version of SimpleNLG 4
https://wwwmatthes.in.tum.de/

Problem getting the baseform right for inflected adjectives and verbs #5

Open GeorgeS2019 opened 8 months ago

GeorgeS2019 commented 8 months ago

NOTE: this code is ported to C#.

Question 1: At which step of creating an inflected WordElement could this go wrong, so that the base form is set incorrectly?

https://github.com/sebischair/SimpleNLG-DE/blob/5c831cb9722406c749bc00bdd867e4d694e4bb4a/src/test/java/simplenlgde/morphology/BasewordTest.java#L51

        string[] baseWords = { "gut", "gut", "gut", "gut" };
        string[] inflectedWords = { "gute", "gutes", "gutem", "guten" };

        for (int i = 0; i < baseWords.Length; i++)
        {
            // Build one phrase from the base form and one from the inflected form.
            AdjPhraseSpec adj1 = nlgFactory.createAdjectivePhrase(baseWords[i]);
            AdjPhraseSpec adj2 = nlgFactory.createAdjectivePhrase(inflectedWords[i]);

            var baseformAdj1 = ((WordElement)adj1.getAdjective()).getBaseForm();
            var baseformAdj2 = ((WordElement)adj2.getAdjective()).getBaseForm();

            // Both should resolve to the same base form ("gut").
            Assert.Equal(baseformAdj1, baseformAdj2);
        }

Result: baseformAdj2 remains "gute" instead of "gut".

https://github.com/sebischair/SimpleNLG-DE/blob/5c831cb9722406c749bc00bdd867e4d694e4bb4a/src/test/java/simplenlgde/morphology/BasewordTest.java#L64

        string[] baseword = { "sein", "sein", "sein", "sein", "gehen", "gehen" };
        string[] inflected = { "bin", "bist", "ist", "sind", "ging", "gingen" };

        for (int i = 0; i < baseword.Length; i++)
        {
            // Build one phrase from the infinitive and one from the inflected form.
            VPPhraseSpec vp1 = nlgFactory.createVerbPhrase(baseword[i]);
            VPPhraseSpec vp2 = nlgFactory.createVerbPhrase(inflected[i]);

            var baseformvp1 = ((WordElement)vp1.getVerb()).getBaseForm();
            var baseformvp2 = ((WordElement)vp2.getVerb()).getBaseForm();

            // Both should resolve to the same base form (e.g. "sein").
            Assert.Equal(baseformvp1, baseformvp2);
        }

Result: baseformvp2 remains "bin" instead of "sein".

DaBr01 commented 8 months ago

Is it possible that you are not using the latest version of SimpleNLG-DE? Previous versions had an issue with the indexing of variants that could lead to this behaviour; it has been fixed in the latest version (#2).
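
For readers following the port, the idea of indexing variants can be illustrated with a minimal Java sketch. It is not SimpleNLG-DE's actual lexicon code: the class `VariantIndex` and the methods `addEntry` and `baseFormOf` are hypothetical names chosen for illustration, and the real lexicon loads its entries from MucLex.xml rather than from hard-coded lists.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not SimpleNLG-DE's actual lexicon classes.
final class VariantIndex {
    // Maps every known surface form (base form and inflected variants) to its base form.
    private final Map<String, String> baseByVariant = new HashMap<>();

    // Registers a base form together with its inflected variants.
    void addEntry(String baseForm, List<String> variants) {
        baseByVariant.put(key(baseForm), baseForm);
        for (String variant : variants) {
            // putIfAbsent keeps the earlier registration if two entries share a variant.
            baseByVariant.putIfAbsent(key(variant), baseForm);
        }
    }

    // Returns the indexed base form, or the input unchanged when the form is unknown.
    String baseFormOf(String surfaceForm) {
        return baseByVariant.getOrDefault(key(surfaceForm), surfaceForm);
    }

    // Lookups are case-insensitive in this sketch (an assumption, not library behaviour).
    private static String key(String form) {
        return form.toLowerCase(Locale.GERMAN);
    }
}
```

In this sketch, a form that was never indexed falls through and is returned unchanged. A port that skips the variant-indexing step could plausibly show the same symptom as reported above ("gute" staying "gute").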

GeorgeS2019 commented 8 months ago

@DaBr01

I use SimpleNLG-DE v1.1.1 from Maven. The tests ported to C# pass without problems against the IKVM build of SimpleNLG-DE, but the performance of loading the 42 MB MucLex.xml is a challenge with the IKVM approach.

I am porting the code from the current master branch to C#.

I am still learning SimpleNLG-DE, and I have not yet fully tracked what was changed to create SimpleNLG-DE from the parent code, which is tailored for English.

DaBr01 commented 8 months ago

But if the tests pass, then the base forms are the same. I am not sure I understand what the problem is.

GeorgeS2019 commented 8 months ago

The first approach runs the SimpleNLG-DE 1.1.1 jar from Maven under IKVM, without porting any Java to C#.

Performance is the issue with that approach.

The second approach ports the existing Java code in the master branch to C#. It promises far better performance when loading MucLex.xml than the first approach.

However, I need to trace how the SimpleNLG-DE Java code creates the inflected WordElement (with the right base form) used in the tests cited above.

So far I have failed to port that part: the C# port from the second approach does not return the right base form for the inflected words used in the tests above.

DaBr01 commented 8 months ago

If you follow the issue I linked above, you will find the exact commit where this was introduced / fixed in SimpleNLG-DE (commit d77058a).

GeorgeS2019 commented 8 months ago

@DaBr01 Thanks, this is a good starting point for learning SimpleNLG-DE.

GeorgeS2019 commented 8 months ago

@DaBr01 Good morning, the above tests pass now.

Question 1: Which part of the code deals with the capitalization of nouns?

    [Fact]
    public void CreateAMoreComplexSentence1()
    {
        SPhraseSpec sentence = nlgFactory.createClause();

        NPPhraseSpec subject = nlgFactory.createNounPhrase("der hund");
        VPPhraseSpec verb = nlgFactory.createVerbPhrase("jagen");
        NPPhraseSpec object1 = nlgFactory.createNounPhrase("george");

        sentence.setSubject(subject);
        sentence.setVerb(verb);
        sentence.setObject(object1);

        string output = realiser.realiseSentence(sentence);

        Assert.Equal("Der Hund jagt George.", output);

    }

I managed to get "jagt" from "jagen". However, I could not get the capitalization right: "george" should become "George" and "hund" should become "Hund".

I appreciate your help.

DaBr01 commented 8 months ago

Uff, I really don't know that by heart :) A search in the repo might be helpful: https://github.com/search?q=repo%3Asebischair%2FSimpleNLG-DE+capital&type=code

Looks like there is a function capitaliseFirstLetter in the OrthographyProcessor.
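
As a rough illustration of what such a helper might do, here is a minimal Java sketch. It is an assumption, not the library's implementation: the class `CapitalisationSketch` and the method `capitaliseFirst` are hypothetical names, and the actual `capitaliseFirstLetter` in SimpleNLG-DE may have a different signature and behaviour.

```java
import java.util.Locale;

// Hypothetical helper; SimpleNLG-DE's own capitaliseFirstLetter may behave differently.
final class CapitalisationSketch {
    // Upper-cases the first code point of the text and leaves the rest untouched.
    static String capitaliseFirst(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // offsetByCodePoints keeps a leading surrogate pair intact.
        int end = text.offsetByCodePoints(0, 1);
        return text.substring(0, end).toUpperCase(Locale.GERMAN) + text.substring(end);
    }
}
```

Applied only to the start of the realised sentence, a helper like this would turn "der hund" into "Der hund". To reach "Der Hund jagt George.", the port would also need to apply it to each noun, for example to every word whose lexical category is noun. Where that happens in SimpleNLG-DE is not confirmed in this thread.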