versotym / rhymetagger

A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry

Possible faulty prediction of the `de` model #1

Closed by zouharvi 2 years ago

zouharvi commented 2 years ago

According to my vague understanding of rhyme, the following poem should have the rhyme scheme ABABC. However, the model does not detect it. Is this an error on my side (or in my installation), or was it simply mispredicted by the model? Are there any other models that could make this work? Or perhaps a setting that would increase rhyme sensitivity?

import rhymetagger

poem = """
Zwei Straßen gingen ab im gelben Wald,
Und leider konnte ich nicht beide reisen,
Da ich nur einer war; ich stand noch lang
Und sah noch nach, so weit es ging, der einen
Bis sie im Unterholz verschwand;
""".strip()

rt = rhymetagger.RhymeTagger()
rt.load_model(model="de")
print(rt.tag(poem.split("\n"), output_format=3))

Output:

====================================
Model loaded with following settings:
====================================
  frequency_min: 3
           lang: de
       max_iter: 20
          ngram: 3
   ngram_length: 3
   prob_ipa_min: 0.9
 prob_ngram_min: 0.9
     same_words: False
   stanza_limit: True
         stress: True
       syll_max: 2
    t_score_min: 3.078
   vowel_length: True
         window: 5
====================================
[None, None, None, None, None]

versotym commented 2 years ago

According to my vague understanding of German, I'd say these are "imperfect rhymes". The model was trained mainly on data from the 17th to the 19th century, where I expect rhyming to be far more constrained. Relaxing the prob_ipa_min and prob_ngram_min parameters may do the trick.
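One way to explore this is to lower both thresholds step by step and stop at the strictest value that produces any rhyme at all. The sketch below is only an illustration: `first_threshold_with_rhymes` and `fake_tag` are hypothetical names, the stand-in tagger returns made-up values rather than real model output, and the commented-out wiring assumes that `tag()` accepts `prob_ipa_min` and `prob_ngram_min` as keyword overrides, which should be checked against the installed version.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Tags = List[Optional[int]]


def first_threshold_with_rhymes(
    tag_fn: Callable[[float], Sequence[Optional[int]]],
    thresholds: Sequence[float] = (0.9, 0.7, 0.5, 0.3, 0.1, 0.01, 0.001),
) -> Optional[Tuple[float, Tags]]:
    """Try thresholds from strictest to loosest.

    Returns (threshold, tags) for the first threshold at which at least
    one line is assigned a rhyme, or None if no threshold finds one.
    """
    for t in sorted(thresholds, reverse=True):
        tags = list(tag_fn(t))
        if any(tag is not None for tag in tags):
            return t, tags
    return None


# Stand-in tagger for illustration only; the values are invented,
# not real output of the `de` model.
def fake_tag(threshold: float) -> Tags:
    return [1, 2, 1, 2, None] if threshold <= 0.01 else [None] * 5


result = first_threshold_with_rhymes(fake_tag)

# With rhymetagger this might be wired up as follows (ASSUMPTION: tag()
# accepts these keyword overrides):
# tag_fn = lambda t: rt.tag(lines, output_format=3,
#                           prob_ipa_min=t, prob_ngram_min=t)
```

A result that only appears at very low thresholds is a hint that the detected rhymes deserve a manual check.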

zouharvi commented 2 years ago

> Relaxing the prob_ipa_min and prob_ngram_min parameters may do the trick.

Thank you, that's exactly what I was looking for. I only got a (partially correct) result, ABABA, once I went as low as prob_ngram_min=0.001 and prob_ipa_min=0.001, which is very iffy. I'll try to see whether I can find some more data or some other method.
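For comparing runs, it can help to turn the per-line output into scheme letters such as ABABC. The helper below is a minimal sketch, assuming the `output_format=3` result is a list with one rhyme index (or `None`) per line, as in the output shown above. `scheme_letters` is a hypothetical name and not part of rhymetagger. Unrhymed lines get a fresh letter, matching the ABABC notation used in this thread.

```python
from string import ascii_uppercase
from typing import Dict, List, Optional, Sequence


def scheme_letters(tags: Sequence[Optional[int]]) -> str:
    """Convert per-line rhyme indices into a scheme string like 'ABABC'.

    Lines sharing an index share a letter; a line tagged None (no rhyme
    found) receives its own fresh letter. Supports at most 26 distinct
    letters in this simple form.
    """
    letter_for_index: Dict[int, str] = {}
    fresh = iter(ascii_uppercase)
    out: List[str] = []
    for tag in tags:
        if tag is None:
            out.append(next(fresh))
        else:
            if tag not in letter_for_index:
                letter_for_index[tag] = next(fresh)
            out.append(letter_for_index[tag])
    return "".join(out)


expected = scheme_letters([1, 2, 1, 2, None])   # the scheme hoped for
obtained = scheme_letters([1, 2, 1, 2, 1])      # the scheme reported above
```

Here `expected` is "ABABC" and `obtained` is "ABABA", so the two differ only in the last line.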