katyamineeva closed this issue 4 years ago.
@katyamineeva thanks for the examples. This is something that I just need to explain better in the README (or in some real documentation, when I get to that stage). The issue is that `disambiguate()` (or `Document(input, disambiguate=True)`) is not guaranteed to disambiguate all readings of a token. You can easily see which tokens are still ambiguous by doing `print(doc)`, and you'll see that in the last `Document` there are still 3 possible readings for слова. In other words, our constraint grammar still has a long way to go.
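If you want to check for leftover ambiguity programmatically rather than by eye, the idea is simply "does any token still have more than one reading?". The sketch below is a toy model, not udar's API: it assumes a disambiguated document can be represented as a list of `(token, remaining_readings)` pairs, and the function name `still_ambiguous` is hypothetical. It also emits a warning, since silent leftover ambiguity is what caused the confusion here.

```python
from __future__ import annotations

import warnings


def still_ambiguous(tokens: list[tuple[str, list[str]]]) -> list[str]:
    """Return the tokens that kept more than one reading after disambiguation.

    `tokens` is a hypothetical stand-in for a disambiguated document:
    a list of (token text, readings that survived) pairs.
    """
    ambiguous = [tok for tok, readings in tokens if len(readings) > 1]
    if ambiguous:
        # Surface the leftover ambiguity instead of leaving it silent.
        warnings.warn(f"{len(ambiguous)} token(s) still ambiguous: {', '.join(ambiguous)}")
    return ambiguous
```

For example, `still_ambiguous([("Твои", ["r1"]), ("слова", ["r1", "r2", "r3"])])` returns `["слова"]` and warns once. The three readings for слова mirror the count mentioned above; the reading labels and the single reading for Твои are invented placeholders.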
As for adding stress to ambiguous tokens, the default is `doc.stressed(selection='safe')`, which abstains from adding stress to tokens that have ambiguous stress. You can use `doc.stressed(selection='rand')` or `doc.stressed(selection='all')` to make sure the in-vocabulary words are marked with stress, even if they are still ambiguous. For out-of-vocabulary words, you can add `guess=True`, e.g. `doc.stressed(selection='all', guess=True)`, to allow an "intelligent" algorithm to guess where to put stress. (The algorithm is not actually very good, but it's better than nothing...?)
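To make the difference between the three `selection` strategies concrete, here is a toy model of them. It is a hypothetical sketch, not udar's implementation: it assumes each reading that survives disambiguation can be reduced to the character index of its stressed vowel, and the names `mark_stress` and `stressed` are made up for illustration. Guessing for out-of-vocabulary words (`guess=True`) is not modelled; a word with no candidate positions is simply returned unstressed.

```python
from __future__ import annotations

import random

ACUTE = "\u0301"  # combining acute accent, used as the stress mark


def mark_stress(word: str, positions: set[int]) -> str:
    """Insert a combining acute accent after each character index in `positions`."""
    return "".join(ch + (ACUTE if i in positions else "") for i, ch in enumerate(word))


def stressed(word: str, candidate_positions: list[int], selection: str = "safe",
             rng: random.Random | None = None) -> str:
    """Toy model of 'safe', 'rand' and 'all' (hypothetical, not udar's code).

    candidate_positions: stressed-vowel index of each reading still left
    after disambiguation.
    """
    distinct = set(candidate_positions)
    if not distinct:
        # No known stress (e.g. an out-of-vocabulary word without guessing).
        return word
    if selection == "safe":
        # Abstain unless every remaining reading agrees on the stress position.
        return mark_stress(word, distinct) if len(distinct) == 1 else word
    if selection == "rand":
        # Pick one of the competing stress positions at random.
        rng = rng or random.Random()
        return mark_stress(word, {rng.choice(sorted(distinct))})
    if selection == "all":
        # Mark every stress position found among the remaining readings.
        return mark_stress(word, distinct)
    raise ValueError(f"unknown selection: {selection!r}")
```

With competing readings stressed on index 2 and index 4 of слова, `'safe'` leaves the word bare, `'rand'` marks one of the two, and `'all'` marks both, which is how a doubly stressed form like сло́ва́ can come out.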
I just barely pushed a fix for the `'all'` method to the `master` branch, so be sure to pull the latest if you want to use that.
```python
>>> import udar
>>>
>>> doc1 = udar.Document('Мне недостаточно просто твоего честного слова.')
>>> doc2 = udar.Document('Красивые слова!')
>>> doc3 = udar.Document('Твои слова ничего не значат.')
>>>
>>> samples = [doc1, doc2, doc3]
>>>
>>> for doc in samples:
...     doc.disambiguate()
...     print(doc.stressed(selection='all'))
...
Мне́ недоста́точно про́сто твоего́ че́стно́го сло́ва.
Краси́вые слова́!
Твои́ сло́ва́ ничего́ не зна́чат.
```
Note that in `doc3` you get `'сло́ва́'`, with both possible stress positions marked, because слова is still ambiguous there.
The fact that you had to ask is a result of my not having documented these features well, so I'm going to leave this issue open as a reminder to improve the documentation. In the meantime, you can see docstrings for very basic documentation of many functions, e.g. `help(udar.Token.stressed)`.
Thank you for the clarification!
Hi!
I think an example of ambiguity resolution might be helpful. For instance:
prints out
So, in the first and second sentences the ambiguity was resolved correctly, but ambiguity remains in the third one. It's also not clear that after calling the `disambiguate` method some words may remain unstressed (and no warning message is printed). At first, I tried your code with sentences where the `disambiguate` method doesn't change anything, and I thought that this was a mistake or that the code was incomplete. And thank you for your work!