Closed qtdaniel closed 4 years ago
On Tue, 23 Jun 2020, qtdaniel wrote:
I would like to use the sentence break filters, i.e. lists of common abbreviations, that adjust the behaviour of the sentence tokeniser as documented here: http://userguide.icu-project.org/boundaryanalysis#TOC-Sentence-Break-Filter
I was expecting to be able to achieve this by simply changing my locale string from "en" to "en@ss=standard", but this causes the error

    'icu.BreakIterator' object has no attribute 'getRuleStatus'

when getRuleStatus is called on the resulting break iterator. I've tried this via icu.BreakIterator.createSentenceInstance and via icu.RuleBasedBreakIterator.createSentenceInstance, but both exhibit the same error.
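For context, the intended use of such an iterator can be sketched as below. This is only an illustration: `split_sentences` is a hypothetical helper, not part of PyICU, and it assumes the iterator follows the usual PyICU pattern of `setText()` followed by iteration over boundary offsets. Whether abbreviations are actually kept together depends on the locale data in the installed ICU.

```python
def split_sentences(text, break_iterator):
    """Split text at the boundaries reported by a sentence break iterator.

    Hypothetical helper for illustration. It only relies on setText() and
    on iterating the break iterator to obtain successive boundary offsets.
    Note: ICU reports offsets in UTF-16 code units, so plain slicing as done
    here assumes the text has no characters outside the Basic Multilingual
    Plane.
    """
    break_iterator.setText(text)
    sentences = []
    start = 0
    for end in break_iterator:
        sentences.append(text[start:end])
        start = end
    return sentences


# Usage, assuming PyICU is installed:
#
#   import icu
#   bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
#   print(split_sentences("Mr. Smith arrived. He sat down.", bi))
```

Passing the iterator in, rather than creating it inside the helper, makes it easy to compare the "en" and "en@ss=standard" locales on the same text.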
Thank you for the detailed reproduction steps. I was able to reproduce the issue as described. At first glance, it looks like:
    >>> bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en"))
    >>> bi
    <RuleBasedBreakIterator: 0x7ff74dc0f7b0>

class RuleBasedBreakIterator has method getRuleStatus()

    >>> bi = icu.RuleBasedBreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
    >>> bi
    <BreakIterator: 0x7ff74de0cb00>

class BreakIterator does not have this method
Why one call creates a RuleBasedBreakIterator and the other just a BreakIterator is a question for the ICU users list. I don't know the answer myself. getRuleStatus() is defined on both C++ classes but is missing from the PyICU BreakIterator wrapper. It looks like it appeared on BreakIterator only in ICU 52 (while it has been on RuleBasedBreakIterator since ICU 2.2). This explains the oversight. I have now added the missing wrapper in HEAD.
Thank you for the report!
Andi..
Here is a minimal reproduction:
    import icu

    bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en"))
    print(bi.getRuleStatus())

    bi = icu.RuleBasedBreakIterator.createSentenceInstance(icu.Locale("en"))
    print(bi.getRuleStatus())

    bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
    try:
        print(bi.getRuleStatus())
    except Exception as exception:
        print(
            "Failed to get rule status when using en@ss=standard locale with BreakIterator",
            exception
        )

    bi = icu.RuleBasedBreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
    try:
        print(bi.getRuleStatus())
    except Exception as exception:
        print(
            "Failed to get rule status when using en@ss=standard locale with"
            " RuleBasedBreakIterator",
            exception
        )
When run, that code emits
    0
    0
    Failed to get rule status when using en@ss=standard locale with BreakIterator 'icu.BreakIterator' object has no attribute 'getRuleStatus'
    Failed to get rule status when using en@ss=standard locale with RuleBasedBreakIterator 'icu.BreakIterator' object has no attribute 'getRuleStatus'
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/ovalhub/pyicu/issues/133
Thanks.
I've tried building pyicu from source so I can test this but the build failed. I'm afraid I won't be able to spend the time needed to resolve this right now since I won't be able to make use of the change until the package is updated in conda-forge anyway.
The change sounds like it should resolve the problem so I think this issue can be closed now.
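Until a PyICU build containing the added wrapper is available, calling code could guard the call instead of assuming the method exists. The following is a minimal sketch of that idea; `rule_status` is a hypothetical helper, not part of PyICU, and the choice of `None` as the fallback value is an assumption.

```python
def rule_status(break_iterator, default=None):
    """Return break_iterator.getRuleStatus() if the wrapper exposes it.

    Hypothetical workaround helper: on PyICU builds where the plain
    BreakIterator wrapper lacks getRuleStatus(), return `default`
    instead of raising AttributeError.
    """
    method = getattr(break_iterator, "getRuleStatus", None)
    if method is None:
        return default
    return method()


# Usage, assuming PyICU is installed:
#
#   import icu
#   bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
#   print(rule_status(bi))  # None on builds without the wrapper
```

This only hides the missing attribute; it does not recover the rule status on affected builds.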
I am using icu and pyicu via conda-forge: