Open GoogleCodeExporter opened 9 years ago
Hi !!!
I'm facing the same issue, and the use of upper-case characters for the full
text seems to be the cause of the problem.
A simple workaround is to convert the full text to lower case.
We tested it on about 150 cases that returned wrong results for upper-case text,
and it worked for all of them.
From my understanding, the corpus used to create the profiles only contains
upper-case characters at the start of a sentence. That means the profiles
rarely define upper-case n-grams of more than one character (and when they
do, the weight is very low).
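
To illustrate the point about n-grams, here is a simplified sketch that counts character n-grams in a tiny sentence-case corpus. It is not the library's actual profile generator (the corpus, the `char_ngrams` helper and the lack of any normalisation are all assumptions of this example); it only shows why all-upper-case n-grams longer than one character never show up in such a corpus.

```python
from collections import Counter


def char_ngrams(text: str, max_n: int = 3) -> Counter:
    """Count every character n-gram of length 1..max_n in the text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Toy sentence-case corpus: capitals only appear at sentence starts.
corpus = "The cat sat on the mat. The dog ran to the park."
profile = char_ngrams(corpus)

# N-grams longer than one character made only of upper-case letters.
multi_upper = [g for g in profile
               if len(g) > 1 and all(c.isupper() for c in g)]
```

Here `multi_upper` is empty and `profile["THE"]` is 0, while `profile["The"]` and `profile["the"]` are both 2, so a fully upper-case input has almost nothing to match against.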
The profiles could be regenerated from three variants of the corpus (the raw
content, the full content converted to lower case and the full content
converted to upper case) so that they cover all these cases.
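
A rough sketch of that first idea, assuming a profile is just a table of n-gram counts (the real profile format and generator may differ, and `build_profile_all_cases` is a made-up name):

```python
from collections import Counter


def char_ngrams(text: str, max_n: int = 3) -> Counter:
    """Count every character n-gram of length 1..max_n in the text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def build_profile_all_cases(corpus: str, max_n: int = 3) -> Counter:
    """Merge n-gram counts from the raw, lower-case and upper-case corpus."""
    profile = Counter()
    for variant in (corpus, corpus.lower(), corpus.upper()):
        profile.update(char_ngrams(variant, max_n))
    return profile


profile = build_profile_all_cases("The cat.")
```

After this, `"The"`, `"the"` and `"THE"` are all present in `profile`, at the cost of a profile that is roughly three times larger.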
Another idea would be to make the detection case-insensitive by regenerating
the profiles from the full content converted to lower case and automatically
converting the submitted text to lower case.
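
A sketch of this second, case-insensitive idea, again under the assumption that a profile can be reduced to a set of n-grams. The `coverage` score is invented for the illustration and is not how the library actually scores text.

```python
def char_ngram_set(text: str, max_n: int = 3) -> set:
    """Return the set of character n-grams of length 1..max_n."""
    return {text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)}


def build_lowercase_profile(corpus: str, max_n: int = 3) -> set:
    """Build the profile from the lower-cased corpus only."""
    return char_ngram_set(corpus.lower(), max_n)


def coverage(text: str, profile: set, max_n: int = 3) -> float:
    """Fraction of the lower-cased text's n-grams found in the profile."""
    grams = char_ngram_set(text.lower(), max_n)
    if not grams:
        return 0.0
    return len(grams & profile) / len(grams)


profile = build_lowercase_profile("The cat sat on the mat.")
```

Because both sides are lower-cased, `coverage("THE CAT", profile)` and `coverage("the cat", profile)` give the same value, and the profile stays the same size as today.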
Regards
Jerome
Original comment by gro...@gmail.com
on 6 Feb 2014 at 3:25
This issue should be renamed, by the way.
Original comment by gro...@gmail.com
on 6 Feb 2014 at 3:30
Original issue reported on code.google.com by
thk.k...@gmail.com
on 15 Jun 2012 at 10:55