Closed blackmad closed 1 year ago
Yes, it's a known limitation. To avoid repeating myself, I created a FAQ and answered it here
I have a few ideas that could improve accuracy on shorter strings, but nothing magical
Thanks for the link, and apologies that we hadn't found it already.
On Thu, Nov 10, 2022 at 12:15 AM Kevin Destrem @.***> wrote:
Closed #19 https://github.com/komodojp/tinyld/issues/19 as completed.
Hi! We've been using tinyld to identify the language of short search queries, and we've been a little surprised that strings which seem pretty clearly English to us rarely get a high accuracy score. Is it a known limitation that tinyld struggles with short text?
"search sprint 1" gives us
Merge Results [
  { lang: 'ga', accuracy: 0.08333333333333333 },
  { lang: 'et', accuracy: 0.044066666666666664 },
  { lang: 'ro', accuracy: 0.03285 },
  { lang: 'es', accuracy: 0.030449999999999994 },
  { lang: 'en', accuracy: 0.014425000000000002 }
]
With only=en, we get an accuracy of 0.117 for English on that string.
"new hire onboarding", only=en -> 0.058
"codebase modularization", only=en -> 0
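For anyone hitting the same thing with short queries, one possible workaround on the application side is to trust the top candidate only when its accuracy clears a threshold, and otherwise fall back to a default language. The sketch below is only an illustration: `pickLanguage`, the `Candidate` type, and the threshold values are hypothetical and not part of tinyld. It operates on a result list shaped like the one quoted above, with accuracies rounded.

```typescript
// Shape of one entry in the result list quoted above.
interface Candidate {
  lang: string;
  accuracy: number;
}

// Hypothetical helper (not part of tinyld): return the top-scoring language
// only if its accuracy reaches minAccuracy, otherwise return the fallback.
function pickLanguage(
  candidates: Candidate[],
  minAccuracy: number,
  fallback: string
): string {
  let best: Candidate | undefined;
  for (const c of candidates) {
    if (best === undefined || c.accuracy > best.accuracy) {
      best = c;
    }
  }
  return best !== undefined && best.accuracy >= minAccuracy ? best.lang : fallback;
}

// Results reported above for "search sprint 1" (accuracies rounded).
const searchSprint: Candidate[] = [
  { lang: 'ga', accuracy: 0.0833 },
  { lang: 'et', accuracy: 0.0441 },
  { lang: 'ro', accuracy: 0.0329 },
  { lang: 'es', accuracy: 0.0304 },
  { lang: 'en', accuracy: 0.0144 },
];

// The 0.5 threshold is an arbitrary assumption; tune it on real queries.
console.log(pickLanguage(searchSprint, 0.5, 'en')); // en (top score is below 0.5)
console.log(pickLanguage(searchSprint, 0.05, 'en')); // ga (top score clears 0.05)
```

This does not make detection on short strings any more accurate; it only keeps a weak guess such as 'ga' from being treated as a confident answer.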