Alex-Kopylov closed 1 year ago
Hi @Alex-Kopylov, thank you for your pull request. I thought that min-max normalization would be a reasonable choice, but it is certainly possible that there is a better normalization method that I have not tried yet.
Why have you closed your PR already? The failing unit tests should be easy to fix, as far as I can see in the CI pipeline. I'm going to reopen the PR now and check whether the softmax normalization is a better fit for the confidence values.
Thanks again for your contribution. I appreciate this a lot. :)
I closed it accidentally. Glad to hear that you're taking these changes into account. I'm going to experiment with other approaches and will let you know if I find anything interesting.
What do you think about passing the results to a softmax function instead of min-max normalization? I think it's a clearer approach because the outputs can be read as probabilities, so you can, for example, apply a threshold to filter out unidentified languages.
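To make the comparison concrete, here is a minimal sketch of both normalizations. It is not the code from this PR: the function names, the `raw_scores` values, and the `0.5` threshold are all hypothetical and chosen only for illustration.

```python
import math


def min_max_normalize(scores):
    """Rescale scores so the best candidate gets 1.0 and the worst gets 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; avoid division by zero.
        return {lang: 1.0 for lang in scores}
    return {lang: (value - lo) / (hi - lo) for lang, value in scores.items()}


def softmax_normalize(scores, ndigits=2):
    """Turn scores into values that sum to roughly 1 (before rounding, exactly 1)."""
    peak = max(scores.values())
    # Subtracting the maximum keeps math.exp from overflowing.
    exps = {lang: math.exp(value - peak) for lang, value in scores.items()}
    total = sum(exps.values())
    return {lang: round(e / total, ndigits) for lang, e in exps.items()}


def filter_by_threshold(confidences, threshold):
    """Keep only the languages whose confidence reaches the threshold."""
    return {lang: c for lang, c in confidences.items() if c >= threshold}


# Hypothetical raw scores (for example, summed log probabilities).
raw_scores = {"ENGLISH": -10.2, "GERMAN": -11.0, "FRENCH": -14.5}

min_max_conf = min_max_normalize(raw_scores)
softmax_conf = softmax_normalize(raw_scores)
likely = filter_by_threshold(softmax_conf, threshold=0.5)
```

Two things to keep in mind with this sketch: softmax is sensitive to the scale of the raw scores, so the same ranking can give very flat or very peaked outputs, and the rounded values are not guaranteed to sum to exactly 1.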
Are there any pitfalls that I'm missing? I've implemented this by slightly changing your code. I've also rounded the results.
It passes black and mypy, but not the tests. It throws an error like this:
INTERNALERROR> UnicodeEncodeError: 'charmap' codec can't encode characters in position 712-720: character maps to <undefined>
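The `'charmap'` codec in this message is what Python uses for legacy single-byte code pages such as `cp1252`, so one plausible cause (an assumption, since the thread does not say which platform the tests ran on) is that the test output contains non-Latin characters and was written to a Windows console or file using such a code page. The sketch below only reproduces the same kind of error with a hypothetical sample string; it is not taken from the test suite.

```python
# Hypothetical sample text containing characters outside cp1252.
text = "язык"

try:
    # cp1252 is a charmap-based codec and has no mapping for Cyrillic letters.
    text.encode("cp1252")
    message = ""
except UnicodeEncodeError as exc:
    message = str(exc)

# UTF-8 can represent the same text without errors.
encoded = text.encode("utf-8")
```

If that assumption holds, setting the environment variable `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` before running the tests may avoid the error.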