niftylettuce closed this 4 years ago
Perhaps using Wikimedia word dictionary would be a better dataset for accuracy.
Perhaps an error should be thrown if the string length doesn't reach a minimum number of characters, perhaps 200? Not sure if you've figured out what that magic number is.
To expand on my comment about the Wikimedia dataset: there are downloadable tarballs of entire dictionaries for every language, which also include topics, people, etc.
There have been many issues about this; see my responses on most of the closed ones. It’s also covered in the readme: https://github.com/wooorm/franc#whats-not-so-cool-about-franc
> Perhaps an error should be thrown if the string length doesn't reach a minimum number of characters
See the example in the readme: pass `minLength: 200`.
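If you would rather get an error than `und` for short input, a thin wrapper can do that. This is a minimal sketch, assuming a detector with franc's `(text, options)` call shape that returns `'und'` when it can't decide. `detectOrThrow` is a hypothetical helper, not part of franc, and the detector is passed in so the sketch runs without franc installed.

```javascript
// Hypothetical helper (not part of franc): wraps a franc-style detector
// and throws instead of returning 'und' (undetermined).
function detectOrThrow(detect, text, minLength = 200) {
  // Check the length up front so short input fails with a clear message.
  if (text.length < minLength) {
    throw new RangeError(
      `Input is ${text.length} characters; at least ${minLength} are required`
    );
  }
  // Forward minLength so the detector applies the same threshold.
  const code = detect(text, { minLength });
  if (code === 'und') {
    throw new Error('Language could not be determined');
  }
  return code;
}

// Usage with franc (assumes franc is installed):
// const franc = require('franc');
// detectOrThrow(franc, someLongText); // a language code, or throws
```

The length check duplicates what `minLength` already does inside the detector, but it lets the caller distinguish "too short" from "long enough but still undetermined".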
> Not sure if you've figured out what that magic number is.
There is none: this is based on a data model, so there is no perfect answer, only “likeliness”.
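Because the result is a likelihood ranking rather than a certainty, it can help to look past the top result. franc versions from this era also expose `franc.all`, which returns `[code, score]` pairs ordered from most to least likely (and `[['und', 1]]` for input that is too short). The sketch below assumes that shape and takes the list as input so it runs without franc. `pickConfident` and the `0.05` margin are hypothetical choices for illustration, not franc defaults.

```javascript
// Hypothetical helper: given franc.all-style output ([code, score] pairs,
// best first), return the top code only if it beats the runner-up by `margin`.
function pickConfident(scores, margin = 0.05) {
  // Empty input or an explicit 'und' means nothing could be determined.
  if (scores.length === 0 || scores[0][0] === 'und') return 'und';
  const [topCode, topScore] = scores[0];
  // With a single candidate there is nothing to compare against.
  if (scores.length === 1) return topCode;
  const runnerUpScore = scores[1][1];
  // Close scores mean the languages are hard to tell apart: report 'und'.
  return topScore - runnerUpScore >= margin ? topCode : 'und';
}

// Usage with franc (assumes franc is installed):
// pickConfident(franc.all(text));
```

Closely related languages (for example Norwegian Bokmål and Nynorsk) tend to score near each other, which is where a margin check like this would return `'und'` instead of guessing.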
> Perhaps using Wikimedia word dictionary would be a better dataset for accuracy.
There is no bigger copyright-free dataset than the Universal Declaration of Human Rights. Franc focuses on supporting many languages. Check out CLD-based projects if you care less about supporting many languages.
Awesome. It wasn't obvious at first that `und` meant undefined/not found. It might be useful to add this to the README and give it a more prominent example.
How about this?
var franc = require('franc')
franc('Alle menslike wesens word vry') // => 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') // => 'ben'
franc('Alle menneske er fødde til fridom') // => 'nno'
-franc('') // => 'und'
-franc('the') // => 'und'
-
-/* You can change what’s too short (default: 10): */
-franc('the', {minLength: 3}) // => 'sco'
+
+// You can change what’s too short (default: 10):
+franc('the') // => 'und' (`und` is a language code which stands for undetermined)
+franc('the', {minLength: 3}) // => 'sco'
👍 👍 👍
Sweet, fixed!
These detection results are all completely incorrect.