unytics / bigfunctions

Supercharge BigQuery with BigFunctions
https://unytics.io/bigfunctions/
MIT License

[new]: `detect_lang(text)` #168

Open pocman opened 1 week ago

pocman commented 1 week ago

BigFunction Description as it would appear in the documentation

Detect the language of the text using langdetect. langdetect supports 55 languages out of the box (ISO 639-1 codes):

af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he, hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl, pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw

Examples of (arguments, expected output) as they would appear in the documentation

text=War doesn't show who's right, just who's left. --> en

text=Ein, zwei, drei, vier --> de
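For illustration only, here is a minimal Python sketch of what the body of such a function could look like with langdetect. It is not the BigFunctions implementation: the wrapper name `detect_lang`, the injectable `detector` parameter, and the choice to return `None` for empty or undetectable input are all assumptions made for this example. `detect`, `DetectorFactory.seed`, and `LangDetectException` are the langdetect calls it relies on.

```python
from typing import Callable, Optional


def _langdetect_detector(text: str) -> Optional[str]:
    """Detect a language code with langdetect (imported lazily)."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic by default; fixing the seed makes
    # repeated calls on the same text return the same answer.
    DetectorFactory.seed = 0
    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (e.g. only digits).
        return None


def detect_lang(
    text: Optional[str],
    detector: Callable[[str], Optional[str]] = _langdetect_detector,
) -> Optional[str]:
    """Hypothetical wrapper: return a language code, or None if unknown.

    Mapping NULL/blank input to None is an assumption of this sketch,
    not behavior specified in the issue.
    """
    if text is None or not text.strip():
        return None
    return detector(text)
```

With langdetect installed, `detect_lang("Ein, zwei, drei, vier")` is the call that corresponds to the second example above, for which the issue expects `de`. The `detector` parameter exists only so the wrapper can be exercised with a stub, or swapped for another library, without changing the calling code.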

unytics commented 1 week ago

Excellent idea @pocman

What do you think of using a JavaScript library such as franc for this function to get faster results? (JavaScript UDFs do not need Cloud Run resources to be deployed.)