google / cld3

Apache License 2.0

Relation among CLD2 Score and CLD3 Accuracy #24

Open loretoparisi opened 5 years ago

loretoparisi commented 5 years ago

In my project I have to port the language detector from CLD2 to CLD3. CLD2 has the concepts of a Score and a Percentage for each language detected in the text. As far as I understand, the Score is computed internally from a probability that is not exposed. My assumption is that it is derived from the textBytes field (the size in bytes of the text), the accuracy, and the distribution of each label in the text, something like Acc = 1 - textBytes/Score. In CLD2, the function that normalizes these scores is called like this:

normalized_score3[2] = GetNormalizedScore(language3[2],
                                          ULScript_Common,
                                          bytecount3,
                                          doc_tote->Score(2));
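To make the quantities in that call concrete, here is a minimal, self-contained sketch. It assumes (this is my reading, not something confirmed in this thread) that the normalization scales the raw score to "score per 1024 bytes of text", and that the Percentage is simply the share of bytes attributed to a language. Both helper names are hypothetical and are not part of CLD2 or CLD3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a CLD2 or CLD3 function: scales a raw
// per-language score to "score per 1024 bytes of text". The formula is an
// assumption about what the normalization does, not a confirmed definition.
inline int NormalizedScorePer1024Bytes(int bytecount, int score) {
  assert(score >= 0);  // raw scores are assumed non-negative in this sketch
  if (bytecount <= 0) {
    return 0;  // no bytes for this language, so no meaningful score
  }
  // Widen before multiplying so large scores do not overflow int.
  return static_cast<int>(
      (static_cast<std::int64_t>(score) * 1024) / bytecount);
}

// Hypothetical helper: fraction of the document's bytes attributed to one
// language, in [0, 1]. A percentage would be this value times 100.
inline double ByteProportion(int language_bytes, int total_bytes) {
  if (total_bytes <= 0) {
    return 0.0;
  }
  return static_cast<double>(language_bytes) / total_bytes;
}
```

Under these assumptions the normalized score is a length-adjusted sum rather than a probability, so it would not map directly onto a value in [0, 1] without some additional calibration.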

That said, since I need to upgrade to CLD3, at some point I have to convert the CLD2 Score into a CLD3 accuracy value. Any hints on how to achieve that?

Here for reference: https://github.com/dachev/node-cld/issues/52

loretoparisi commented 5 years ago

[UPDATE] Here is the SO question: https://stackoverflow.com/questions/55186435/converting-language-detection-score-of-cld2-to-cld3-accuracy

loretoparisi commented 4 years ago

@jasonriesa any help on this? Thank you.