jlqibm opened 2 weeks ago
Hi, @jlqibm.
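Temporary workaround: a patched copy of fastText's `predict` that builds the probability array with `np.asarray` instead of `np.array(probs, copy=False)`, plus a `detect` replacement that installs it: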
```python
import numpy as np
from ftlangdetect.detect import get_or_load_model


def custom_predict(self, text, k=1, threshold=0.0, on_unicode_error="strict"):
    """
    Given a string, get a list of labels and a list of
    corresponding probabilities. k controls the number
    of returned labels. A choice of 5, will return the 5
    most probable labels. By default this returns only
    the most likely label and probability. threshold filters
    the returned labels by a threshold on probability. A
    choice of 0.5 will return labels with at least 0.5
    probability. k and threshold will be applied together to
    determine the returned labels.
    This function assumes to be given
    a single line of text. We split words on whitespace (space,
    newline, tab, vertical tab) and the control characters carriage
    return, formfeed and the null character.
    If the model is not supervised, this function will throw a ValueError.
    If given a list of strings, it will return a list of results as usually
    received for a single line of text.
    """

    def check(entry):
        # fastText predicts one line at a time; reject embedded newlines.
        if entry.find("\n") != -1:
            raise ValueError("predict processes one line at a time (remove '\\n')")
        entry += "\n"
        return entry

    if isinstance(text, list):
        text = [check(entry) for entry in text]
        all_labels, all_probs = self.f.multilinePredict(
            text, k, threshold, on_unicode_error
        )
        return all_labels, all_probs
    else:
        text = check(text)
        predictions = self.f.predict(text, k, threshold, on_unicode_error)
        if predictions:
            probs, labels = zip(*predictions)
        else:
            probs, labels = ([], ())
        # Key change from fastText's stock predict, which returned
        # np.array(probs, copy=False). NumPy 2 rejects that because building
        # an array from a Python sequence always requires a copy.
        return labels, np.asarray(probs)


def custom_detect(text: str, low_memory=False) -> dict[str, str | float]:
    model = get_or_load_model(low_memory)
    # Monkey-patch the model's class so model.predict uses custom_predict.
    model.__class__.predict = custom_predict
    labels, scores = model.predict(text)
    label = labels[0].replace("__label__", "")
    score = min(float(scores[0]), 1.0)
    return {
        "lang": label,
        "score": score,
    }
```
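`custom_predict` keeps fastText's one-line contract: its inner `check()` raises `ValueError` for any string containing `'\n'`. If the text you want to classify spans several lines, one option is to flatten it before calling `custom_detect`. The sketch below assumes that joining lines with single spaces is acceptable for language identification; `to_single_line` is a hypothetical helper, not part of ftlangdetect or fastText.

```python
def to_single_line(text: str) -> str:
    """Hypothetical helper, not part of ftlangdetect or fastText.

    str.split() with no arguments splits on any run of whitespace
    (spaces, newlines, tabs, carriage returns, form feeds, vertical tabs)
    and drops leading/trailing whitespace, so joining the pieces with
    single spaces removes every '\\n' that check() would reject.
    """
    return " ".join(text.split())


# Example multi-line input that custom_predict would reject as-is.
raw = "Bonjour tout le monde.\nComment allez-vous ?\r\n"
flat = to_single_line(raw)
print(flat)
```

You would then call `custom_detect(to_single_line(raw))`. Collapsing whitespace discards line structure, which is usually harmless for language identification, though that is an assumption worth checking on your own data.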
With NumPy 1.26.4 and Python 3.11, things work fine. With NumPy 2.1.3 and Python 3.11, I get a crash:

```
Successfully installed numpy-2.1.3
(hf) [jlquinn@cccxc520 fms-dgt-internal]$ python
Python 3.11.0 (main, Mar 1 2023, 18:26:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
```
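The traceback is cut off above, so the following is an assumption about the cause rather than a confirmed diagnosis. fastText's stock `predict` ends with `np.array(probs, copy=False)`, and NumPy 2.0 changed `copy=False` from "avoid a copy if possible" to "never copy, raise otherwise". This sketch reproduces that difference without loading a fastText model. The `predictions` list is hypothetical data shaped like the `(probability, label)` pairs that `custom_predict` unpacks with `zip(*predictions)`.

```python
import numpy as np

# Hypothetical (probability, label) pairs; real values come from fastText.
predictions = [(0.97, "__label__en"), (0.02, "__label__fr")]
probs, labels = zip(*predictions)  # probs is a tuple of Python floats

# Turning a Python tuple into an ndarray always needs a fresh buffer.
# NumPy 1.x treated copy=False as "avoid a copy if possible" and copied
# anyway; NumPy 2.x treats it as "never copy" and raises ValueError.
try:
    np.array(probs, copy=False)
    copy_false_raises = False
except ValueError:
    copy_false_raises = True

# np.asarray copies only when needed and behaves the same on 1.x and 2.x.
scores = np.asarray(probs)

# Same post-processing as custom_detect.
label = labels[0].replace("__label__", "")
score = min(float(scores[0]), 1.0)
print(f"lang={label} score={score} copy=False raises: {copy_false_raises}")
```

On NumPy 1.26.x the `try` block succeeds, while on NumPy 2.x it raises, which would be consistent with the version split in this report. `np.asarray` works on both, which is why the workaround does not need a version check.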