goodmami opened this issue 7 years ago
Should this be done by modifying the test() function in main.py?
Yeah I suppose. Here's the relevant code block in the test() function:
```python
for dist in model.test(instances):
    print(dir(dist))
    print(dist.classes())
```
You could write a function to normalize the values (e.g. set the one with the highest confidence of a False value to 0, the one with the highest confidence of a True value to 1, and scale everything else accordingly). Then replace the code block above with something like:
```python
ranked_list = normalize_probabilities(model.test(instances))
if len(ranked_list) != 0:
    top = ranked_list[0]
    ...
```
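For illustration, here is a minimal sketch of what such a helper could look like. It is not the existing API: `normalize_probabilities` is an invented name, and `prob_true()` stands in for however a Distribution actually exposes its probability for the True class.

```python
# Hypothetical sketch only; adjust prob_true() to the real Distribution API.

def prob_true(dist):
    """Assumed accessor for P(True) on a Distribution object."""
    return dist.classes().get(True, 0.0)  # assumption about the API

def normalize_probabilities(distributions):
    """Sort distributions by confidence in True (highest first) and rescale
    the scores so the lowest maps to 0.0 and the highest to 1.0."""
    dists = list(distributions)
    if not dists:
        return []
    scores = [prob_true(d) for d in dists]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    ranked = sorted(zip(scores, dists), key=lambda pair: pair[0], reverse=True)
    return [((score - lo) / span, dist) for score, dist in ranked]
```

With something like that, `ranked_list[0]` would be the (rescaled score, distribution) pair with the highest confidence of True.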
Reviewing the code, it looks like the model.test() function returns Distribution objects, each of which contains a dictionary of class to probability. Each Distribution object also has a best_class field, so if I'm not mistaken, this issue might be solved by doing:
```python
for dist in model.test(instances):
    print(dir(dist))
    top = dist.best_class
```
I can put some normalization code into the Distribution class to make sure the probabilities are normalized.
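For what it's worth, a rough sketch of what in-class normalization could look like; the constructor signature and the `_probs` attribute are assumptions about how Distribution stores its scores, not the current code in models.py:

```python
# Hypothetical sketch only: attribute and constructor names are assumptions
# about Distribution, not the real implementation.

class Distribution:
    def __init__(self, probs):
        self._probs = dict(probs)  # class (True/False) -> raw score
        self._normalize()
        # pick the class with the highest normalized probability
        self.best_class = max(self._probs, key=self._probs.get)

    def _normalize(self):
        """Rescale the stored scores so they sum to 1."""
        total = sum(self._probs.values())
        if total > 0:
            self._probs = {cls: p / total for cls, p in self._probs.items()}
```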
Hmm, possibly. I didn't write models.py, but I thought it returned a distribution for each language, and the classes were True and False, so if something had a high probability for False, best_class would return False for the language that distribution was made for.
I could be wrong though.
Currently (when the code works), it only returns the True/False prediction and its score (as model.Distribution objects). It may be the case that more than one, or none, of the languages are chosen as True. The score of the prediction should be used to rank the list of languages for a span, and the top-ranked language should be used as the final prediction.
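In other words, something along these lines could produce the final prediction for a span. This is only a sketch: it assumes the results for a span can be paired with their languages, and the `prob_true` accessor is again an assumption about how P(True) is read from a Distribution.

```python
# Hypothetical sketch: choose one language per span even when zero or several
# languages come back as True, by ranking on the score of the True prediction.

def prob_true(dist):
    """Assumed accessor for P(True) on a Distribution object."""
    return dist.classes().get(True, 0.0)  # assumption about the API

def predict_language(results):
    """results: iterable of (language, dist) pairs for a single span.
    Returns the language whose distribution is most confident in True,
    or None if there are no results."""
    ranked = sorted(results, key=lambda pair: prob_true(pair[1]), reverse=True)
    return ranked[0][0] if ranked else None
```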