pommedeterresautee / fastrtext

R wrapper for fastText
https://pommedeterresautee.github.io/fastrtext/

Sorting multiclass prediction output by label name #27

Closed prokopyev closed 5 years ago

prokopyev commented 6 years ago

I'm working on a multi-class classification problem and looking for a way to flatten the list of output predictions. Currently my outputs are in this format:

```
$document1_text
__label__B __label__C __label__A
    0.9129     0.0441     0.0166

$document2_text
__label__A __label__C __label__B
    0.0741     0.0736     0.0730
```

With a command like `t(as.data.frame(predictions))` I can get to the following flat format:

```
id             __label__B __label__C __label__A
document1_text     0.9129     0.0441     0.0166
document2_text     0.0741     0.0736     0.0730
```

The issue is that the labels come back in a different order for each observation, so document2_text gets the wrong value in every column. As the author of this package you may have already come across this situation, even though it is more of a general list-manipulation question in R. Unfortunately, this snippet from your example code does not help given my current design, since I retrieve several labels per observation:

```r
# you can get flat list of results when you are retrieving only one label per observation
print(head(predict(model, sentences = test_to_write, simplify = TRUE)))
```
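A minimal reproduction of the misalignment, using a hand-built list with the probabilities shown above as a stand-in for real `predict()` output (an assumption; no model is trained here). `as.data.frame()` on a list of named numeric vectors takes its row names from the first vector and binds the others by position, so the names on `document2_text` are silently ignored:

```r
# Hypothetical stand-in for multi-label predict() output:
# one named numeric vector (label -> probability) per document.
predictions <- list(
  document1_text = c(`__label__B` = 0.9129, `__label__C` = 0.0441, `__label__A` = 0.0166),
  document2_text = c(`__label__A` = 0.0741, `__label__C` = 0.0736, `__label__B` = 0.0730)
)

# Label columns come from the first document only; the second is matched by position.
flat <- t(as.data.frame(predictions))
print(flat)

# document2_text's __label__B probability (0.0730) lands in the __label__A column.
flat["document2_text", "__label__A"]
```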

I think this could be solved easily if the prediction outputs were ordered by class name once the X most likely classes are returned, i.e. __label__A __label__B __label__C. Can you recommend a way to do this?

pommedeterresautee commented 6 years ago

The easiest way I see is to sort each slot of the list by the names of its numeric vector:

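```r
# reorder each document's named vector alphabetically by label
my_predictions <- lapply(my_predictions, function(vec) vec[order(names(vec))])
# and then as.data.frame
t(as.data.frame(my_predictions))
```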

I have not tested the code, so there may be a syntax error, but you get the idea.
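Sorting by name lines the columns up only when every document returns the same set of labels, i.e. when the number of requested labels equals the number of classes. A minimal sketch of a name-based alternative, assuming each list element is a named numeric vector as in the issue (the `predictions` values below are made up for illustration, not real model output): build the sorted union of all label names, then index every vector by that set so labels missing from a document become `NA`.

```r
# Hypothetical top-2 output: the two documents return different label sets.
predictions <- list(
  document1_text = c(`__label__B` = 0.9129, `__label__C` = 0.0441),
  document2_text = c(`__label__A` = 0.0741, `__label__C` = 0.0736)
)

# Sorted union of every label seen fixes a common column order.
all_labels <- sort(unique(unlist(lapply(predictions, names))))

# Index each vector by label name; absent labels come back as NA.
flat <- t(vapply(predictions,
                 function(vec) setNames(vec[all_labels], all_labels),
                 numeric(length(all_labels))))
print(flat)
```

`as.data.frame(flat)` turns the result into a data frame if needed. Indexing by name rather than relying on sort order means a label absent from one document's top predictions shows up as `NA` instead of shifting the other values into the wrong column.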