Open lrocholl opened 2 years ago
@lrocholl thank you for asking this question.
I believe the issue stems from the way you are providing the multiple labels.
When you specify the type as `set` in Ludwig, the current expectation is that each row contains a string with the set of classes expressed as a whitespace-separated list.
So, instead of `['Short Film', 'Documentary']`, it should look like `'Short_Film Documentary'`.
Try to do it this way and let me know if it works.
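As a minimal sketch of that conversion (not part of Ludwig itself), the hypothetical helper `to_set_string` below turns a list of labels into a whitespace-separated string, replacing spaces inside a label with underscores so that each label stays a single token. It assumes the column may also hold a stringified Python list, which is common after a CSV round trip.

```python
import ast


def to_set_string(labels):
    """Convert a collection of labels into one whitespace-separated string."""
    # Assumption: a string cell holds a Python list literal such as
    # "['Short Film', 'Documentary']"; anything else raises an error here.
    if isinstance(labels, str):
        labels = ast.literal_eval(labels)
    # Join the words of each label with underscores so that a multi-word
    # label is not split into several classes, then join labels with spaces.
    return " ".join("_".join(str(label).split()) for label in labels)


print(to_set_string(['Short Film', 'Documentary']))  # Short_Film Documentary
```

If the data lives in a pandas DataFrame (here assumed to be named `df`), the same helper could be applied with `df['genre_new'] = df['genre_new'].apply(to_set_string)` before training.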
This also suggests that we may want to allow more flexibility in how sets are provided; perhaps we should accept lists and sets of strings in addition to whitespace-separated strings.
I'm running some experiments on multi-label classification of movies into one or more genres based on their plot.
My model definition is the following: model_definition = { 'input_features':[ {'name':'plot', 'type':'text', 'level': 'word', 'encoder': 'parallel_cnn'} ], 'output_features': [ {'name': 'genre_new', 'type': 'set'} ] }
My training dataset looks like this: ![image](https://user-images.githubusercontent.com/8400739/130172719-0fa54d12-e315-4509-b613-2ceb60d3fb20.png)
However, my predictions do not look very good: ![image](https://user-images.githubusercontent.com/8400739/130172954-6e2ab766-4e2a-4669-864c-c4b8e44dfe41.png)
Is there anything I am missing here? I understand that the format of the set column might be influencing the results, but I am not sure whether this is the right approach.
I would appreciate your comments. Thanks.