Open shaypal5 opened 4 years ago
Hi, is this implemented? I am having issues with the multilabel case. Transforming the labels with MultiLabelBinarizer leads to the error: ValueError: FastTextClassifier methods must get a one-dimensional numpy array as the y parameter.
What can I do?
Thank you very much.
Or do you have any other recommendation for how to cross-validate the results of fastText supervised training (multilabel)? I have been looking for a solution for weeks now... Any help is very much appreciated.
Kind Regards, Eva
Hey Eva!
I'll try to help you as best as I can. However, I don't have the time to implement it right now. I can guide you through contributing the code yourself. :)
First, as the issue is open, it shouldn't come as a surprise that this isn't implemented.
As you can see in this example file from the FastText tutorial for text classification, this is the format for multilabel problems:
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
So, very much like the multiclass format, just with multiple __label__ tags at the start of each line.
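A minimal sketch of producing that format from multilabel targets, assuming the labels arrive either as lists of label names or as a binary indicator matrix such as the one MultiLabelBinarizer returns. The helper name to_fasttext_lines and its parameters are hypothetical, made up for illustration; they are not part of skift or fastText.

```python
def to_fasttext_lines(texts, label_rows, classes=None, prefix="__label__"):
    """Render one fastText training line per sample (hypothetical helper).

    label_rows is either a list of label-name lists, or (when `classes`
    is given) a binary indicator matrix of shape (n_samples, n_outputs),
    e.g. the output of sklearn's MultiLabelBinarizer.
    """
    lines = []
    for text, row in zip(texts, label_rows):
        if classes is not None:
            # Indicator row -> names of the columns that are switched on.
            row = [name for name, flag in zip(classes, row) if flag]
        tags = " ".join(prefix + str(label) for label in row)
        # fastText reads one sample per line, so flatten any newlines.
        clean = " ".join(str(text).split())
        lines.append((tags + " " + clean).strip())
    return lines


texts = [
    "How much does potato starch affect a cheese sauce recipe?",
    "How do I cover up the white spots on my cast iron stove?",
]
# Column order of the indicator matrix (MultiLabelBinarizer exposes
# this as its classes_ attribute after fitting).
classes = ["cast-iron", "cheese", "sauce", "stove"]
indicator = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

lines = to_fasttext_lines(texts, indicator, classes=classes)
print("\n".join(lines))
```

Writing these lines to a text file, one per line, should give a training file in the same shape as the tutorial example above. Note that the labels come out in column order of the indicator matrix, not in the order they were originally written.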
Two main areas of code in skift require adaptation for multilabel problems to be supported:
1. The FtClassifierABC class must be adapted to accept y arguments that are also of shape (n_samples, n_outputs), as in sklearn. This includes such methods as _validate_y and fit.
   y: array-like of shape (n_samples,) or (n_samples, n_outputs)
2. The util.dump_xy_to_fasttext_format() function must be adapted to properly dump multilabel targets, in the format I linked to above.

Hi Shaypal,
thanks a lot for replying!
I already have my data in the correct format. But unfortunately I don't think I am able to implement the feature by myself.
Do you by any chance have some experience performing cross-validation on the outcome of fastText supervised training? That is the reason I was looking into this wrapper class. I couldn't find much up-to-date information regarding validation of fastText.
Cheers Eva
Add support for providing multi-label labels in a scikit-learn-compliant format, utilizing (under the hood) fastText's support for multi-label scenarios.