src-d / wmd-relax

Calculates Word Mover's Distance Insanely Fast

Classification use-case #16

Open wbecker opened 7 years ago

wbecker commented 7 years ago

This project doesn't currently allow predicting the type of an input, as there is no way of knowing which type an input value maps to.

Normally, using a classifier is a two-stage process:

1 - fit(X, y), using training input and output data
2 - predict(X), using unknown data, and returning the estimated output

It would be good if this project presented a similar interface.

I would suggest creating a class, wmd_classifier, which implements these two methods.

fit, which would:

predict, which would:
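The steps for fit and predict are not spelled out in this comment. As one possible reading, here is a minimal sketch of an sklearn-style interface that assumes a k-nearest-neighbour vote over the training documents. The class name `WmdClassifierSketch`, the `distance` callable and `toy_distance` are all hypothetical; `toy_distance` is a Jaccard placeholder and not Word Mover's Distance, and nothing here is part of wmd-relax.

```python
from collections import Counter


class WmdClassifierSketch:
    """Hypothetical k-NN classifier with fit/predict; not part of wmd-relax."""

    def __init__(self, distance, k=3):
        # distance(a, b) -> float is an assumed stand-in for a WMD computation.
        self.distance = distance
        self.k = k

    def fit(self, X, y):
        # Stage 1: remember the training documents and their labels.
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        self._X = list(X)
        self._y = list(y)
        return self

    def predict(self, X):
        # Stage 2: label each unknown document by majority vote of its neighbours.
        return [self._predict_one(doc) for doc in X]

    def _predict_one(self, doc):
        # Rank training documents by distance to doc, closest first.
        ranked = sorted(
            range(len(self._X)),
            key=lambda i: self.distance(doc, self._X[i]),
        )
        votes = Counter(self._y[i] for i in ranked[: self.k])
        return votes.most_common(1)[0][0]


def toy_distance(a, b):
    """Jaccard distance on word sets; only a placeholder for WMD."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


clf = WmdClassifierSketch(toy_distance, k=1)
clf.fit(["the cat sat", "stocks fell today"], ["pets", "finance"])
labels = clf.predict(["a cat sat down"])
```

Passing the distance in as a callable keeps the sketch independent of how WMD is actually computed; a real implementation would presumably delegate that part to the library.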

wbecker commented 7 years ago

I'd be happy to contribute something like this!

vmarkovtsev commented 7 years ago

@wbecker This is :+1:. An sklearn-like interface would be really useful. Feel free to PR.

My only suggestion is to abstract the way a document is transformed into nBOW. E.g. provide a function in __init__ and let the documents be "objects", with nice defaults for spacy/strings.
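A rough sketch of that abstraction, under stated assumptions: the nBOW is represented here as a plain dict mapping each word to its normalized frequency, which may differ from the layout wmd-relax actually expects. The names `string_to_nbow` and `NbowAdapter` are hypothetical, and only the string default is shown; a spaCy default would follow the same pattern.

```python
from collections import Counter


def string_to_nbow(document):
    """Assumed default: whitespace-tokenize a string into word -> weight."""
    counts = Counter(document.lower().split())
    total = sum(counts.values())
    # Weights sum to 1; an empty document yields an empty nBOW.
    return {word: n / total for word, n in counts.items()} if total else {}


class NbowAdapter:
    """Hypothetical holder for the document-to-nBOW function given in __init__."""

    def __init__(self, to_nbow=string_to_nbow):
        # Any callable taking a document "object" and returning an nBOW works.
        self.to_nbow = to_nbow

    def transform(self, documents):
        return [self.to_nbow(doc) for doc in documents]


def tokens_to_nbow(tokens):
    """Example of a custom transform for pre-tokenized documents."""
    return string_to_nbow(" ".join(tokens))


default_nbows = NbowAdapter().transform(["a b a b"])
custom_nbows = NbowAdapter(to_nbow=tokens_to_nbow).transform([["x", "y"]])
```

With the transform injected this way, the classifier never needs to know whether a document is a string, a token list or a parsed object.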

And let's name it WmdClassifier. I have just stated the contribution guidelines in https://github.com/src-d/wmd-relax/wiki/Contributions