ibrahimsharaf / doc2vec

Long(er) text representation and classification using Doc2Vec embeddings
MIT License

Add CLI to the model #17

Open ibrahimsharaf opened 5 years ago

ibrahimsharaf commented 5 years ago

Add CLI support for the following commands:

Depends on #14

dheerajgattupalli commented 5 years ago

Hi, I am interested in helping out with this issue. I have already made some progress and built the CLI for training, testing, and single-sentence prediction with the Python Fire module. I just wanted to clarify the train/test split requirement. Are you expecting the complete data to be saved to files after the split and loaded back for repeatability? Repeatability is already taken care of by the random state parameter, so it would be helpful if you could add some detail on the exact requirement for the train/test split command.

Thank You.

ibrahimsharaf commented 5 years ago

Hi @dheerajgattupalli, thanks for your collaboration. There's no need to save the train/test data into separate files on disk.

dheerajgattupalli commented 5 years ago

So what should that command do?

ibrahimsharaf commented 5 years ago

It would take a dataset path, read it into a pandas DataFrame, split it into train/test sets using sklearn's train_test_split function, use the training data to train doc2vec and then the classifier, use the testing data to test the trained classifier, and report back the accuracy metrics.
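
A rough sketch of that flow, not the repository's actual code. It uses argparse rather than Fire to keep the parser dependency-free, and everything named here is an assumption for illustration: the train_test sub-command, the --text-col, --label-col, --test-size and --random-state flags, a CSV dataset with "text" and "label" columns, whitespace tokenization, a LogisticRegression classifier, and the Doc2Vec hyperparameters. It reports plain accuracy only.

```python
import argparse


def build_parser():
    """Build a parser with one hypothetical sub-command, train_test."""
    parser = argparse.ArgumentParser(description="doc2vec classification CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)
    cmd = sub.add_parser("train_test", help="split the data, train, and evaluate")
    cmd.add_argument("dataset_path", help="path to a CSV file (assumed format)")
    cmd.add_argument("--text-col", default="text")    # assumed column name
    cmd.add_argument("--label-col", default="label")  # assumed column name
    cmd.add_argument("--test-size", type=float, default=0.2)
    cmd.add_argument("--random-state", type=int, default=42)
    return parser


def train_test(dataset_path, text_col, label_col, test_size, random_state):
    """Split the dataset, train doc2vec and a classifier, return test accuracy."""
    # Third-party imports stay local so the parser works without them installed.
    import pandas as pd
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv(dataset_path)

    # random_state makes the split repeatable, so nothing is written to disk.
    # It does not control the randomness inside Doc2Vec training or inference.
    x_train, x_test, y_train, y_test = train_test_split(
        df[text_col], df[label_col], test_size=test_size, random_state=random_state
    )

    # Naive whitespace tokenization (an assumption; real preprocessing may differ).
    train_tokens = [str(text).lower().split() for text in x_train]
    test_tokens = [str(text).lower().split() for text in x_test]

    # Train doc2vec on the training documents only.
    corpus = [TaggedDocument(words=toks, tags=[i]) for i, toks in enumerate(train_tokens)]
    d2v = Doc2Vec(vector_size=100, min_count=2, epochs=20)
    d2v.build_vocab(corpus)
    d2v.train(corpus, total_examples=d2v.corpus_count, epochs=d2v.epochs)

    # Train the classifier on inferred vectors of the training documents.
    clf = LogisticRegression(max_iter=1000)
    clf.fit([d2v.infer_vector(toks) for toks in train_tokens], y_train)

    # Evaluate on the held-out documents.
    preds = clf.predict([d2v.infer_vector(toks) for toks in test_tokens])
    return accuracy_score(y_test, preds)


def main(argv=None):
    args = build_parser().parse_args(argv)
    acc = train_test(
        args.dataset_path, args.text_col, args.label_col,
        args.test_size, args.random_state,
    )
    print(f"accuracy: {acc:.4f}")
```

Calling main(["train_test", "dataset.csv", "--test-size", "0.3"]), or main() under the usual __main__ guard in a script, would run the whole flow on a hypothetical dataset.csv and print the accuracy.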