Open ibrahimsharaf opened 5 years ago
Hi, I am interested in helping out with this issue. I have already made some progress: I built the CLI for training, testing, and single-sentence prediction using the Python `fire` module. I just wanted to clarify the train/test split requirement. Are you expecting the complete data to be saved into files after the split and loaded back for repeatability? That is already taken care of by the `random_state` parameter. If you could add some detail on the exact requirement for the train/test split command, that would be helpful.
Thank you.
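For readers following the repeatability point, here is a minimal sketch of how a fixed `random_state` makes `train_test_split` reproducible without writing the split to disk. The toy data and the values `0.3` and `7` are illustrative assumptions, not taken from the project.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the dataset rows (hypothetical data).
rows = list(range(10))

# Two independent calls with the same seed (7 is an arbitrary choice).
train_a, test_a = train_test_split(rows, test_size=0.3, random_state=7)
train_b, test_b = train_test_split(rows, test_size=0.3, random_state=7)

# The same seed yields the same partition on every call.
same_split = (train_a == train_b) and (test_a == test_b)
print("identical splits:", same_split)
```

The split only needs to be saved if it must survive changes to the input data or to the library version; for a single dataset and environment, the seed is enough.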
Hi @dheerajgattupalli, thanks for your collaboration. There's no need to save the train/test data into separate files on disk.
So what should that command do?
It would take a dataset path, read it into a `pandas` dataframe, split it into train/test sets using the `sklearn` `train_test_split` function, use the training data to train `doc2vec` and then the classifier, use the testing data to test the trained classifier, and report back the accuracy metrics.
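The flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `train_and_test`, the column names `text` and `label`, the CSV input format, and the `LogisticRegression` classifier are all hypothetical, and `TfidfVectorizer` stands in for the `doc2vec` step so that the sketch needs only `pandas` and scikit-learn.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_test(dataset_path, text_col="text", label_col="label",
                   test_size=0.2, random_state=42):
    """Split a dataset in memory, train on one part, report accuracy on the other."""
    # Read the dataset into a dataframe (a CSV file is assumed here).
    df = pd.read_csv(dataset_path)

    # Split in memory; the fixed seed makes the split repeatable,
    # so nothing has to be written to disk.
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=random_state
    )

    # Stand-in for the doc2vec step: fit the text embedding on training data only.
    vectorizer = TfidfVectorizer()
    x_train = vectorizer.fit_transform(train_df[text_col])
    x_test = vectorizer.transform(test_df[text_col])

    # Train the classifier on the embedded training texts (classifier choice is assumed).
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(x_train, train_df[label_col])

    # Evaluate on the held-out rows and report the metrics.
    predictions = classifier.predict(x_test)
    return {
        "accuracy": accuracy_score(test_df[label_col], predictions),
        "n_train": len(train_df),
        "n_test": len(test_df),
    }
```

A function shaped like this can be exposed as a CLI command by whichever wrapper the project uses; it returns the metrics as a dictionary so the caller decides how to print them.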
Add CLI support for the following commands:
Depends on #14