Closed titipata closed 7 years ago
LabelEncoder has issues across different versions of sklearn. It would be best to get rid of it and create a manual dictionary for the mapping, like {'a': 0, 'b': 1, etc.}, instead of relying on sklearn's function. I will take care of this (and hopefully improve the network architecture) and rerun all training in a few days.
@rkcosmos, taken care of in this PR :). I used a dictionary, as you mentioned, instead of LabelEncoder. I didn't remove object.pk; however, you can remove it later on.
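For reference, a manual mapping of the kind discussed above could look roughly like the sketch below. The names `build_char_map` and `encode`, the `'other'` fallback key, and the sample characters are assumptions for illustration only, not the code in this PR.

```python
def build_char_map(chars, unknown="other"):
    """Map each distinct character to a stable integer index.

    Sorting makes the mapping independent of input order, so it stays
    the same between runs and does not depend on any library version.
    """
    vocab = sorted(set(chars))
    mapping = {ch: i for i, ch in enumerate(vocab)}
    # Reserve the last index for characters not seen when building the map
    # (the 'other' key is an assumption made for this sketch).
    mapping.setdefault(unknown, len(mapping))
    return mapping


def encode(text, mapping, unknown="other"):
    """Convert a string to a list of indices, falling back to `unknown`."""
    return [mapping.get(ch, mapping[unknown]) for ch in text]


char_map = build_char_map("abcab")
print(char_map)                  # {'a': 0, 'b': 1, 'c': 2, 'other': 3}
print(encode("cabz", char_map))  # [2, 0, 1, 3]
```

Since the dictionary is plain data, it can be written out as a literal in the source or stored as JSON, which avoids unpickling an sklearn object at load time.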
I think it's ready to be reviewed and maybe merged later. The current workflow for training the model looks like the following:
import deepcut
# preprocess
best_path = ''
best_processed_path = 'cleaned_data/'
deepcut.train.generate_best_dataset(best_path, output_path=best_processed_path)
# training
x_train_char, x_train_type, y_train = deepcut.train.prepare_feature(best_processed_path, option='train')
model = deepcut.model.get_convo_nn2()
model.fit([x_train_char, x_train_type], y_train, epochs=10, batch_size=256, verbose=1)
model.fit([x_train_char, x_train_type], y_train, epochs=3, batch_size=512, verbose=1)
model.fit([x_train_char, x_train_type], y_train, epochs=3, batch_size=2048, verbose=1)
model.fit([x_train_char, x_train_type], y_train, epochs=3, batch_size=4096, verbose=1)
model.fit([x_train_char, x_train_type], y_train, epochs=3, batch_size=8192, verbose=1)
# evaluating
f1score, precision, recall = deepcut.train.evaluate(best_processed_path, model)
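The repeated fit calls above follow a schedule of growing batch sizes, which could also be written as a loop. This is only a sketch: `SCHEDULE`, `fit_with_schedule`, and the `RecordingModel` stand-in are hypothetical names, and the stand-in replaces the real model so the snippet runs without deepcut installed.

```python
# (epochs, batch_size) pairs copied from the fit calls in the workflow above.
SCHEDULE = [(10, 256), (3, 512), (3, 2048), (3, 4096), (3, 8192)]


def fit_with_schedule(model, inputs, targets, schedule=SCHEDULE, verbose=1):
    """Call model.fit once per (epochs, batch_size) stage, in order."""
    for epochs, batch_size in schedule:
        model.fit(inputs, targets, epochs=epochs,
                  batch_size=batch_size, verbose=verbose)
    return model


class RecordingModel:
    """Stand-in that only records fit calls; it does no training."""

    def __init__(self):
        self.calls = []

    def fit(self, inputs, targets, epochs, batch_size, verbose=1):
        self.calls.append((epochs, batch_size))


dummy = fit_with_schedule(RecordingModel(), [[0], [0]], [0], verbose=0)
print(dummy.calls)
```

With the real model, the call would presumably be `fit_with_schedule(model, [x_train_char, x_train_type], y_train)`.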
[Work in Progress] @rkcosmos, I'm trying to create a reproducible script for training the model (currently in a notebook in the Research-Notebook folder).
With this PR, it takes the BEST path and saves CSV files to the folder given in output_path. Then use train_model to train the model from the given cleaned BEST path.
I checked the output char_le; it is slightly different from the current char_le in the pickle file.
It would be great if you could suggest how you want it to be. Happy to chat more later on!
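To pin down where the two char_le mappings differ, a small comparison like the one below might help. It assumes both mappings are available as plain dictionaries; `diff_mappings` and the sample data are made up for illustration. If the pickled char_le is a fitted LabelEncoder, its `classes_` attribute could be turned into a dictionary first.

```python
def diff_mappings(old, new):
    """Return keys only in one mapping and keys whose index changed."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = {k: (old[k], new[k])
               for k in set(old) & set(new) if old[k] != new[k]}
    return only_old, only_new, changed


# Made-up sample mappings, not the real char_le contents.
old_map = {"a": 0, "b": 1, "c": 2}
new_map = {"a": 0, "c": 1, "d": 2}
print(diff_mappings(old_map, new_map))
# (['b'], ['d'], {'c': (2, 1)})
```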