rkcosmos / deepcut

A Thai word tokenization library using Deep Neural Network
MIT License

Using a batch generator instead of training on the whole training set at once #29

Closed. titipata closed this issue 5 years ago

titipata commented 6 years ago

@kittinan, here is a workaround script to train using fit_generator instead of fit. You can replace lines 184-193 in train.py with the following:

import itertools
import numpy as np

def generator(l1, l2, l3, batch_size=128):
    """Yield ([char features, type features], labels) batches indefinitely."""
    gen1 = itertools.cycle(l1)  # cycle() already returns an iterator, no iter() needed
    gen2 = itertools.cycle(l2)
    gen3 = itertools.cycle(l3)
    while True:
        # Stack batch_size consecutive examples into one array per input/output
        batch_char = np.vstack([next(gen1) for _ in range(batch_size)])
        batch_type = np.vstack([next(gen2) for _ in range(batch_size)])
        batch_label = np.vstack([next(gen3) for _ in range(batch_size)])
        yield [batch_char, batch_type], batch_label

batch_size = 128
gen_batch_train = generator(x_train_char, x_train_type, y_train, batch_size=batch_size)
gen_batch_val = generator(x_val_char, x_val_type, y_val, batch_size=batch_size)
model.fit_generator(gen_batch_train, steps_per_epoch=len(x_train_char) // batch_size, 
                    epochs=10, verbose=verbose,
                    validation_data=gen_batch_val,
                    validation_steps=len(x_val_char) // batch_size,
                    callbacks=callbacks_list)
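
As a quick sanity check (a minimal sketch, assuming each element of x_train_char, x_train_type, and y_train is a row vector as np.vstack expects; the feature dimensions in the comments are placeholders, not values from train.py), you can pull a single batch from the generator and confirm the shapes before starting training:

# Draw one batch from the generator defined above and inspect its shapes.
# Assumes np.vstack produces arrays of shape (batch_size, n_features).
(batch_char, batch_type), batch_label = next(gen_batch_train)
print(batch_char.shape)   # expected: (batch_size, n_char_features)
print(batch_type.shape)   # expected: (batch_size, n_type_features)
print(batch_label.shape)  # expected: (batch_size, n_labels)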
titipata commented 5 years ago

@kittinan I guess this is an enhancement but we won't do it now. We can refactor the code once we retrain the model. I'll just close this issue due to inactivity.