Closed ericxsun closed 4 weeks ago
Hi @ericxsun , thanks for asking this question.
Ludwig has a mechanism for training models on batches of data incrementally. We call it train_online
and you can see it documented here https://ludwig-ai.github.io/ludwig-docs/0.5/user_guide/api/LudwigModel/#train_online . The difference from the regular train
is that it runs only once over the batch of data provided.
This makes it useful for implementing an active learning loop that may look like:
```python
model = LudwigModel(...)
model.train(my_data)

for i in range(active_loop_steps):
    new_data = get_new_data()
    predictions = model.predict(new_data)
    # select the most informative examples; you need to implement this
    # yourself or use an active learning library
    most_valuable_data = active_learning(new_data, predictions)
    model.train_online(most_valuable_data)
```
One may also want to train on old data points to avoid catastrophic forgetting, and perhaps adjust the learning rate and other hyperparameters for fine-tuning purposes when using train_online
, but this is a sketch of how you can use it.
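To make the catastrophic-forgetting point concrete, here is a minimal, hedged sketch of one common mitigation: mixing each new batch with a random sample from a buffer of previously seen examples before passing it to train_online. The function name, the list-based data representation, and the `replay_fraction` parameter are all hypothetical illustrations, not part of the Ludwig API; in practice you would apply the same idea to the DataFrames you feed Ludwig.

```python
import random

def make_replay_batch(new_data, replay_buffer, replay_fraction=0.25, seed=0):
    """Mix newly selected examples with a sample of older ones so that
    incremental updates are less likely to overwrite earlier knowledge.

    new_data, replay_buffer: lists of examples (stand-ins for the rows
    you would pass to Ludwig's train_online).
    replay_fraction: desired share of old examples in the final batch.
    """
    rng = random.Random(seed)
    # number of old examples needed so they make up replay_fraction of the batch
    n_replay = int(len(new_data) * replay_fraction / (1 - replay_fraction))
    n_replay = min(n_replay, len(replay_buffer))
    batch = list(new_data) + rng.sample(replay_buffer, n_replay)
    rng.shuffle(batch)  # avoid a fixed new-then-old ordering within the batch
    return batch
```

You would then call train_online on the mixed batch instead of on the new data alone, and append the new data to the buffer afterwards. The right mixing ratio is problem-dependent.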
Does this help?
That's awesome. Thanks a lot. I'll try it.
Is your feature request related to a problem? Please describe.
Is it possible to do active learning based on the current master branch? Any clue would be highly appreciated.