ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

support Active Learning procedure? #2278

Closed ericxsun closed 4 weeks ago

ericxsun commented 2 years ago

Is your feature request related to a problem? Please describe.

Is it possible to do active learning with the current master branch? Any pointers would be highly appreciated.

w4nderlust commented 2 years ago

Hi @ericxsun, thanks for asking this question. Ludwig has a mechanism for training models incrementally on batches of data. We call it train_online, and you can see it documented here: https://ludwig-ai.github.io/ludwig-docs/0.5/user_guide/api/LudwigModel/#train_online . The difference from the regular train is that it runs over the provided batch of data only once. This makes it useful for implementing an active learning loop that may look like:

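from ludwig.api import LudwigModel

model = LudwigModel(...)
model.train(dataset=my_data)  # initial training on the labeled data
for _ in range(active_loop_steps):  # active_loop_steps: number of rounds
    new_data = get_new_data()
    # predict() returns a (predictions, output_directory) tuple
    predictions, _ = model.predict(dataset=new_data)
    # implement active_learning yourself or use an active learning library
    most_valuable_data = active_learning(new_data, predictions)
    model.train_online(dataset=most_valuable_data)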

You may also want to train on old datapoints to avoid catastrophic forgetting, and possibly adjust the learning rate and other hyperparameters for fine-tuning when using train_online, but this is a sketch of how you can use it.
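For the active_learning step and the point about old datapoints, here is a minimal sketch of one common choice, entropy-based uncertainty sampling, plus a simple replay mix. It is only an illustration under assumptions: select_most_uncertain and mix_with_replay are hypothetical helpers, not part of Ludwig, and the code assumes you have already pulled per-class probabilities out of the predictions into plain lists (how you do that depends on your output feature).

```python
import math
import random


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(rows, probabilities, k):
    """Return the k rows whose predicted distribution has the highest entropy."""
    ranked = sorted(
        range(len(rows)),
        key=lambda i: entropy(probabilities[i]),
        reverse=True,
    )
    return [rows[i] for i in ranked[:k]]


def mix_with_replay(new_rows, old_rows, replay_fraction=0.5, seed=0):
    """Append a random sample of old rows to the new ones to limit forgetting."""
    n_replay = min(len(old_rows), int(len(new_rows) * replay_fraction))
    rng = random.Random(seed)
    return new_rows + rng.sample(old_rows, n_replay)


# Toy data (made up for illustration): three unlabeled rows and their
# predicted class probabilities.
rows = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20]]

# Pick the two rows the model is least sure about.
picked = select_most_uncertain(rows, probs, k=2)

# Mix in some previously labeled rows before the online update.
old_rows = [{"text": "x"}, {"text": "y"}, {"text": "z"}]
batch = mix_with_replay(picked, old_rows, replay_fraction=0.5)
```

After labeling, the rows in batch would be what gets passed to train_online. Entropy is just one scoring rule; margin or least-confidence sampling would slot into the same place.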

Does this help?

ericxsun commented 2 years ago

That's awesome. Thanks a lot. I'll try it.