scikit-learn-contrib / hiclass

A Python library for hierarchical classification compatible with scikit-learn
BSD 3-Clause "New" or "Revised" License

Continual Learning with HiClass #70

Open Yasmen-Wahba opened 1 year ago

Yasmen-Wahba commented 1 year ago

I was wondering if it would be possible for HiClass to incorporate online/continual learning to process streams of data, similar to what is implemented in a library such as Avalanche: https://github.com/ContinualAI/avalanche

mirand863 commented 1 year ago

Hi @Yasmen-Wahba,

This is an interesting topic and I believe it is possible. However, we are currently working on multi-label classification and do not have the bandwidth for this feature right now. I will leave this issue open and will likely work on it in the future.

There was someone who volunteered to implement this on GitLab, but I am not sure if he managed to do it.

Out of curiosity, how much time do you estimate it would save you if you could skip retraining the model?

Yasmen-Wahba commented 1 year ago

Actually, it's not about time. The current model I deployed is linear, so it's very fast and its accuracy is very satisfactory. The problem lies in the nature/distribution of the incoming data, where new classes are being added and my model knows nothing about them. That's why I thought about Incremental/Continual/Lifelong Learning to process data as streams and to adapt to new changes without the need to retrain the model each time a new class is added!

mirand863 commented 1 year ago

> Actually, it's not about time. The current model I deployed is linear, so it's very fast and its accuracy is very satisfactory. The problem lies in the nature/distribution of the incoming data, where new classes are being added and my model knows nothing about them. That's why I thought about Incremental/Continual/Lifelong Learning to process data as streams and to adapt to new changes without the need to retrain the model each time a new class is added!

Let me see if I understood correctly... For example, if you train the model with classes A and B, then later you only want to train with class C, without adding new samples for classes A and B, right? If that is the case, then I believe it would be easier to implement and I could try to do it next week.
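
To make this concrete, here is a minimal sketch of that workflow with made-up data. The first call uses the existing hiclass API; how the second call should behave is exactly what this issue is about:

from sklearn.linear_model import LogisticRegression
from hiclass import LocalClassifierPerNode

# initial training: the hierarchy only contains classes A and B
x_initial = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y_initial = [["Root", "A"], ["Root", "A"], ["Root", "B"], ["Root", "B"]]

lcpn = LocalClassifierPerNode(local_classifier=LogisticRegression())
lcpn.fit(x_initial, y_initial)

# later: a batch containing only the previously unseen class C
x_new = [[9.0, 10.0], [11.0, 12.0]]
y_new = [["Root", "C"], ["Root", "C"]]

# with the usual scikit-learn fit semantics, this second call refits on the new
# batch alone and the model forgets A and B; the requested feature would instead
# keep the existing local classifiers and only add what is needed for C
lcpn.fit(x_new, y_new)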

Yasmen-Wahba commented 1 year ago

I want to be able to train the model with class C, or with classes C and D. I want my model to accept new class(es) and continue training without complaining.

mirand863 commented 1 year ago

> I want to be able to train the model with class C, or with classes C and D. I want my model to accept new class(es) and continue training without complaining.

Hi @Yasmen-Wahba,

I only have a couple more questions before I start implementing this.

Do you think warm_start would be a good option for your use case? For example:

from hiclass import LocalClassifierPerNode
from sklearn.svm import LinearSVC  # any scikit-learn compatible estimator can serve as the local classifier

lcpn = LocalClassifierPerNode(
    local_classifier=LinearSVC(),
    warm_start=True,  # proposed flag: keep previously fitted local classifiers on later calls to fit
)
lcpn.fit(x, y)
# a few days later, when data with new classes arrives...
lcpn.fit(new_x, new_y)

Will you be able to ensure that only new data is used in subsequent calls to fit, e.g., only classes C and D in your example? Or do you think it is better to check for and skip the nodes that were already fitted?
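
For reference, here is a rough standalone sketch of what the second option could look like. None of this is hiclass code: the function name is made up, and it is written in a local-classifier-per-parent-node style for brevity; a real implementation would have to hook into the existing local classifiers and hierarchy handling.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative "check and skip" sketch: the dict of fitted classifiers persists
# across calls, and only nodes that were never fitted before get a classifier.
def fit_only_new_nodes(local_classifiers, x, y_paths):
    """local_classifiers maps a parent node to its fitted local classifier."""
    x = np.asarray(x)
    y_paths = np.asarray(y_paths)
    for level in range(y_paths.shape[1] - 1):
        for parent in np.unique(y_paths[:, level]):
            if parent in local_classifiers:
                continue  # this node was fitted in an earlier call, so skip it
            mask = y_paths[:, level] == parent
            children = y_paths[mask, level + 1]
            if len(np.unique(children)) < 2:
                continue  # a real implementation would fall back to a constant classifier here
            clf = LogisticRegression()
            clf.fit(x[mask], children)
            local_classifiers[parent] = clf
    return local_classifiers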

Yasmen-Wahba commented 1 year ago

Hi Fabio. It would be great if we could accept both old and new classes, check for the old ones and skip them, and fit only the new classes :)