scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

How to add a new entity to an existing CRF model? Or merge a new model with the old one? #60

Closed eromoe closed 7 years ago

eromoe commented 7 years ago

Hi,

I think this is a very common requirement. Assume I already have a CRF model that tags the entities cat and dog, and now I also need to tag a mouse entity. As far as I can tell, I have to do one of the following:

  1. Retrain the CRF model on the whole dataset and use the new model.
  2. Train another CRF model on a dataset where only mouse is tagged, and use the old and new models together.

Both options are very inefficient.
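For reference, here is a minimal sketch of what "use the old and new models together" could look like at prediction time. It assumes both models emit one label per token for the same token sequence, that `O` is the "no entity" label, and that the old model wins when both models claim a token. The function name `merge_tags` and the conflict policy are hypothetical choices for illustration, not part of python-crfsuite.

```python
def merge_tags(old_tags, new_tags, outside="O"):
    """Combine per-token labels from two taggers run on the same tokens.

    Assumed policy (arbitrary): keep the old model's label whenever it is an
    entity; otherwise fall back to the new model's label.
    """
    if len(old_tags) != len(new_tags):
        raise ValueError("both taggers must label the same token sequence")
    merged = []
    for old, new in zip(old_tags, new_tags):
        if old != outside:
            merged.append(old)  # old model found CAT or DOG here
        else:
            merged.append(new)  # either MOUSE from the new model, or O
    return merged


# Made-up labels for the tokens: "the cat saw the mouse"
old_model_output = ["O", "CAT", "O", "O", "O"]
new_model_output = ["O", "O", "O", "O", "MOUSE"]
merged_output = merge_tags(old_model_output, new_model_output)
print(merged_output)  # ['O', 'CAT', 'O', 'O', 'MOUSE']
```

This only merges the outputs; each model still scores transitions using its own notion of `O`, so the merged sequence is not what a single jointly trained model would necessarily produce.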

kmike commented 7 years ago

AFAIK there is no principled way to merge such models; you have to retrain from scratch, using the combined data. It is not a question of software implementation, but a theoretical question of how to do this. For example, the meaning of the O label changes when you add an entity: in your example, after merging it should somehow exclude MOUSE. So the transition probabilities O -> CAT, O -> DOG, CAT -> O, DOG -> O and O -> O in the first model are no longer valid in a combined model, other probabilities need to be adjusted as well, and I'm not sure this can be done just by combining model weights, without re-training.
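A toy illustration of the point about `O`, under loud assumptions: the label sequences are made up, and plain label-bigram frequencies stand in for CRF transition weights (CRFsuite does not estimate its weights this way). The helper `transition_freqs` is hypothetical. The same two sentences are labeled once without a MOUSE tag and once with it.

```python
from collections import Counter, defaultdict


def transition_freqs(label_seqs):
    """Return {prev_label: {next_label: relative frequency}} from label bigrams."""
    counts = defaultdict(Counter)
    for seq in label_seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for prev, nexts in counts.items()
    }


# Tokens: "the cat saw the mouse" / "a dog chased a mouse"
old_labels = [  # old scheme: mouse tokens are labeled O
    ["O", "CAT", "O", "O", "O"],
    ["O", "DOG", "O", "O", "O"],
]
new_labels = [  # new scheme: mouse tokens get their own label
    ["O", "CAT", "O", "O", "MOUSE"],
    ["O", "DOG", "O", "O", "MOUSE"],
]

old = transition_freqs(old_labels)
new = transition_freqs(new_labels)

# O -> O drops from 4/6 to 2/6 once MOUSE is split out of O,
# and O -> MOUSE does not exist at all in the old statistics.
print(round(old["O"]["O"], 3), round(new["O"]["O"], 3))
print("MOUSE" in old["O"], "MOUSE" in new["O"])
```

Even in this tiny example the statistics attached to `O` shift when the label set changes, which is why reusing the old model's weights as they are is questionable.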

Maybe you can get a better answer at https://stats.stackexchange.com/, or by searching for papers which describe how to merge such models. I'm closing this issue because it is not really related to the python-crfsuite wrapper.

sarthakpawar commented 5 years ago

Is there any way to partially fit the model: first fit on TrainingSet1, then take the same model and fit on TrainingSet2? Assume the tags to predict and the data format are the same, but this time the model's learning should improve. I tried running fit twice with different datasets, but the second call overrides what was learned in the first.