scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

How to train crf in batch? #64

Open susht3 opened 7 years ago

susht3 commented 7 years ago

I have a big dataset. How can I train this CRF in batches?

kmike commented 7 years ago

Currently the CRFsuite library doesn't support mini-batch training, so you can't do that with python-crfsuite: the trainer is given the whole dataset before training starts.

If the problem is memory usage with python-crfsuite, you can generate feature dicts iteratively (see https://github.com/scrapinghub/python-crfsuite/issues/37#issuecomment-224575213). This should reduce memory, because most of it is usually taken by the Python-level feature dicts; CRFsuite's internal feature representation is more efficient. See also: https://github.com/TeamHG-Memex/sklearn-crfsuite/issues/15.