james-bowman / nlp

Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
MIT License

LDA model persistence #12

Closed. trpstra closed this issue 3 years ago.

trpstra commented 3 years ago

Thanks for this library, it seems really useful. I have been playing around a bit with a feature extraction pipeline of a count vectoriser and TF-IDF transformer feeding into an LDA transformer, but I can't seem to save the fitted pipeline to disk and reload it later to Transform new docs. Looking at the serialized pipeline in JSON, the vocabulary is there, as well as the tokenizer info and various LDA params, but I don't see the induced topics (matrices). Maybe this is a problem with the way I serialized it? If you can point to a working example of how to properly serialize a trained LDA model and re-use it later, that would be great. Thanks again!
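
For reference, a pipeline along the lines described might look like the following minimal sketch. The corpus, the topic count of 2, and the absence of stop-word filtering are placeholder assumptions for illustration, not the actual code behind this report:

```go
package main

import (
	"fmt"

	"github.com/james-bowman/nlp"
)

func main() {
	// Placeholder corpus standing in for the real training documents.
	corpus := []string{
		"the quick brown fox jumped over the lazy dog",
		"the cow jumped over the moon",
		"the little dog laughed to see such fun",
	}

	// Count vectoriser -> TF-IDF transformer -> LDA transformer (2 topics here).
	vectoriser := nlp.NewCountVectoriser()
	tfidf := nlp.NewTfidfTransformer()
	lda := nlp.NewLatentDirichletAllocation(2)
	pipeline := nlp.NewPipeline(vectoriser, tfidf, lda)

	// Fit the whole pipeline and project the corpus onto the topic space.
	docsOverTopics, err := pipeline.FitTransform(corpus...)
	if err != nil {
		fmt.Printf("failed to fit pipeline: %v\n", err)
		return
	}

	topics, docs := docsOverTopics.Dims()
	fmt.Printf("document-over-topic matrix: %d topics x %d documents\n", topics, docs)
}
```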

james-bowman commented 3 years ago

You are correct: the LDA transformer is not serialisable yet, unfortunately; I just haven't gotten around to implementing it. If you fancy having a go yourself, feel free to submit a pull request. In the meantime, you could individually persist the component parts of the LDA model and then recreate it from those parts at a later time.
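
As a rough illustration of that workaround, the sketch below persists two of the fitted parts individually: the vectoriser's learned Vocabulary as JSON, and the topic-over-word matrix returned by lda.Components() using gonum's binary encoding for mat.Dense. The corpus, file names, and choice of encodings are assumptions made for the example; and note that wiring the saved matrix back into a fresh LatentDirichletAllocation is not possible through the exported API yet, which is what this issue tracks.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"

	"github.com/james-bowman/nlp"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Placeholder corpus; in practice this would be the real training set.
	corpus := []string{
		"the quick brown fox jumped over the lazy dog",
		"the cow jumped over the moon",
	}

	vectoriser := nlp.NewCountVectoriser()
	lda := nlp.NewLatentDirichletAllocation(2)
	pipeline := nlp.NewPipeline(vectoriser, lda)

	if _, err := pipeline.FitTransform(corpus...); err != nil {
		fmt.Printf("failed to fit pipeline: %v\n", err)
		return
	}

	// Persist the vectoriser's learned vocabulary (term -> index) as JSON.
	vocabJSON, err := json.Marshal(vectoriser.Vocabulary)
	if err != nil {
		fmt.Printf("failed to encode vocabulary: %v\n", err)
		return
	}
	if err := os.WriteFile("vocabulary.json", vocabJSON, 0o644); err != nil {
		fmt.Printf("failed to write vocabulary: %v\n", err)
		return
	}

	// Persist the fitted topic-over-word distribution using gonum's
	// binary encoding for mat.Dense.
	components := mat.DenseCopyOf(lda.Components())
	raw, err := components.MarshalBinary()
	if err != nil {
		fmt.Printf("failed to marshal LDA components: %v\n", err)
		return
	}
	if err := os.WriteFile("lda_components.bin", raw, 0o644); err != nil {
		fmt.Printf("failed to write LDA components: %v\n", err)
		return
	}

	fmt.Println("saved vocabulary.json and lda_components.bin")
}
```

Loading these back would be the mirror image (json.Unmarshal for the vocabulary, mat.Dense UnmarshalBinary for the matrix); recreating a usable LDA transformer from them is the part that still needs support in the library.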

trpstra commented 3 years ago

Thanks, that makes sense. I will have a look.