rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

Add pickling support for Python tokenizers #73

Closed rth closed 4 years ago

rth commented 4 years ago

Partially addresses https://github.com/rth/vtext/issues/25

This adds __getstate__ / __setstate__ methods to the Python tokenizer wrappers so that they can be pickled, following the discussion in https://github.com/PyO3/pyo3/issues/100 and adapting the example in https://gist.github.com/ethanhs/fd4123487974c91c7e5960acc9aa2a77.
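For readers unfamiliar with the protocol, here is a minimal pure-Python sketch of the idea. It is not the vtext implementation (which lives in Rust behind PyO3); `ToyRegexpTokenizer` and `_Inner` are hypothetical stand-ins, with `_Inner` playing the role of the Rust-side object that owns the parameters. The wrapper's state is reduced to its parameters, and the inner object is rebuilt from them on load.

```python
import pickle
import re


class _Inner:
    """Hypothetical stand-in for the Rust-side tokenizer that owns the parameters."""

    def __init__(self, pattern):
        self.pattern = pattern
        self._regex = re.compile(pattern)

    def tokenize(self, text):
        return self._regex.findall(text)


class ToyRegexpTokenizer:
    """Hypothetical Python wrapper; not the actual vtext class."""

    def __init__(self, pattern=r"\b\w\w+\b"):
        self.inner = _Inner(pattern)

    def tokenize(self, text):
        return self.inner.tokenize(text)

    def __getstate__(self):
        # Serialize only the parameters, not the inner object itself.
        return {"pattern": self.inner.pattern}

    def __setstate__(self, state):
        # Rebuild the inner object from the saved parameters.
        self.inner = _Inner(state["pattern"])


def roundtrip(obj):
    """Pickle and unpickle an object in memory."""
    return pickle.loads(pickle.dumps(obj))
```

Calling `roundtrip(ToyRegexpTokenizer(pattern=r"\w+"))` should give back a tokenizer with the same pattern. Storing parameters rather than the inner object keeps the pickled payload independent of the inner object's memory layout, which is the property that matters when the inner object is a Rust struct.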

There is probably some way to add those methods via macros to avoid code repetition, but I haven't figured it out yet.

Pickling support for the stem and vectorize modules will be added in a follow-up PR.

This also removes the parameter attributes from the Python wrappers (e.g. RegexpTokenizer.pattern). They were not synced with the Rust parameter struct (here RegexpTokenizer.inner.params.pattern), so changing them had no effect. If we want attribute access to work, we could first make sure the set_params / get_params methods work as expected, and then implement the attributes on top of them via __getattr__ / __setattr__.
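A rough pure-Python sketch of what that delegation could look like, assuming get_params returns a dict and set_params accepts keyword arguments (both are assumptions here, as are the `ToyTokenizer` and `_Inner` names; none of this is in the PR):

```python
class _Inner:
    """Hypothetical stand-in for the Rust object holding the parameter struct."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self):
        # Return a copy so callers cannot mutate the parameters directly.
        return dict(self._params)

    def set_params(self, **params):
        unknown = set(params) - set(self._params)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        self._params.update(params)


class ToyTokenizer:
    """Hypothetical wrapper whose parameter attributes live only in `inner`."""

    def __init__(self, pattern=r"\b\w\w+\b"):
        # Bypass our own __setattr__ while `inner` does not exist yet.
        object.__setattr__(self, "inner", _Inner(pattern=pattern))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "inner":
            # Guard against infinite recursion if `inner` is missing.
            raise AttributeError(name)
        params = self.inner.get_params()
        if name in params:
            return params[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.inner.get_params():
            # Parameter writes are forwarded, so there is a single source of truth.
            self.inner.set_params(**{name: value})
        else:
            object.__setattr__(self, name, value)
```

With this layout, assigning `tokenizer.pattern = r"\w+"` updates the inner parameters instead of a disconnected copy on the wrapper, which is the failure mode the removed attributes had.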