rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

ENH Avoid copying tokens in tokenizers in Python #46

Closed rth closed 4 years ago

rth commented 5 years ago

Currently, tokenizers return Vec<String>, even though the tokens are slices of the input string. Moving to Vec<&str> would remove one memory copy and is likely to improve run time.
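To illustrate the difference, here is a minimal sketch in plain Rust. The functions `tokenize_owned` and `tokenize_borrowed` are hypothetical names made up for this example, and whitespace splitting stands in for the real tokenization logic; this is not vtext's actual API and it does not involve PyO3.

```rust
// Hypothetical helpers for illustration only; not vtext's actual API.

/// Owned variant: each token is copied into a freshly allocated String.
fn tokenize_owned(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_string()).collect()
}

/// Borrowed variant: each token is a slice into `text`, so no token bytes
/// are copied. The lifetime 'a ties the tokens to the input string.
fn tokenize_borrowed<'a>(text: &'a str) -> Vec<&'a str> {
    text.split_whitespace().collect()
}

fn main() {
    let text = String::from("The quick brown fox");

    let owned = tokenize_owned(&text);
    let borrowed = tokenize_borrowed(&text);

    // Both variants yield the same token contents.
    assert!(owned.iter().map(String::as_str).eq(borrowed.iter().copied()));

    // The first borrowed token points into the original buffer,
    // while the first owned token lives in its own allocation.
    assert_eq!(borrowed[0].as_ptr(), text.as_ptr());
    assert_ne!(owned[0].as_ptr(), text.as_ptr());

    println!("{:?}", borrowed);
}
```

The borrowed variant only compiles while the input string outlives the returned tokens, which is why the return type needs a lifetime tied to the input.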

This should be possible with PyO3 0.7.0 (not yet released), which will allow using lifetime specifiers in pymethods.

rth commented 4 years ago

Closing as resolved some time ago.