Description
This PR introduces a completely redesigned way to manage and load resources. They are now loaded dynamically, based on the content of the trained engine directory (which used to be a single json file), instead of being statically hardcoded.
This has several advantages:
Compile time is reduced, mostly in debug mode, as we no longer generate static maps that have to be compiled
Binary size is reduced as resources have been extracted from the binary
Resources can be managed externally more easily.
It comes with some drawbacks though:
Loading the engine takes a bit longer: this is expected as we are now actually doing IO
Parsing takes a bit longer: this is somewhat more surprising. When resources were embedded statically, we were using phf (perfect hash functions), which means no collisions and therefore no collision resolution, which may explain why lookups were faster.
This is the benchmark I ran on my laptop (MacBook Pro 2017 Core i7) on a medium-size weather dataset:
New serialization & resources management
----------------------------------------
test nlu_loading ... bench: 163,866,780 ns/iter (+/- 6,099,507)
test nlu_parsing ... bench: 4,210,047 ns/iter (+/- 886,156)
cargo build --all 585.62s user 32.24s system 248% cpu 4:08.74 total
cargo build --all --release 1920.03s user 42.25s system 305% cpu 10:42.49 total
memory 42.3 Mb
$ ll target/release/snips-nlu-cli
-rwxr-xr-x 2 adrien staff 25M Jul 12 15:06 target/release/snips-nlu-cli
Previous (0.57.1)
-----------------
test nlu_loading ... bench: 89,731,218 ns/iter (+/- 5,837,671)
test nlu_parsing ... bench: 2,485,156 ns/iter (+/- 702,864)
cargo build --all 1283.53s user 36.86s system 156% cpu 14:01.98 total
cargo build --all --release 2016.45s user 48.77s system 323% cpu 10:37.64 total
$ ll target/release/snips-nlu-cli
-rwxr-xr-x 2 adrien staff 56M Jul 12 15:07 target/release/snips-nlu-cli
Here are the details of the PR:
XxxConfiguration objects have been renamed to XxxModel because, well, they are actually models
Processing units now implement FromPath, so they can be loaded from a file or a directory
The NLU engine can be loaded from a zip reader
Resources are now loaded dynamically from the trained engine directory instead of being embedded statically in the code
The stemming logic has been improved. In particular, resources are stemmed beforehand.
Outdated dependencies have been updated (regex, csv)
Python, Kotlin and Swift wrappers have been updated
Rust example has been replaced with an interactive parsing CLI
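
To illustrate the FromPath idea and the switch from embedded to dynamically loaded resources, here is a minimal sketch. The trait signature, the `WordSet` type and the one-word-per-line file format are assumptions made for this example only; the actual trait and resource formats in this PR may differ.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical shape of a `FromPath`-style trait (assumed for illustration,
/// not copied from the PR).
pub trait FromPath: Sized {
    fn from_path<P: AsRef<Path>>(path: P) -> io::Result<Self>;
}

/// Illustrative resource: a set of words read at runtime from a plain-text
/// file, one word per line, instead of a map compiled into the binary.
pub struct WordSet {
    words: HashSet<String>,
}

impl WordSet {
    /// Returns true if `word` was present in the loaded file.
    pub fn contains(&self, word: &str) -> bool {
        self.words.contains(word)
    }

    /// Number of distinct words loaded.
    pub fn len(&self) -> usize {
        self.words.len()
    }
}

impl FromPath for WordSet {
    fn from_path<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        // The IO happens here, at load time, which is where the extra
        // loading cost comes from compared to a statically embedded map.
        let content = fs::read_to_string(path)?;
        let words = content
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect();
        Ok(WordSet { words })
    }
}
```

With this shape, a caller would write something like `WordSet::from_path("some/resource/file.txt")?` (the path is a placeholder). Lookups go through a regular `HashSet`, which has to handle collisions, unlike a perfect hash function built at compile time.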