uber / neuropod

A uniform interface to run deep learning models from multiple frameworks
https://neuropod.ai
Apache License 2.0
936 stars 77 forks source link

Add unicode support and tests #527

Closed VivekPanyam closed 2 years ago

VivekPanyam commented 2 years ago

Summary:

This PR modifies Neuropod string handling to assume that strings are UTF-8 encoded. This adds support for things like emoji 😎 (and all the other characters supported by unicode), but does have a few downsides and introduces some complexity.

I'm working on some docs that dig into the difficulties of string tensors and especially string tensors + unicode, but they're not super relevant for the functionality in this PR so I'll post that separately.

The major impact of this PR is the following change.

Breaking change: opaque sequences of bytes will no longer generally work as strings (because the python bindings now expect strings to be valid UTF-8 strings).

This can be mitigated by adding a bytes type to Neuropod that supports arbitrary byte sequences. Please comment on this PR or open an issue if this is an important use case for you and we can prioritize it.

Test Plan:

CI + modified the string test to use emojis

codecov[bot] commented 2 years ago

Codecov Report

Merging #527 (b444b99) into master (5c89647) will increase coverage by 0.01%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #527      +/-   ##
==========================================
+ Coverage   86.56%   86.58%   +0.01%     
==========================================
  Files         105      104       -1     
  Lines        6930     6940      +10     
==========================================
+ Hits         5999     6009      +10     
  Misses        931      931              
Impacted Files Coverage Δ
...e/neuropod/backends/python_bridge/python_bridge.hh 100.00% <ø> (ø)
source/python/neuropod/utils/dtype_utils.py 100.00% <ø> (ø)
...thon_bridge/_neuropod_native_bootstrap/executor.py 92.64% <100.00%> (ø)
...e/neuropod/backends/python_bridge/python_bridge.cc 83.24% <100.00%> (-0.70%) :arrow_down:
source/neuropod/bindings/python_bindings.cc 95.45% <100.00%> (+0.82%) :arrow_up:
source/python/neuropod/loader.py 68.05% <100.00%> (+2.90%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5c89647...b444b99. Read the comment docs.