weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License
2.99k stars 260 forks

String encoding and decoding support without memory management #506

Closed sppalkia closed 4 years ago

sppalkia commented 4 years ago

Adds support for encoding and decoding string data. This currently only supports NumPy arrays with the 'S' dtype, which treats strings as null-terminated bytestrings and only supports ASCII. Non-ASCII strings can first be converted to bytes using `encode('utf-8')` in Python.
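
As a rough illustration of the input format described above, here is a minimal sketch of building 'S'-dtype arrays on the NumPy side, including the UTF-8 workaround for non-ASCII text. It uses only NumPy and does not call any Weld API; the variable names (`ascii_arr`, `words`, `encoded`, `decoded`) are made up for the example.

```python
import numpy as np

# ASCII data can be stored directly as fixed-width bytestrings ('S' dtype).
ascii_arr = np.array([b"weld", b"numpy"], dtype="S")

# Non-ASCII text is encoded to UTF-8 bytes before it goes into the array.
# (Hypothetical sample data, chosen only to include non-ASCII characters.)
words = ["café", "naïve"]
encoded = np.array([w.encode("utf-8") for w in words], dtype="S")

# On the way back out, each element is a bytes object that can be decoded.
decoded = [b.decode("utf-8") for b in encoded]

print(ascii_arr.dtype, encoded.dtype, decoded)
```

Note that the item size of an 'S' array is measured in bytes, not characters, so UTF-8 encoded strings may need a wider dtype than their character count suggests.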

This patch does not perform any memory management for strings, i.e., the encoders and decoders allocate memory that is never freed. This will be addressed in a follow-up patch that will allow these buffers to be reclaimed.