tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

Support Binary Data #138

Closed dominiklohmann closed 8 months ago

dominiklohmann commented 9 months ago

Both Apache Arrow and Tenzir assume that strings are always valid UTF-8. However, this promise is easy to break. The PCAP parser currently breaks it and emits invalid JSON that is not correctly escaped.

We previously discussed adding a "blob" type (exact name TBD) that wraps Arrow's binary type.
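
To illustrate the problem, here is a minimal Python sketch using only the standard library. The payload bytes are made up, and rendering binary data as Base64 in JSON is an assumption for the sake of the example, not a decision about how the new type should print.

```python
import base64
import json

# Hypothetical payload: bytes as they might appear in a captured packet.
payload = bytes([0x45, 0x00, 0xFF, 0xFE, 0x80])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_json_field(data: bytes) -> str:
    """Render bytes as a JSON-safe string (Base64 is an assumed convention)."""
    return base64.b64encode(data).decode("ascii")


# Treating the payload as a string would break the UTF-8 promise.
print("valid UTF-8:", is_valid_utf8(payload))

# Keeping the payload as binary and encoding it explicitly yields valid JSON.
record = {"payload": to_json_field(payload)}
encoded = json.dumps(record)
print(encoded)

# The original bytes can be recovered without loss.
decoded = base64.b64decode(json.loads(encoded)["payload"])
```

A dedicated binary type lets printers make this choice explicitly instead of passing raw bytes through as if they were text.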

### Definition of Done
- [x] Agree on the desired type semantics and behavior for parsers, printers, and the adaptive builder
- [x] Implement the new type
- [x] Make use of the new type in the PCAP parser

### Tasks
- [ ] https://github.com/tenzir/issues/issues/903
- [ ] https://github.com/tenzir/issues/issues/897