Both Apache Arrow and Tenzir assume that strings are always valid UTF-8. However, this promise is easy to break. The PCAP parser currently breaks it: it stores raw packet bytes in string fields, and the JSON printer then emits invalid, incorrectly escaped JSON.
We discussed previously that we want to add a "blob" type (exact name TBD) that wraps Arrow's binary type.
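To illustrate why raw packet data cannot safely live in a string field, here is a minimal, self-contained sketch of a UTF-8 well-formedness check (per RFC 3629). The function name `is_valid_utf8` is hypothetical and not part of Tenzir's or Arrow's API. Binary data such as the little-endian pcap magic number `D4 C3 B2 A1` already fails it, which is exactly the kind of value a binary-backed blob type would hold instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates (U+D800..U+DFFF), and code points
// above U+10FFFF.
bool is_valid_utf8(std::string_view bytes) {
  const auto* p = reinterpret_cast<const unsigned char*>(bytes.data());
  const std::size_t n = bytes.size();
  // Smallest code point that may legitimately use a sequence of length i.
  static constexpr std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = p[i];
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) {  // ASCII fast path
      ++i;
      continue;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false;  // continuation byte or 0xF8..0xFF in lead position
    }
    if (n - i < len)
      return false;  // sequence truncated at end of input
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = p[i + k];
      if ((cont & 0xC0) != 0x80)
        return false;  // expected a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;  // overlong, out of range, or surrogate
    i += len;
  }
  return true;
}
```

A parser could run a check like this before emitting a string, but for inherently binary fields like packet payloads a dedicated binary type avoids the question entirely.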
### Definition of Done
- [x] Agree on the desired type semantics and behavior for parsers, printers, and the adaptive builder
- [x] Implement the new type
- [x] Make use of the new type in the PCAP parser