Protarrow is a Python library for converting from Protocol Buffers to Apache Arrow and back.
It is used at Tradewell Technologies to share data between transactional and analytical applications, with little boilerplate code and zero data loss.
pip install protarrow
Taking a simple protobuf:
message MyProto {
  string name = 1;
  int32 id = 2;
  repeated int32 values = 3;
}
It can be converted to a pyarrow.Table:
import protarrow

# MyProto is the Python class generated by protoc from the message above.
my_protos = [
    MyProto(name="foo", id=1, values=[1, 2, 4]),
    MyProto(name="bar", id=2, values=[3, 4, 5]),
]
table = protarrow.messages_to_table(my_protos, MyProto)
name | id | values |
---|---|---|
foo | 1 | [1 2 4] |
bar | 2 | [3 4 5] |
And the table can be converted back to proto:
protos_from_table = protarrow.table_to_messages(table, MyProto)
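To make the round trip above concrete, here is a minimal standard-library sketch of the underlying idea: pivoting row-oriented records into columns and back without losing data. It is not protarrow's implementation and uses neither protobuf nor pyarrow; `MyRecord`, `records_to_columns`, and `columns_to_records` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class MyRecord:
    """Hypothetical stand-in for the generated MyProto class."""

    name: str
    id: int
    values: List[int]


def records_to_columns(records: List[MyRecord]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per field (column-oriented)."""
    return {
        f.name: [getattr(record, f.name) for record in records]
        for f in fields(MyRecord)
    }


def columns_to_records(columns: Dict[str, list]) -> List[MyRecord]:
    """Rebuild one record per row by zipping the columns back together."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [MyRecord(**dict(zip(names, row))) for row in rows]


records = [
    MyRecord(name="foo", id=1, values=[1, 2, 4]),
    MyRecord(name="bar", id=2, values=[3, 4, 5]),
]

# Row-oriented -> column-oriented, analogous in spirit to messages_to_table.
columns = records_to_columns(records)

# Column-oriented -> row-oriented, analogous in spirit to table_to_messages.
round_tripped = columns_to_records(columns)
```

In this sketch, `round_tripped` compares equal to `records`, which is the same property the protarrow round trip is described as preserving.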
See the documentation