Actually, I propose we just do what ONNX is doing https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L479
Protobuf serialization has a size limit, and I would rather avoid that kind of restriction (we can however test it; also, gRPC does include a CBOR-like serialization). However, your second solution to handle that part does sound very interesting. I would be interested in being able to serialize a tensor directly, without asking the user to turn the tensor into a list first.
Yes, we should continue using a streaming request and split the array into different RunModelRequests. ONNX is kind of doing the second option I was proposing:
message Tensor {
  // The data type of the tensor.
  DataType data_type = 2;

  // Depending on the data_type field, exactly one of the fields below with
  // name ending in _data is used to store the elements of the tensor.

  // For float and complex64 values
  // Complex64 tensors are encoded as a single array of floats,
  // with the real components appearing in odd numbered positions,
  // and the corresponding imaginary component appearing in the
  // subsequent even numbered position. (e.g., [1.0 + 2.0i, 3.0 + 4.0i]
  // is encoded as [1.0, 2.0, 3.0, 4.0])
  // When this field is present, the data_type field MUST be FLOAT or COMPLEX64.
  repeated float float_data = 4 [packed = true];

  // For int32, uint8, int8, uint16, int16, bool, and float16 values
  // float16 values must be bit-wise converted to an uint16_t prior
  // to writing to the buffer.
  // When this field is present, the data_type field MUST be
  // INT32, INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16
  repeated int32 int32_data = 5 [packed = true];

  // For strings.
  // Each element of string_data is a UTF-8 encoded Unicode
  // string. No trailing null, no leading BOM. The protobuf "string"
  // scalar type is not used to match ML community conventions.
  // When this field is present, the data_type field MUST be STRING
  repeated bytes string_data = 6;

  // For int64.
  // When this field is present, the data_type field MUST be INT64
  repeated int64 int64_data = 7 [packed = true];

  // Serializations can either use one of the fields above, or use this
  // raw bytes field. The only exception is the string case, where one is
  // required to store the content in the repeated bytes string_data field.
  //
  // When this raw_data field is used to store tensor value, elements MUST
  // be stored in as fixed-width, little-endian order.
  // Floating-point data types MUST be stored in IEEE 754 format.
  // Complex64 elements must be written as two consecutive FLOAT values, real component first.
  // Complex128 elements must be written as two consecutive DOUBLE values, real component first.
  // Boolean type MUST be written one byte per tensor element (00000001 for true, 00000000 for false).
  //
  // Note: the advantage of specific field rather than the raw_data field is
  // that in some cases (e.g. int data), protobuf does a better packing via
  // variable length storage, and may lead to smaller binary footprint.
  // When this field is present, the data_type field MUST NOT be STRING or UNDEFINED
  bytes raw_data = 9;
}
They pack the values into different protobuf array types depending on the type of the input. This approach is compatible with splitting the tensor into multiple RunModelRequests.
Their reason for also supporting the typed fields, rather than packing everything into a single bytes field, is interesting (protobuf can store some types, like ints, more compactly through variable-length encoding), but I suggest we focus on supporting only the raw_data representation for now.
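To make the raw_data route concrete, here is a minimal sketch (not the actual client code) of how a numpy tensor could be flattened to little-endian bytes and split into fixed-size chunks for a streaming request. The chunk size, the field names, and the use of plain dicts standing in for the generated RunModelRequest messages are all assumptions for illustration.

import numpy as np

# Hypothetical chunk size, kept well under gRPC's default 4 MB message limit.
CHUNK_SIZE = 32 * 1024


def tensor_to_requests(tensor: np.ndarray):
    """Yield RunModelRequest-like dicts carrying the tensor as raw little-endian bytes.

    The first chunk also carries the metadata (shape and dtype) needed to
    rebuild the tensor on the other side; every chunk carries a slice of the
    raw bytes, mirroring the ONNX raw_data convention.
    """
    # Contiguous, little-endian buffer, as the raw_data convention requires.
    data = np.ascontiguousarray(tensor).astype(tensor.dtype.newbyteorder("<"), copy=False)
    raw = data.tobytes()

    if not raw:
        # Degenerate case: an empty tensor still needs one chunk for the metadata.
        yield {"shape": list(tensor.shape), "dtype": str(tensor.dtype), "raw_data": b""}
        return

    for offset in range(0, len(raw), CHUNK_SIZE):
        first = offset == 0
        yield {
            "shape": list(tensor.shape) if first else None,
            "dtype": str(tensor.dtype) if first else None,
            "raw_data": raw[offset : offset + CHUNK_SIZE],
        }


def requests_to_tensor(requests) -> np.ndarray:
    """Reassemble the tensor from the streamed chunks (the receiving side)."""
    chunks, shape, dtype = [], None, None
    for req in requests:
        if req["shape"] is not None:
            shape, dtype = req["shape"], req["dtype"]
        chunks.append(req["raw_data"])
    return np.frombuffer(b"".join(chunks), dtype=np.dtype(dtype).newbyteorder("<")).reshape(shape)


if __name__ == "__main__":
    x = np.arange(12, dtype=np.float32).reshape(3, 4)
    assert np.array_equal(requests_to_tensor(tensor_to_requests(x)), x)

No cbor is involved here: the byte layout is fully described by the dtype, the shape, and the little-endian convention, which is what makes it easy to reimplement in other client languages.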
As for:
"without asking the user to turn the tensor into a list first"
I think we should do that, yes! The Python client should accept torch tensors and numpy tensors. However, I think this is a separate issue that only affects the Python client and not how we serialize to the wire. I will open a separate issue for that.
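For that separate Python-client point, one possible shape for the conversion, purely as a sketch (the as_numpy name and the duck-typed torch detection are assumptions, not existing client code):

import numpy as np


def as_numpy(tensor) -> np.ndarray:
    """Hypothetical helper: turn user input into a numpy array without building a list.

    Accepts numpy arrays, torch tensors (detected by duck typing so torch does
    not become a hard dependency), and anything numpy can convert directly.
    """
    if type(tensor).__module__.startswith("torch"):
        # .detach().cpu() makes this work for GPU tensors and tensors that require grad.
        return tensor.detach().cpu().numpy()
    return np.asarray(tensor)

From there, the wire serialization would be identical regardless of which framework the tensor came from.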
Description
We are currently using cbor for some of the serialization: transforming the flattened input tensors into a byte array. This is probably overkill, and having a dependency on cbor is troublesome for porting the client library to other languages, like JavaScript, since the cbor packages on npmjs are either old or Node.js-only.
I see two ways of doing this:
Motivation and Context
Dependency on cbor2 on both the client and the server side.
Test plans
Let's not think about backward compat :)
Additional Information
None
Checklist