Open kirklandsign opened 1 week ago
The proposal looks good to me. I saw you mentioned that ET uses fbs serialization; we could consider that as an option as well. You could provide a Java EValue <-> fbs conversion and the fbs definition in ET.
fbs is only for the runtime. It's overcomplicated for the Java serialization use case and contains other runtime details. You can see the schema here: https://github.com/pytorch/executorch/blob/b07be360ae8b52cad60cd5d52c2e71f9c59be81c/schema/program.fbs#L71-L144
The comment above is only to start discussion, and is low priority. Do we have an ETA for this serialization format?
Hi @qiaoli31 for serialization, we are working on it. My target is before mid November.
I see. it makes sense.
For training, I propose we reuse this serialized EValue as well. The workflow is as below.
- Partner implements onTrainingExample to generate a list of TrainingExampleRecord, each holding training data as a byte[]. In TFLite, this byte[] is a serialized tf.example proto. For ExecuTorch, the byte[] would be a serialized EValue[].
- We need a custom dataset/dataloader that repeatedly calls onTrainingExample to read training data one example at a time (IPC has a size limit). In TFLite, the model graph has an external_dataset custom op that does this; the byte[] -> tf.example proto -> tf.Tensor conversion is written into the model graph as well.
I found that PyTorch lets you define a custom dataset/dataloader, and IterableDataset + DataLoader looks similar to what external_dataset does.
- Is a dataset/dataloader available in ExecuTorch? Does it have a C++ API?
- Can the dataset be written into the ExecuTorch model graph, or do we need to write C++ code in ODP to connect them?
- If step (1) above looks good, we will keep the existing byte[] training API.
I'm quite new to PyTorch, so feel free to suggest other options.
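To make the pull-based reading concrete, here is a minimal Java sketch of an iterable that pulls one TrainingExampleRecord at a time from an onTrainingExample-style callback, so each read stays under the IPC size limit. The class and field names here are hypothetical placeholders, not the actual ODP/ExecuTorch API:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical record holding one serialized training example
// (for ExecuTorch, the byte[] would be a serialized EValue[]).
final class TrainingExampleRecord {
    final byte[] data;
    TrainingExampleRecord(byte[] data) { this.data = data; }
}

// Sketch of an IterableDataset-style wrapper: pulls examples one at a time
// from an onTrainingExample-like callback that returns null when exhausted.
final class ExampleStream implements Iterable<TrainingExampleRecord> {
    private final Supplier<TrainingExampleRecord> onTrainingExample;

    ExampleStream(Supplier<TrainingExampleRecord> onTrainingExample) {
        this.onTrainingExample = onTrainingExample;
    }

    @Override
    public Iterator<TrainingExampleRecord> iterator() {
        return new Iterator<>() {
            private TrainingExampleRecord next = onTrainingExample.get();

            @Override public boolean hasNext() { return next != null; }

            @Override public TrainingExampleRecord next() {
                if (next == null) throw new NoSuchElementException();
                TrainingExampleRecord cur = next;
                next = onTrainingExample.get(); // one IPC-sized read per step
                return cur;
            }
        };
    }
}
```

A C++ dataset in ExecuTorch (if one exists) could follow the same pull shape; this only illustrates the access pattern.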
cc @JacobSzwejbka on this; I'm not familiar with the training part.
🚀 The feature, motivation and pitch
Context: https://github.com/pytorch/executorch/issues/6470#issuecomment-2436060471
We need to design a way to serialize an EValue (and its embedding tensor), for some IPC use case in AOSP.
This won’t be the official serialization across ET. ET uses fbs for serialization. This is only for the Java frameworks layer for AOSP.
Basic layout
Tag (1 byte) | Bytes_of_payload (8 bytes?) | Payload (var)
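As a sketch, the framing could look like the Java below. The 8-byte length field and little-endian byte order are assumptions here, since both are still open questions:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: frame a payload as Tag (1 byte) | Bytes_of_payload (8 bytes) | Payload.
// The 8-byte length field and little-endian order are assumptions, not decided yet.
final class EValueFraming {
    static byte[] frame(byte tag, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(tag);
        buf.putLong(payload.length);
        buf.put(payload);
        return buf.array();
    }

    static byte[] payloadOf(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed).order(ByteOrder.LITTLE_ENDIAN);
        byte tag = buf.get();      // a real decoder would dispatch on this
        long len = buf.getLong();
        byte[] payload = new byte[(int) len];
        buf.get(payload);
        return payload;
    }
}
```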
Where the payload is one of the following tag types:

0. None (0 bytes): no value, or the absence of a value.
1. Tensor (var): a multi-dimensional array used for numerical computations.
2. String/uint8_array (var): a sequence of characters, often used to represent text.
3. Double (8): a 64-bit floating-point number, used for decimal arithmetic.
4. Int (4 or 8?): an integer, used for whole numbers (32- vs. 64-bit is an open question).
5. Bool (1): a boolean value, either true or false.
6. ListBool (var): a list of boolean values.
7. ListDouble (var): a list of double-precision floating-point numbers.
8. ListInt (var): a list of integers.
9. ListTensor (var): a list of tensors.
10. ListScalar (var): a list of scalar values (e.g., numbers).
11. ListOptionalTensor (var): a list of optional tensors, where each element may be present or absent.
Per Jacob, we don’t care about List types (6-11). They are internal to ET runtime.
For the types without variable length (tags 0, 3, 4, 5), we just serialize the value directly.
For tag 2 (String), let's assume it's a uint8_t[], and we just serialize the array directly.
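A sketch of these direct payload encodings, assuming little-endian order and an 8-byte Int (both of which are undecided):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch: payload encodings for the fixed-size tags and String.
// Little-endian order and an 8-byte Int are assumptions, not decided yet.
final class ScalarPayloads {
    static byte[] ofDouble(double v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(v).array();
    }
    static byte[] ofInt(long v) { // 8 bytes here; 4 vs. 8 is an open question
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }
    static byte[] ofBool(boolean v) {
        return new byte[] { (byte) (v ? 1 : 0) };
    }
    static byte[] ofString(String s) { // serialized directly as a uint8_t[]
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

None (tag 0) needs no payload at all, so only the tag byte (and length field, if kept) would be written.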
So we will focus on the tensor type.
Tensor type
Scalar_type (1 byte) | Num_dim (1) | Sizes (var) | Dim_order (var) | Data (var)
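A rough Java sketch of this layout. The element widths chosen here (4-byte sizes, 1-byte dim-order entries) and little-endian order are assumptions, not part of the proposal:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: Scalar_type (1) | Num_dim (1) | Sizes | Dim_order | Data.
// 4-byte size entries, 1-byte dim-order entries, and little-endian
// order are assumptions; whether Dim_order is needed is an open question.
final class TensorPayload {
    static byte[] serialize(byte scalarType, int[] sizes, byte[] dimOrder, byte[] data) {
        ByteBuffer buf = ByteBuffer
                .allocate(1 + 1 + 4 * sizes.length + dimOrder.length + data.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(scalarType);
        buf.put((byte) sizes.length); // Num_dim
        for (int s : sizes) buf.putInt(s);
        buf.put(dimOrder);
        buf.put(data);
        return buf.array();
    }
}
```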
Questions
- Do we really need a field for Bytes_of_payload?
- Do we need Dim_order?
- Do we need TensorShapeDynamism?
Alternatives
No response
Additional context
No response
RFC (Optional)
No response