pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Could not generate etrecord for stories110M #3788

Open DzAvril opened 3 months ago

DzAvril commented 3 months ago

I am currently learning from the "llama2" example by following the instructions in the README, using the "stories110M" model introduced in option B. To export the PTE file, I ran the command python -m examples.models.llama2.export_llama --checkpoint ./stories110M/stories110M.pt -p ./stories110M/params.json -X -kv --use_sdpa_with_kv_cache -qmode 8da4w --group_size 128 -d fp32 -o ptes -n stories110M --generate_etrecord. Note that I added the --generate_etrecord option, which led to the error below; without that option the command works fine. The commit I tested with is a36ace7ed2441887817c6d9e851e91129a9bd961.

python -m examples.models.llama2.export_llama --checkpoint ./stories110M/stories110M.pt -p ./stories110M/params.json -X -kv --use_sdpa_with_kv_cache -qmode 8da4w --group_size 128 -d fp32 -o ptes -n stories110M --generate_etrecord                                              
[INFO 2024-05-31 12:45:09,771 export_llama_lib.py:391] Applying quantizers: []
creating canonical path for ./stories110M/stories110M.pt
creating canonical path for ./stories110M/params.json
creating canonical path for ptes
[INFO 2024-05-31 12:45:09,771 builder.py:91] Loading model with checkpoint=./stories110M/stories110M.pt, params=./stories110M/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-05-31 12:45:09,798 builder.py:112] Loaded model with dtype=torch.float32
[INFO 2024-05-31 12:45:09,836 config.py:58] PyTorch version 2.3.0a0+git23961ce available.
linear: layers.0.attention.wq, in=768, out=768
linear: layers.0.attention.wk, in=768, out=768
linear: layers.0.attention.wv, in=768, out=768
linear: layers.0.attention.wo, in=768, out=768
linear: layers.0.feed_forward.w1, in=768, out=2048
linear: layers.0.feed_forward.w2, in=2048, out=768
linear: layers.0.feed_forward.w3, in=768, out=2048
linear: layers.1.attention.wq, in=768, out=768
linear: layers.1.attention.wk, in=768, out=768
linear: layers.1.attention.wv, in=768, out=768
linear: layers.1.attention.wo, in=768, out=768
linear: layers.1.feed_forward.w1, in=768, out=2048
linear: layers.1.feed_forward.w2, in=2048, out=768
linear: layers.1.feed_forward.w3, in=768, out=2048
linear: layers.2.attention.wq, in=768, out=768
linear: layers.2.attention.wk, in=768, out=768
linear: layers.2.attention.wv, in=768, out=768
linear: layers.2.attention.wo, in=768, out=768
linear: layers.2.feed_forward.w1, in=768, out=2048
linear: layers.2.feed_forward.w2, in=2048, out=768
linear: layers.2.feed_forward.w3, in=768, out=2048
linear: layers.3.attention.wq, in=768, out=768
linear: layers.3.attention.wk, in=768, out=768
linear: layers.3.attention.wv, in=768, out=768
linear: layers.3.attention.wo, in=768, out=768
linear: layers.3.feed_forward.w1, in=768, out=2048
linear: layers.3.feed_forward.w2, in=2048, out=768
linear: layers.3.feed_forward.w3, in=768, out=2048
linear: layers.4.attention.wq, in=768, out=768
linear: layers.4.attention.wk, in=768, out=768
linear: layers.4.attention.wv, in=768, out=768
linear: layers.4.attention.wo, in=768, out=768
linear: layers.4.feed_forward.w1, in=768, out=2048
linear: layers.4.feed_forward.w2, in=2048, out=768
linear: layers.4.feed_forward.w3, in=768, out=2048
linear: layers.5.attention.wq, in=768, out=768
linear: layers.5.attention.wk, in=768, out=768
linear: layers.5.attention.wv, in=768, out=768
linear: layers.5.attention.wo, in=768, out=768
linear: layers.5.feed_forward.w1, in=768, out=2048
linear: layers.5.feed_forward.w2, in=2048, out=768
linear: layers.5.feed_forward.w3, in=768, out=2048
linear: layers.6.attention.wq, in=768, out=768
linear: layers.6.attention.wk, in=768, out=768
linear: layers.6.attention.wv, in=768, out=768
linear: layers.6.attention.wo, in=768, out=768
linear: layers.6.feed_forward.w1, in=768, out=2048
linear: layers.6.feed_forward.w2, in=2048, out=768
linear: layers.6.feed_forward.w3, in=768, out=2048
linear: layers.7.attention.wq, in=768, out=768
linear: layers.7.attention.wk, in=768, out=768
linear: layers.7.attention.wv, in=768, out=768
linear: layers.7.attention.wo, in=768, out=768
linear: layers.7.feed_forward.w1, in=768, out=2048
linear: layers.7.feed_forward.w2, in=2048, out=768
linear: layers.7.feed_forward.w3, in=768, out=2048
linear: layers.8.attention.wq, in=768, out=768
linear: layers.8.attention.wk, in=768, out=768
linear: layers.8.attention.wv, in=768, out=768
linear: layers.8.attention.wo, in=768, out=768
linear: layers.8.feed_forward.w1, in=768, out=2048
linear: layers.8.feed_forward.w2, in=2048, out=768
linear: layers.8.feed_forward.w3, in=768, out=2048
linear: layers.9.attention.wq, in=768, out=768
linear: layers.9.attention.wk, in=768, out=768
linear: layers.9.attention.wv, in=768, out=768
linear: layers.9.attention.wo, in=768, out=768
linear: layers.9.feed_forward.w1, in=768, out=2048
linear: layers.9.feed_forward.w2, in=2048, out=768
linear: layers.9.feed_forward.w3, in=768, out=2048
linear: layers.10.attention.wq, in=768, out=768
linear: layers.10.attention.wk, in=768, out=768
linear: layers.10.attention.wv, in=768, out=768
linear: layers.10.attention.wo, in=768, out=768
linear: layers.10.feed_forward.w1, in=768, out=2048
linear: layers.10.feed_forward.w2, in=2048, out=768
linear: layers.10.feed_forward.w3, in=768, out=2048
linear: layers.11.attention.wq, in=768, out=768
linear: layers.11.attention.wk, in=768, out=768
linear: layers.11.attention.wv, in=768, out=768
linear: layers.11.attention.wo, in=768, out=768
linear: layers.11.feed_forward.w1, in=768, out=2048
linear: layers.11.feed_forward.w2, in=2048, out=768
linear: layers.11.feed_forward.w3, in=768, out=2048
linear: output, in=768, out=32000
[INFO 2024-05-31 12:45:17,524 sdpa_with_kv_cache.py:24] Loading custom ops library: /mnt/hd4/xuzhi/workspace/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.so
[INFO 2024-05-31 12:45:20,735 builder.py:285] Using pt2e [] to quantizing the model...
[INFO 2024-05-31 12:45:20,735 builder.py:305] No quantizer provided, passing...
[INFO 2024-05-31 12:45:29,372 export_llama_lib.py:442] Generating etrecord
[INFO 2024-05-31 12:45:29,712 xnnpack_partitioner.py:560] Found 85 subgraphs to be partitioned.
/mnt/hd4/xuzhi/workspace/executorch/exir/emit/_emitter.py:1474: UserWarning: Mutation on a buffer in the model is detected. ExecuTorch assumes buffers that are mutated in the graph have a meaningless initial state, only the shape and dtype will be serialized.
  warnings.warn(
[INFO 2024-05-31 12:45:40,939 builder.py:383] Required memory for activation in bytes: [0, 9641472]
Traceback (most recent call last):
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 1002, in serialize_graph
    getattr(self, f"handle_{node.op}")(node)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/serialize.py", line 122, in handle_call_function
    super().handle_call_function(node)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 412, in handle_call_function
    inputs = [
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 415, in <listcomp>
    arg=self.serialize_input(a),
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/serialize.py", line 249, in serialize_input
    return super().serialize_input(arg)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 715, in serialize_input
    raise SerializeError(f"Unsupported argument type: {type(arg)}")
executorch.exir.serde.export_serialize.SerializeError: Unsupported argument type: <class 'torch._ops.OpOverload'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/hd4/xuzhi/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/hd4/xuzhi/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/hd4/xuzhi/workspace/executorch/examples/models/llama2/export_llama.py", line 31, in <module>
    main()  # pragma: no cover
  File "/mnt/hd4/xuzhi/workspace/executorch/examples/models/llama2/export_llama.py", line 27, in main
    export_llama(modelname, args)
  File "/mnt/hd4/xuzhi/workspace/executorch/examples/models/llama2/export_llama_lib.py", line 308, in export_llama
    builder = _export_llama(modelname, args)
  File "/mnt/hd4/xuzhi/workspace/executorch/examples/models/llama2/export_llama_lib.py", line 449, in _export_llama
    generate_etrecord(
  File "/mnt/hd4/xuzhi/workspace/executorch/sdk/etrecord/_etrecord.py", line 204, in generate_etrecord
    _handle_edge_dialect_exported_program(
  File "/mnt/hd4/xuzhi/workspace/executorch/sdk/etrecord/_etrecord.py", line 109, in _handle_edge_dialect_exported_program
    serialized_artifact = serialize(edge_dialect_exported_program)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/serialize.py", line 725, in serialize
    serialized_artifact = ExportedProgramSerializer(opset_version).serialize(
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/serialize.py", line 331, in serialize
    serialized_graph_module = gm_serializer.serialize(exported_program.graph_module)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 1019, in serialize
    graph = self.serialize_graph(graph_module)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/serialize.py", line 253, in serialize_graph
    return super().serialize_graph(graph_module)
  File "/mnt/hd4/xuzhi/workspace/executorch/exir/serde/export_serialize.py", line 1004, in serialize_graph
    raise SerializeError(
executorch.exir.serde.export_serialize.SerializeError: Failed serializing node auto_functionalized in graph: %auto_functionalized : [num_users=3] = call_function[target=torch._higher_order_ops.auto_functionalize.auto_functionalized](args = (llama.sdpa_with_kv_cache.default,), kwargs = {query: %aten_view_copy_default_13, key: %aten_view_copy_default_14, value: %aten_view_copy_default_8, key_cache: %arg37_1, value_cache: %arg38_1, start_pos: %_local_scalar_dense, seq_len: 1, attn_mask: None, drpout_p: 0.0, is_causal: False, scale: None})
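
For context, the --generate_etrecord path shown in the traceback ends up in the SDK's generate_etrecord helper. As a rough sketch of what that flag does under the hood (assuming the executorch.sdk import path used around this commit; the toy module and file name below are illustrative and not part of the llama export script):

```python
# Hypothetical, minimal ETRecord generation sketch -- not the llama2 export flow.
import copy

import torch
from executorch.exir import to_edge
from executorch.sdk import generate_etrecord  # assumed public import path at this commit
from torch.export import export


class TinyModel(torch.nn.Module):
    def forward(self, x):
        return x + 1


aten_program = export(TinyModel(), (torch.ones(2),))
edge_manager = to_edge(aten_program)
# Keep a copy of the edge dialect program: to_executorch() consumes the manager,
# and generate_etrecord serializes the edge program (where the error above is raised).
edge_manager_copy = copy.deepcopy(edge_manager)
et_manager = edge_manager.to_executorch()

# Writes the ETRecord artifact that the SDK Inspector consumes later.
generate_etrecord("etrecord.bin", edge_manager_copy, et_manager)
```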
cccclai commented 3 months ago

Are you trying to debug? --generate_etrecord is meant for debugging purposes; it's fine to export/deploy the model without it.
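
For reference, the usual debugging flow pairs the export-time ETRecord with an ETDump collected from the runtime and feeds both to the SDK Inspector. A rough sketch, assuming the executorch.sdk import path and illustrative file names:

```python
# Illustrative Inspector usage; file names are placeholders.
from executorch.sdk import Inspector

inspector = Inspector(
    etdump_path="etdump.etdp",  # produced by a runtime built with ETDump enabled
    etrecord="etrecord.bin",    # produced at export time, e.g. via --generate_etrecord
)
# Links runtime per-operator stats back to the exported graph and prints them.
inspector.print_data_tabular()
```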

DzAvril commented 3 months ago

Yes, I'm trying to debug.