microsoft / scene_graph_benchmark

image scene graph generation benchmark

[VinVL] Support for ONNX #96

Open aleSuglia opened 1 year ago

aleSuglia commented 1 year ago

Hello there,

Thank you so much for this great repository. I've been using VinVL for a while now and I'm really pleased with its accuracy. However, considering its size, I was wondering whether you had any plans to support ONNX to speed up the inference process. I have tried to enable it myself, but I ran into some very strange errors during the export. Here is the export call and the full log:

>>> extractor.to_onnx("storage/model/vinvl_vg_x152c4_simbot.onnx", export_params=True, input_sample=input_sample)
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:21: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  bbox = torch.as_tensor(bbox, dtype=torch.float32, device=device)
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:26: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if bbox.size(-1) != 4:
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:94: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pre_nms_top_n = min(self.pre_nms_top_n, num_anchors)
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:111: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  for proposal, score, im_shape in zip(proposals, objectness, image_shapes):
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:169: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  boxlist_empty.add_field("scores", torch.Tensor([]).to(device))
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:206: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(inds)>0:
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:131: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  while new_boxlist.bbox.shape[0] < \
/home/ubuntu/emma/perception/src/emma_perception/models/vinvl_extractor.py:55: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  out = zip(batch["ids"], batch["width"], batch["height"], predictions, cnn_features)
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:99: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))
/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py:2815: UserWarning: Exporting aten::index operator of advanced indexing in opset 9 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1899, in to_onnx
    torch.onnx.export(self, input_sample, file_path, **kwargs)
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 497, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 216, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 373, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 1032, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 1866, in slice
    raise RuntimeError("step!=1 is currently not supported")
RuntimeError: step!=1 is currently not supported
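
If I read the traceback correctly, the failure comes from the opset 9 symbolic for slice: the ONNX Slice op only gained support for steps in opset 10, so any slicing with step != 1 inside the model cannot be exported under the default opset 9. A minimal sketch that reproduces the same error (the module and file names are mine, unrelated to VinVL):

    import torch

    class StridedSlice(torch.nn.Module):
        def forward(self, x):
            return x[::2]  # slice with step 2, not representable in opset 9

    model = StridedSlice()
    dummy = torch.randn(4, 3)

    # Fails with "RuntimeError: step!=1 is currently not supported":
    # torch.onnx.export(model, dummy, "slice.onnx", opset_version=9)

    # Succeeds once the opset supports Slice steps (opset >= 10):
    torch.onnx.export(model, dummy, "slice.onnx", opset_version=11)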

Do you have any advice?

SonaNerd commented 1 year ago

For me, exporting with opset_version=11 inside a torch.no_grad() block worked!

    with torch.no_grad():
        torch.onnx.export(
            model,
            img_input,
            onnx_model_path,  # where the model should be saved
            verbose=False,
            export_params=True,
            do_constant_folding=False,  # set True to fold constant values for optimization
            input_names=['input'],
            output_names=['output'],
            opset_version=11,
        )
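
Since the detector returns a variable number of boxes (hence all the TracerWarnings in the log above), it may also help to declare dynamic axes so the exported graph is not pinned to the shapes seen during tracing. A sketch of the extra argument, assuming the same 'input'/'output' names as above:

    import torch

    with torch.no_grad():
        torch.onnx.export(
            model,
            img_input,
            onnx_model_path,
            export_params=True,
            input_names=['input'],
            output_names=['output'],
            opset_version=11,
            # Hypothetical: mark the batch/spatial dims of the input and the
            # box dimension of the output as dynamic rather than constants.
            dynamic_axes={
                'input': {0: 'batch', 2: 'height', 3: 'width'},
                'output': {0: 'num_boxes'},
            },
        )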

But I cannot reload the ONNX model... :(
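
This is roughly what I am trying for the reload (a minimal sketch, assuming onnx and onnxruntime are installed, with the 'input' name from the export above):

    import onnx
    import onnxruntime as ort

    # Validate the graph first; the checker often points at the broken node.
    onnx_model = onnx.load(onnx_model_path)
    onnx.checker.check_model(onnx_model)

    # Then run inference through an onnxruntime session.
    session = ort.InferenceSession(onnx_model_path)
    outputs = session.run(None, {'input': img_input.cpu().numpy()})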