Open: zalo opened 2 years ago
Hi, I am a little confused about your choice to upgrade to 1.12. From pytorch/issues/30517, we know that 1.10 already supports the random normal distribution. Did you test that? I'd be very glad to get your feedback.
Hi Annan; it’s been a while since I had this set up. I’d have to run the steps again to see the exact error message, but I recall it being pernicious until upgrading the PyTorch version, and relating to the random normal distribution function.
Perhaps you can replicate the steps with the current version and see if it still happens?
Hi, I guess it is because torch.onnx.export() uses opset_version=9 by default in PyTorch 1.10. You can set it manually with opset_version=11, and then everything works. My current PyTorch version is 1.10 and it works well; I tested it today.
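For reference, a minimal sketch of setting the opset explicitly. `GaussianPolicy` and `export_policy` are hypothetical names used only for illustration, not part of Isaac Gym or rl_games; `opset_version` is a real keyword argument of `torch.onnx.export`. Whether a sampling op exports cleanly still depends on the installed PyTorch version.

```python
import torch
from torch import nn


class GaussianPolicy(nn.Module):
    """Hypothetical stand-in policy that samples actions from a normal distribution."""

    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.mu = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mu = self.mu(obs)
        # torch.randn_like is a random-normal op that the exporter has to translate.
        return mu + self.log_std.exp() * torch.randn_like(mu)


def export_policy(model: nn.Module, obs_dim: int, path: str,
                  opset_version: int = 11) -> None:
    """Export with an explicit opset instead of the version-dependent default."""
    model.eval()
    dummy_obs = torch.zeros(1, obs_dim)  # example input used for tracing
    torch.onnx.export(model, (dummy_obs,), path, opset_version=opset_version)
```

Usage would look like `export_policy(GaussianPolicy(), obs_dim=8, path="policy.onnx")`; pass a different `opset_version` to compare how exporter versions behave.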
Back to the original topic, I think it would be better to export ONNX only after checking that the policy is mature, and to export only the self.player.model.a2c_network and self.player.model.running_mean_std parts. These two parts are enough for inference.
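A rough sketch of that idea, assuming the network returns the mean action as a plain tensor: wrap the observation normalizer and the network in one module, so the exported graph contains only what inference needs. `ObsNormalizer` and `InferencePolicy` are hypothetical stand-ins, not the actual rl_games classes.

```python
import torch
from torch import nn


class ObsNormalizer(nn.Module):
    """Hypothetical stand-in for a running mean/std observation normalizer."""

    def __init__(self, obs_dim: int, eps: float = 1e-5):
        super().__init__()
        # Buffers are stored with the module but are not trained.
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.eps = eps

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)


class InferencePolicy(nn.Module):
    """Normalizes observations, then applies the network; no sampling involved."""

    def __init__(self, normalizer: nn.Module, network: nn.Module):
        super().__init__()
        self.normalizer = normalizer
        self.network = network

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.network(self.normalizer(obs))
```

Because the wrapper has no distribution sampling in its forward pass, it avoids the random-normal op entirely; the trade-off is that the exported policy is deterministic.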
Hi, I created this example of how to export. I wasn't able to make it work with torch distributions, so I created a simple wrapper that calls the normalization. You can take a look here: https://github.com/Denys88/rl_games#quickstart-colab-in-the-cloud (all links work in Google Colab)
I am trying to do the same thing as you and convert the .pth from the Isaac Gym examples to .onnx. After copying your script I got the error below. Could you tell me what I am doing wrong that keeps the conversion function from working?
I don't know if the image will be displayed, so I'll write the error out directly in case you can tell me what I'm doing wrong: TypeError: forward() missing 1 required positional argument: 'input_dict'.
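One possible cause, offered as an assumption rather than a diagnosis: the network's forward() expects a single dict argument, and the TorchScript-based torch.onnx.export interprets a dict that is the last element of its args tuple as keyword arguments (its documentation suggests appending an empty dict to avoid this). A small adapter that accepts a plain tensor and builds the dict internally sidesteps the issue. `DictInputNetwork`, `TensorInputAdapter`, and the "obs" key are hypothetical stand-ins for this sketch.

```python
import torch
from torch import nn


class DictInputNetwork(nn.Module):
    """Hypothetical stand-in for a network whose forward() expects a dict."""

    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.body = nn.Linear(obs_dim, act_dim)

    def forward(self, input_dict):
        return self.body(input_dict["obs"])


class TensorInputAdapter(nn.Module):
    """Accepts a plain tensor and builds the dict the wrapped network expects."""

    def __init__(self, network: nn.Module):
        super().__init__()
        self.network = network

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # The exporter only ever sees a tensor argument here.
        return self.network({"obs": obs})
```

Exporting `TensorInputAdapter(network)` with a tensor example input, instead of the dict-based network itself, keeps the dict out of the exporter's argument handling.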
Sorry friend, I haven't seen that error before.
It would be nice if trained Isaac Gym models checkpointed to ONNX as well (for additional portability to game engines).
Here, I'll document the steps in my adventure so far in getting `.onnx` models out of the system:
Update the PyTorch version to 1.12. I did this by updating `isaacgym`'s `Dockerfile`'s base image to `nvcr.io/nvidia/pytorch:22.04-py3`. Thankfully, it just works. This is necessary because 1.10's ONNX Exporter can't handle the random normal-distribution operation. If ONNX Export were to be made standard, this would need to be propagated to the proper Isaac Gym preview: https://developer.nvidia.com/isaac-gym
Augment the `run.sh` command with another volume to allow for retrieval of the trained models (adding `-v /home/gymuser/IsaacGymEnvs:/home/gymuser/IsaacGymEnvs` is sufficient; we're going to pull this repo into that folder in the next step).
After starting the container, run:
At this point, we're going to want to make some changes to `isaacgymenvs/learning/common_agent.py` (to write .onnx models with each checkpoint). I did this by attaching a VS Code instance with the Docker Extension, but `nano` works as well. At the top, under `from torch import optim`, add:
Underneath where it says `self.save(self.model_output_file + "_" + str(epoch_num))`, add:
It would be nice if one of the authors could check my work here; I'm not sure if I have the names of the input and output tensors correct... @gavrielstate
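As a rough illustration of the kind of change this step describes (not the snippet from the post itself), a helper along these lines writes an .onnx file next to each .pth checkpoint. It assumes the policy takes a single observation tensor; the function name, the "obs" and "actions" tensor names, and opset 11 are all assumptions made for the sketch.

```python
import os

import torch
from torch import nn


def save_checkpoint_with_onnx(policy: nn.Module, obs_dim: int, base_path: str,
                              export_onnx: bool = True) -> dict:
    """Write <base_path>.pth and, optionally, <base_path>.onnx; return both paths."""
    paths = {"pth": base_path + ".pth", "onnx": base_path + ".onnx"}
    torch.save(policy.state_dict(), paths["pth"])
    if export_onnx:
        was_training = policy.training
        policy.eval()  # trace in inference mode
        dummy_obs = torch.zeros(1, obs_dim)  # example input used for tracing
        torch.onnx.export(
            policy, (dummy_obs,), paths["onnx"],
            input_names=["obs"],          # hypothetical tensor name
            output_names=["actions"],     # hypothetical tensor name
            opset_version=11,
        )
        policy.train(was_training)  # restore the previous mode
    return paths
```

Naming the tensors explicitly makes them easy to find when the file is opened in a viewer, which helps when checking whether the inputs and outputs are the ones the consumer expects.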
After training for >50 epochs (via `python train.py task=HumanoidAMP` or somesuch), you should be seeing `.onnx` checkpoints dumped alongside your PyTorch `.pth` checkpoints. I've attached an example HumanoidAMP checkpoint (HumanoidAMP_3455_ONNX.zip), which can be inspected in https://netron.app/ (and hopefully run in Unity's Barracuda Evaluator; I haven't tested it yet).
There's a strong chance I'm not accounting for inputs, persistent state, or the AMP actor critic properly... but I'm hoping my explorations here help lay the groundwork for more comprehensive ONNX support and portability across the Isaac Gym Ecosystem.
Thank you for your consideration.