sebastianstarke / AI4Animation

Bringing Characters to Life with Computer Brains in Unity

Error when training GNN - Incorrect torch version? #110

Closed polycrunchgames closed 1 year ago

polycrunchgames commented 1 year ago

I was following this video to train the quadruped: https://youtu.be/3ASGrxNDd0k. In the final step, GNN training, I get this error as soon as it reaches 100%:

Loading Data/OutputShape.txt
Loading Data/InputNormalization.txt
Loading Data/OutputNormalization.txt
Traceback (most recent call last):
  File "C:\Users\Neville\Downloads\AI4Animation-master\AI4Animation\SIGGRAPH_2022\PyTorch\GNN\Network.py", line 173, in <module>
    utility.SaveONNX(
  File "C:\Users\Neville\Downloads\AI4Animation-master\AI4Animation\SIGGRAPH_2022\PyTorch\GNN\..\Library\Utility.py", line 263, in SaveONNX
    torch.onnx.export(
  File "C:\Users\Neville\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\onnx\utils.py", line 506, in export
    _export(
  File "C:\Users\Neville\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\onnx\utils.py", line 1525, in _export
    with exporter_context(model, training, verbose):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "C:\Users\Neville\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\onnx\utils.py", line 178, in exporter_context
    with select_model_mode_for_export(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "C:\Users\Neville\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\onnx\utils.py", line 90, in select_model_mode_for_export
    raise TypeError(
TypeError: 'mode' should be a torch.onnx.TrainingMode enum, but got '<class 'bool'>'.

When I made the first python call, I had to install Python (typing python in the command prompt opens the Microsoft Store) and then run the following to install the missing modules:

pip3 install numpy
pip3 install torch
pip3 install matplotlib
pip3 install sklearn

The last line didn't really work: I kept being told the sklearn module was not found when trying to run Network.py. The next line fixed it:

pip3 install -U scikit-learn scipy matplotlib

Do I need a specific version of torch? Here is the output of pip3 show torch:

Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by:

Here are the version numbers pip printed when I tried to re-install torch:

Requirement already satisfied: torch in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (2.0.0)
Requirement already satisfied: filelock in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from torch) (3.10.7)
Requirement already satisfied: jinja2 in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from torch) (3.1.2)
Requirement already satisfied: sympy in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from torch) (1.11.1)
Requirement already satisfied: typing-extensions in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from torch) (4.5.0)
Requirement already satisfied: networkx in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from torch) (3.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from jinja2->torch) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in c:\users\neville\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages (from sympy->torch) (1.3.0)

polycrunchgames commented 1 year ago

The solution was to use an older version of torch (1.12.1 worked):

pip3 install torch==1.12.1

Then I noticed that in the video, when the author selects the GNN file in Unity, it displays the torch version that was used, which was 1.9.
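
A quick way to catch this mismatch before a long training run is to check the installed torch version up front. The sketch below is only an illustration built on what the thread reports (1.12.1 worked, 2.0.0 failed, the video used 1.9). The helper names and the `first_failing` threshold of 2.0.0 are assumptions; the thread does not establish which release first rejected the boolean.

```python
from importlib import metadata


def parse_release(version):
    """Turn a version string such as '1.12.1+cu113' into a tuple like (1, 12, 1)."""
    core = version.split("+", 1)[0]  # drop a local build tag such as '+cu113'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def export_likely_works(version, first_failing=(2, 0, 0)):
    """Assumption: versions below `first_failing` still accept the boolean flag.

    The thread only confirms 1.12.1 (works) and 2.0.0 (fails).
    """
    return parse_release(version) < first_failing


def installed_torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif export_likely_works(found):
        print("torch", found, "is below the assumed failing release")
    else:
        print("torch", found, "may fail at ONNX export; 1.12.1 worked in this thread")
```

Running this before Network.py would flag a 2.0.0 install immediately, rather than after training has finished.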

Three days later (150 epochs), the GNN was trained successfully. Apparently I didn't need 150 epochs, since the author used the results of the 5th epoch.
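
The downgrade is the fix confirmed in this thread. An alternative that was not tried here would be to stay on torch 2.0.0 and pass a torch.onnx.TrainingMode value where the traceback shows a bool being rejected. A minimal sketch follows, assuming SaveONNX in Utility.py forwards a boolean as the training argument of torch.onnx.export (that code is not shown in the thread). `coerce_training_mode` and its `mode_enum` parameter are hypothetical names introduced for illustration.

```python
def coerce_training_mode(flag, mode_enum=None):
    """Map a legacy boolean `training` flag to a TrainingMode-style enum member.

    `mode_enum` defaults to torch.onnx.TrainingMode; passing it explicitly
    only serves to exercise the mapping without torch installed.
    """
    if mode_enum is None:
        # Imported lazily so that importing this helper does not require torch.
        import torch
        mode_enum = torch.onnx.TrainingMode
    if isinstance(flag, bool):
        # True meant "export in training mode"; False meant "export in eval mode".
        return mode_enum.TRAINING if flag else mode_enum.EVAL
    # Anything else (for example an enum member) is passed through unchanged.
    return flag


# Hypothetical call site, untested against this repository:
# torch.onnx.export(model, example_input, path,
#                   training=coerce_training_mode(False))
```

Whether the exported network then behaves the same in Unity as one exported from torch 1.12.1 or 1.9 is not something this thread verifies.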