mlech26l / ncps

PyTorch and TensorFlow implementation of NCP, LTC, and CfC wired neural models
https://www.nature.com/articles/s42256-020-00237-3
Apache License 2.0

Reproducibility issue: low rewards in Atari examples #48

Closed: Annihillusion closed this issue 1 year ago

Annihillusion commented 1 year ago

Thanks for your work and contribution. We were unable to reproduce the results of the Atari Behavior Cloning and Atari Reinforcement Learning (PPO) examples. In both cases, we used the code exactly as cloned from the repo.

In the Atari Behavior Cloning task, the training loss declines steadily over the course of training. However, the mean return (measured by running the model in closed loop in the real environment) remains very low. After 50 epochs of training, the mean return is essentially the same as that of the initial model. We also visualized the behavior of the trained model and found that it performs poorly in Breakout. Here is part of the training log:


(ncps) E:\ncps_experiment>python atari_torch.py
A.L.E: Arcade Learning Environment (version 0.7.4+069f8bd)
[Powered by Stella]
C:\Users\Admin\anaconda3\envs\ncps\lib\site-packages\gym\utils\seeding.py:138: DeprecationWarning: WARN: Function `hash_seed(seed, max_bytes)` is marked as deprecated and will be removed in the future.
  deprecation(
C:\Users\Admin\anaconda3\envs\ncps\lib\site-packages\gym\utils\seeding.py:175: DeprecationWarning: WARN: Function `_bigint_from_bytes(bytes)` is marked as deprecated and will be removed in the future.
  deprecation(
2023-05-11 02:08:34,419 WARNING deprecation.py:47 -- DeprecationWarning: `FrameStack` has been deprecated. This will raise an error in the future!
loss=0.488: 100%|
Epoch 1, val_loss=0.5465, val_acc=82.52%
Mean return 1.8 (n=10)
loss=0.331: 100%|
Epoch 2, val_loss=0.8403, val_acc=67.58%
Mean return 1.8 (n=10)
loss=0.2709: 100%|
Epoch 3, val_loss=2.126, val_acc=29.59%
Mean return 0.5 (n=10)
......
loss=0.05224: 100%|
Epoch 48, val_loss=0.831, val_acc=70.96%
Mean return 1.4 (n=10)
loss=0.04968: 100%|
Epoch 49, val_loss=1.643, val_acc=56.48%
Mean return 0.0 (n=10)
loss=0.04885: 100%|
Epoch 50, val_loss=2.886, val_acc=52.69%
Mean return 0.6 (n=10)
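To make the mismatch in this log easier to see, here is a minimal sketch that parses the epoch records and prints training loss, validation loss, validation accuracy, and mean return side by side. The names `LOG`, `RECORD`, and `parse_log` are hypothetical helpers written for this illustration; they are not part of `atari_torch.py` or the ncps package, and the regular expression only assumes the field layout visible in the excerpt.

```python
import re

# Epoch records copied from the log excerpt above (epochs 1-3 and 48-50).
LOG = """\
loss=0.488: 100%|
Epoch 1, val_loss=0.5465, val_acc=82.52%
Mean return 1.8 (n=10)
loss=0.331: 100%|
Epoch 2, val_loss=0.8403, val_acc=67.58%
Mean return 1.8 (n=10)
loss=0.2709: 100%|
Epoch 3, val_loss=2.126, val_acc=29.59%
Mean return 0.5 (n=10)
loss=0.05224: 100%|
Epoch 48, val_loss=0.831, val_acc=70.96%
Mean return 1.4 (n=10)
loss=0.04968: 100%|
Epoch 49, val_loss=1.643, val_acc=56.48%
Mean return 0.0 (n=10)
loss=0.04885: 100%|
Epoch 50, val_loss=2.886, val_acc=52.69%
Mean return 0.6 (n=10)
"""

# Hypothetical pattern: one record is a progress line, an "Epoch" line,
# and a "Mean return" line, separated by any whitespace.
RECORD = re.compile(
    r"loss=(?P<train_loss>[\d.]+):[^\n]*\s+"
    r"Epoch (?P<epoch>\d+), val_loss=(?P<val_loss>[\d.]+), "
    r"val_acc=(?P<val_acc>[\d.]+)%\s+"
    r"Mean return (?P<mean_return>[\d.]+)"
)


def parse_log(text):
    """Return one dict per epoch record found in the log text."""
    rows = []
    for m in RECORD.finditer(text):
        rows.append({
            "epoch": int(m["epoch"]),
            "train_loss": float(m["train_loss"]),
            "val_loss": float(m["val_loss"]),
            "val_acc": float(m["val_acc"]),
            "mean_return": float(m["mean_return"]),
        })
    return rows


rows = parse_log(LOG)
print(f"{'epoch':>5} {'train_loss':>10} {'val_loss':>8} {'val_acc':>7} {'return':>6}")
for r in rows:
    print(f"{r['epoch']:>5} {r['train_loss']:>10.4f} {r['val_loss']:>8.4f} "
          f"{r['val_acc']:>7.2f} {r['mean_return']:>6.1f}")
```

In this excerpt the training loss falls at every listed epoch while the validation loss and accuracy fluctuate widely and the mean return never exceeds 1.8, which is the pattern described in the report.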

The situation in Atari Reinforcement Learning (PPO) is almost the same. The policy reward does not grow steadily as shown in the tutorial. Even after more than 1,100k sampled steps (3 hours of training), the policy reward only reaches about 5. Here is part of the training log:


Ran 0.0 hours sampled 4k steps policy reward: 1.1 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 0.5 hours sampled 164k steps policy reward: 1.6 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 1.0 hours sampled 348k steps policy reward: 3.5 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 1.5 hours sampled 540k steps policy reward: 5.4 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 2.0 hours sampled 732k steps policy reward: 4.5 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 2.5 hours sampled 916k steps policy reward: 5.3 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
Ran 3.0 hours sampled 1108k steps policy reward: 4.5 saved checkpoint 'rl_ckpt/ALE/Breakout-v5'
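As a rough way to put this run in context, the sketch below parses the records above and estimates the sampling throughput, then extrapolates linearly to a 24-hour run (the training time mentioned later in this thread). The helpers `parse_rl_log` and `ksteps_per_hour` are hypothetical names introduced for this illustration, and the linear extrapolation is an assumption, not a measured result.

```python
import re

# Records copied from the RL log above (checkpoint messages omitted).
RL_LOG = """\
Ran 0.0 hours sampled 4k steps policy reward: 1.1
Ran 0.5 hours sampled 164k steps policy reward: 1.6
Ran 1.0 hours sampled 348k steps policy reward: 3.5
Ran 1.5 hours sampled 540k steps policy reward: 5.4
Ran 2.0 hours sampled 732k steps policy reward: 4.5
Ran 2.5 hours sampled 916k steps policy reward: 5.3
Ran 3.0 hours sampled 1108k steps policy reward: 4.5
"""

RECORD = re.compile(
    r"Ran (?P<hours>[\d.]+) hours\s+"
    r"sampled (?P<ksteps>\d+)k steps\s+"
    r"policy reward: (?P<reward>[\d.]+)"
)


def parse_rl_log(text):
    """Return (hours, thousands_of_steps, reward) tuples from the log text."""
    return [
        (float(m["hours"]), int(m["ksteps"]), float(m["reward"]))
        for m in RECORD.finditer(text)
    ]


def ksteps_per_hour(records):
    """Average sampling rate between the first and last record."""
    (h0, s0, _), (h1, s1, _) = records[0], records[-1]
    return (s1 - s0) / (h1 - h0)


records = parse_rl_log(RL_LOG)
rate = ksteps_per_hour(records)
# Linear extrapolation (an assumption) to a 24-hour run.
projected_ksteps_24h = records[0][1] + rate * 24
print(f"throughput: {rate:.0f}k steps/hour")
print(f"projected after 24 h: {projected_ksteps_24h:.0f}k steps")
```

On this machine the run samples about 368k steps per hour, so the 3-hour excerpt covers only about one eighth of the steps a 24-hour run would collect (roughly 8.8 million under the linear assumption).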

We followed the tutorial and installed the specified versions of gym, ray, and ale-py. We wonder whether the problem is related to the versions of other packages. Here are the conda environments we used for behavior cloning and reinforcement learning, respectively.

behavior cloning env

# packages in environment at C:\Users\Admin\anaconda3\envs\ncps: # # Name Version Build Channel _ipyw_jlab_nb_ext_conf 0.1.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main absl-py 1.4.0 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi alabaster 0.7.12 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main ale-py 0.7.4 pypi_0 pypi anaconda-client 1.11.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main anaconda-project 0.11.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main anyio 3.5.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main appdirs 1.4.4 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main argon2-cffi 21.3.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main argon2-cffi-bindings 21.2.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main arrow 1.2.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main astroid 2.11.7 py39haa95532_0 https://repo.anaconda.com/pkgs/main astropy 5.1 py39h080aedc_0 https://repo.anaconda.com/pkgs/main astunparse 1.6.3 pypi_0 pypi atomicwrites 1.4.0 py_0 https://repo.anaconda.com/pkgs/main attrs 21.4.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main automat 20.2.0 py_0 https://repo.anaconda.com/pkgs/main autopep8 1.6.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main autorom 0.4.2 pypi_0 pypi autorom-accept-rom-license 0.6.1 pypi_0 pypi babel 2.9.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main backcall 0.2.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main backports 1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main backports.functools_lru_cache 1.6.4 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main backports.tempfile 1.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main backports.weakref 1.0.post1 py_1 https://repo.anaconda.com/pkgs/main bcrypt 3.2.0 py39h2bbff1b_1 https://repo.anaconda.com/pkgs/main beautifulsoup4 4.11.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main binaryornot 0.4.4 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main bitarray 2.5.1 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main bkcharts 0.2 py39haa95532_1 
https://repo.anaconda.com/pkgs/main black 22.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main blas 1.0 mkl https://repo.anaconda.com/pkgs/main bleach 4.1.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main blosc 1.21.0 h19a0ad4_1 https://repo.anaconda.com/pkgs/main bokeh 2.4.3 py39haa95532_0 https://repo.anaconda.com/pkgs/main boto3 1.24.28 py39haa95532_0 https://repo.anaconda.com/pkgs/main botocore 1.27.28 py39haa95532_0 https://repo.anaconda.com/pkgs/main bottleneck 1.3.5 py39h080aedc_0 https://repo.anaconda.com/pkgs/main brotli 1.0.9 h2bbff1b_7 https://repo.anaconda.com/pkgs/main brotli-bin 1.0.9 h2bbff1b_7 https://repo.anaconda.com/pkgs/main brotlipy 0.7.0 py39h2bbff1b_1003 https://repo.anaconda.com/pkgs/main bzip2 1.0.8 he774522_0 https://repo.anaconda.com/pkgs/main ca-certificates 2023.01.10 haa95532_0 defaults cachetools 5.3.0 pypi_0 pypi certifi 2022.12.7 py39haa95532_0 defaults cffi 1.15.1 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main cfitsio 3.470 h2bbff1b_7 https://repo.anaconda.com/pkgs/main chardet 4.0.0 py39haa95532_1003 https://repo.anaconda.com/pkgs/main charls 2.2.0 h6c2663c_0 https://repo.anaconda.com/pkgs/main charset-normalizer 2.0.4 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main click 8.0.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main cloudpickle 2.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main clyent 1.2.2 py39haa95532_1 https://repo.anaconda.com/pkgs/main colorama 0.4.5 py39haa95532_0 https://repo.anaconda.com/pkgs/main colorcet 3.0.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main comtypes 1.1.10 py39haa95532_1002 https://repo.anaconda.com/pkgs/main conda-content-trust 0.1.3 py39haa95532_0 https://repo.anaconda.com/pkgs/main conda-pack 0.6.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main conda-package-handling 1.9.0 py39h8cc25b3_0 https://repo.anaconda.com/pkgs/main conda-repo-cli 1.0.20 py39haa95532_0 https://repo.anaconda.com/pkgs/main conda-verify 3.4.2 py_1 https://repo.anaconda.com/pkgs/main constantly 
15.1.0 pyh2b92418_0 https://repo.anaconda.com/pkgs/main cookiecutter 1.7.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main cryptography 37.0.1 py39h21b164f_0 https://repo.anaconda.com/pkgs/main cssselect 1.1.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main cuda-cccl 12.1.109 0 nvidia cuda-cudart 11.7.99 0 nvidia cuda-cudart-dev 11.7.99 0 nvidia cuda-cupti 11.7.101 0 nvidia cuda-libraries 11.7.1 0 nvidia cuda-libraries-dev 11.7.1 0 nvidia cuda-nvrtc 11.7.99 0 nvidia cuda-nvrtc-dev 11.7.99 0 nvidia cuda-nvtx 11.7.91 0 nvidia cuda-runtime 11.7.1 0 nvidia curl 7.84.0 h2bbff1b_0 https://repo.anaconda.com/pkgs/main cycler 0.11.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main cython 0.29.32 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main cytoolz 0.11.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main daal4py 2021.6.0 py39h757b272_1 https://repo.anaconda.com/pkgs/main dal 2021.6.0 h59b6b97_874 https://repo.anaconda.com/pkgs/main dask 2022.7.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main dask-core 2022.7.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main dataclasses 0.8 pyh6d0b6a4_7 https://repo.anaconda.com/pkgs/main datashader 0.14.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main datashape 0.5.4 py39haa95532_1 https://repo.anaconda.com/pkgs/main debugpy 1.5.1 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main decorator 5.1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main defusedxml 0.7.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main diff-match-patch 20200713 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main dill 0.3.4 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main distlib 0.3.6 pypi_0 pypi distributed 2022.7.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main dm-tree 0.1.8 pypi_0 pypi docutils 0.18.1 py39haa95532_3 https://repo.anaconda.com/pkgs/main entrypoints 0.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main et_xmlfile 1.1.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main fftw 3.3.9 h2bbff1b_1 https://repo.anaconda.com/pkgs/main 
filelock 3.12.0 pypi_0 pypi flake8 4.0.1 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main flask 1.1.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main flatbuffers 23.5.9 pypi_0 pypi fonttools 4.25.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main freetype 2.10.4 hd328e21_0 https://repo.anaconda.com/pkgs/main frozenlist 1.3.3 pypi_0 pypi fsspec 2022.7.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main future 0.18.2 py39haa95532_1 https://repo.anaconda.com/pkgs/main gast 0.4.0 pypi_0 pypi gensim 4.1.2 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main giflib 5.2.1 h62dcd97_0 https://repo.anaconda.com/pkgs/main glob2 0.7 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main google-auth 2.18.1 pypi_0 pypi google-auth-oauthlib 1.0.0 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi greenlet 1.1.1 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main grpcio 1.54.0 pypi_0 pypi gym 0.23.1 pypi_0 pypi gym-notices 0.0.8 pypi_0 pypi h5py 3.7.0 py39h3de5c98_0 https://repo.anaconda.com/pkgs/main hdf5 1.10.6 h1756f20_1 https://repo.anaconda.com/pkgs/main heapdict 1.0.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main holoviews 1.15.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main hvplot 0.8.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main hyperlink 21.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main icc_rt 2022.1.0 h6049295_2 https://repo.anaconda.com/pkgs/main icu 58.2 ha925a31_3 https://repo.anaconda.com/pkgs/main idna 3.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main imagecodecs 2021.8.26 py39hc0a7faf_1 https://repo.anaconda.com/pkgs/main imageio 2.19.3 py39haa95532_0 https://repo.anaconda.com/pkgs/main imagesize 1.4.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main importlib-metadata 4.11.3 py39haa95532_0 https://repo.anaconda.com/pkgs/main importlib-resources 5.12.0 pypi_0 pypi importlib_metadata 4.11.3 hd3eb1b0_0 https://repo.anaconda.com/pkgs/main incremental 21.3.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main inflection 0.5.1 py39haa95532_0 
https://repo.anaconda.com/pkgs/main iniconfig 1.1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main intake 0.6.5 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main intel-openmp 2021.4.0 haa95532_3556 https://repo.anaconda.com/pkgs/main intervaltree 3.1.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main ipykernel 6.15.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main ipython 7.31.1 py39haa95532_1 https://repo.anaconda.com/pkgs/main ipython_genutils 0.2.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main ipywidgets 7.6.5 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main isort 5.9.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main itemadapter 0.3.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main itemloaders 1.0.4 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main itsdangerous 2.0.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jax 0.4.10 pypi_0 pypi jdcal 1.4.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jedi 0.18.1 py39haa95532_1 https://repo.anaconda.com/pkgs/main jellyfish 0.9.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main jinja2 2.11.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jinja2-time 0.2.0 pyhd3eb1b0_3 https://repo.anaconda.com/pkgs/main jmespath 0.10.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main joblib 1.1.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jpeg 9e h2bbff1b_0 https://repo.anaconda.com/pkgs/main jq 1.6 haa95532_1 https://repo.anaconda.com/pkgs/main json5 0.9.6 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jsonschema 4.16.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main jupyter 1.0.0 py39haa95532_8 https://repo.anaconda.com/pkgs/main jupyter_client 7.3.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main jupyter_console 6.4.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main jupyter_core 4.11.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main jupyter_server 1.18.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main jupyterlab 3.4.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main jupyterlab_pygments 0.1.2 py_0 
https://repo.anaconda.com/pkgs/main jupyterlab_server 2.10.3 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main jupyterlab_widgets 1.0.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main keras 2.12.0 pypi_0 pypi keyring 23.4.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main kiwisolver 1.4.2 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main lazy-object-proxy 1.6.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main lcms2 2.12 h83e58a3_0 https://repo.anaconda.com/pkgs/main lerc 3.0 hd77b12b_0 https://repo.anaconda.com/pkgs/main libaec 1.0.4 h33f27b4_1 https://repo.anaconda.com/pkgs/main libarchive 3.6.1 hebabd0d_0 https://repo.anaconda.com/pkgs/main libbrotlicommon 1.0.9 h2bbff1b_7 https://repo.anaconda.com/pkgs/main libbrotlidec 1.0.9 h2bbff1b_7 https://repo.anaconda.com/pkgs/main libbrotlienc 1.0.9 h2bbff1b_7 https://repo.anaconda.com/pkgs/main libclang 16.0.0 pypi_0 pypi libcublas 11.10.3.66 0 nvidia libcublas-dev 11.10.3.66 0 nvidia libcufft 10.7.2.124 0 nvidia libcufft-dev 10.7.2.124 0 nvidia libcurand 10.3.2.106 0 nvidia libcurand-dev 10.3.2.106 0 nvidia libcurl 7.84.0 h86230a5_0 https://repo.anaconda.com/pkgs/main libcusolver 11.4.0.1 0 nvidia libcusolver-dev 11.4.0.1 0 nvidia libcusparse 11.7.4.91 0 nvidia libcusparse-dev 11.7.4.91 0 nvidia libdeflate 1.8 h2bbff1b_5 https://repo.anaconda.com/pkgs/main libiconv 1.16 h2bbff1b_2 https://repo.anaconda.com/pkgs/main liblief 0.11.5 hd77b12b_1 https://repo.anaconda.com/pkgs/main libnpp 11.7.4.75 0 nvidia libnpp-dev 11.7.4.75 0 nvidia libnvjpeg 11.8.0.2 0 nvidia libnvjpeg-dev 11.8.0.2 0 nvidia libpng 1.6.37 h2a8f88b_0 https://repo.anaconda.com/pkgs/main libsodium 1.0.18 h62dcd97_0 https://repo.anaconda.com/pkgs/main libspatialindex 1.9.3 h6c2663c_0 https://repo.anaconda.com/pkgs/main libssh2 1.10.0 hcd4344a_0 https://repo.anaconda.com/pkgs/main libtiff 4.4.0 h8a3f274_0 https://repo.anaconda.com/pkgs/main libuv 1.44.2 h2bbff1b_0 defaults libwebp 1.2.2 h2bbff1b_0 https://repo.anaconda.com/pkgs/main libxml2 2.9.14 
h0ad7f3c_0 https://repo.anaconda.com/pkgs/main libxslt 1.1.35 h2bbff1b_0 https://repo.anaconda.com/pkgs/main libzopfli 1.0.3 ha925a31_0 https://repo.anaconda.com/pkgs/main llvmlite 0.38.0 py39h23ce68f_0 https://repo.anaconda.com/pkgs/main locket 1.0.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main lxml 4.9.1 py39h1985fb9_0 https://repo.anaconda.com/pkgs/main lz4 3.1.3 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main lz4-c 1.9.3 h2bbff1b_1 https://repo.anaconda.com/pkgs/main lzo 2.10 he774522_2 https://repo.anaconda.com/pkgs/main m2-msys2-runtime 2.5.0.17080.65c939c 3 https://repo.anaconda.com/pkgs/msys2 m2-patch 2.7.5 2 https://repo.anaconda.com/pkgs/msys2 m2w64-libwinpthread-git 5.0.0.4634.697f757 2 https://repo.anaconda.com/pkgs/msys2 markdown 3.3.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main markupsafe 2.0.1 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main matplotlib 3.5.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main matplotlib-base 3.5.2 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main matplotlib-inline 0.1.6 py39haa95532_0 https://repo.anaconda.com/pkgs/main mccabe 0.6.1 py39haa95532_2 https://repo.anaconda.com/pkgs/main menuinst 1.4.19 py39h59b6b97_0 https://repo.anaconda.com/pkgs/main mistune 0.8.4 py39h2bbff1b_1000 https://repo.anaconda.com/pkgs/main mkl 2021.4.0 haa95532_640 https://repo.anaconda.com/pkgs/main mkl-service 2.4.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main mkl_fft 1.3.1 py39h277e83a_0 https://repo.anaconda.com/pkgs/main mkl_random 1.2.2 py39hf11a4ad_0 https://repo.anaconda.com/pkgs/main ml-dtypes 0.1.0 pypi_0 pypi mock 4.0.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main mpmath 1.2.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main msgpack-python 1.0.3 py39h59b6b97_0 https://repo.anaconda.com/pkgs/main msys2-conda-epoch 20160418 1 https://repo.anaconda.com/pkgs/msys2 multipledispatch 0.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main munkres 1.1.4 py_0 https://repo.anaconda.com/pkgs/main 
mypy_extensions 0.4.3 py39haa95532_1 https://repo.anaconda.com/pkgs/main nbclassic 0.3.5 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main nbclient 0.5.13 py39haa95532_0 https://repo.anaconda.com/pkgs/main nbconvert 6.4.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main nbformat 5.5.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main ncps 0.0.7 pypi_0 pypi nest-asyncio 1.5.5 py39haa95532_0 https://repo.anaconda.com/pkgs/main networkx 2.8.4 py39haa95532_0 https://repo.anaconda.com/pkgs/main nltk 3.7 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main nose 1.3.7 pyhd3eb1b0_1008 https://repo.anaconda.com/pkgs/main notebook 6.4.12 py39haa95532_0 https://repo.anaconda.com/pkgs/main numba 0.55.1 py39hf11a4ad_0 https://repo.anaconda.com/pkgs/main numexpr 2.8.3 py39hb80d3ca_0 https://repo.anaconda.com/pkgs/main numpy 1.22.0 pypi_0 pypi numpydoc 1.4.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main oauthlib 3.2.2 pypi_0 pypi olefile 0.46 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main openjpeg 2.4.0 h4fc8c34_0 https://repo.anaconda.com/pkgs/main openpyxl 3.0.10 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main openssl 1.1.1t h2bbff1b_0 defaults opt-einsum 3.3.0 pypi_0 pypi packaging 21.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pandas 1.4.4 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main pandocfilters 1.5.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main panel 0.13.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main param 1.12.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main paramiko 2.8.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main parsel 1.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main parso 0.8.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main partd 1.2.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main pathlib 1.0.1 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main pathspec 0.9.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main patsy 0.5.2 py39haa95532_1 https://repo.anaconda.com/pkgs/main pep8 1.7.1 py39haa95532_1 
https://repo.anaconda.com/pkgs/main pexpect 4.8.0 pyhd3eb1b0_3 https://repo.anaconda.com/pkgs/main pickleshare 0.7.5 pyhd3eb1b0_1003 https://repo.anaconda.com/pkgs/main pillow 9.2.0 py39hdc2b20a_1 https://repo.anaconda.com/pkgs/main pip 22.2.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main pkginfo 1.8.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main platformdirs 3.5.0 pypi_0 pypi plotly 5.9.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main pluggy 1.0.0 py39haa95532_1 https://repo.anaconda.com/pkgs/main poyo 0.5.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main prometheus_client 0.14.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main prompt-toolkit 3.0.20 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main prompt_toolkit 3.0.20 hd3eb1b0_0 https://repo.anaconda.com/pkgs/main protego 0.1.16 py_0 https://repo.anaconda.com/pkgs/main protobuf 3.20.3 pypi_0 pypi psutil 5.9.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main ptyprocess 0.7.0 pyhd3eb1b0_2 https://repo.anaconda.com/pkgs/main py 1.11.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main py-lief 0.11.5 py39hd77b12b_1 https://repo.anaconda.com/pkgs/main pyasn1 0.4.8 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pyasn1-modules 0.2.8 py_0 https://repo.anaconda.com/pkgs/main pycodestyle 2.8.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pycosat 0.6.3 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main pycparser 2.21 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pyct 0.4.8 py39haa95532_1 https://repo.anaconda.com/pkgs/main pycurl 7.45.1 py39hcd4344a_0 https://repo.anaconda.com/pkgs/main pydispatcher 2.0.5 py39haa95532_2 https://repo.anaconda.com/pkgs/main pydocstyle 6.1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pyerfa 2.0.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main pyflakes 2.4.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pygments 2.11.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pyhamcrest 2.0.2 pyhd3eb1b0_2 https://repo.anaconda.com/pkgs/main pyjwt 2.4.0 
py39haa95532_0 https://repo.anaconda.com/pkgs/main pylint 2.14.5 py39haa95532_0 https://repo.anaconda.com/pkgs/main pyls-spyder 0.4.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pynacl 1.5.0 py39h8cc25b3_0 https://repo.anaconda.com/pkgs/main pyodbc 4.0.34 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main pyopenssl 22.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pyparsing 3.0.9 py39haa95532_0 https://repo.anaconda.com/pkgs/main pyqt 5.9.2 py39hd77b12b_6 https://repo.anaconda.com/pkgs/main pyrsistent 0.18.0 py39h196d8e1_0 https://repo.anaconda.com/pkgs/main pysocks 1.7.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main pytables 3.6.1 py39h56d22b6_1 https://repo.anaconda.com/pkgs/main pytest 7.1.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main python 3.9.13 h6244533_1 https://repo.anaconda.com/pkgs/main python-dateutil 2.8.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main python-fastjsonschema 2.16.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main python-libarchive-c 2.9 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main python-lsp-black 1.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main python-lsp-jsonrpc 1.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main python-lsp-server 1.3.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main python-slugify 5.0.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main python-snappy 0.6.0 py39hd77b12b_3 https://repo.anaconda.com/pkgs/main python-version 0.0.2 pypi_0 pypi pytorch 2.0.0 py3.9_cuda11.7_cudnn8_0 pytorch pytorch-cuda 11.7 h16d0643_3 pytorch pytorch-mutex 1.0 cuda pytorch pytz 2022.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main pyviz_comms 2.0.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main pywavelets 1.3.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main pywin32 302 py39h2bbff1b_2 https://repo.anaconda.com/pkgs/main pywin32-ctypes 0.2.0 py39haa95532_1000 https://repo.anaconda.com/pkgs/main pywinpty 2.0.2 py39h5da7b33_0 https://repo.anaconda.com/pkgs/main pyyaml 6.0 py39h2bbff1b_1 
https://repo.anaconda.com/pkgs/main pyzmq 23.2.0 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main qdarkstyle 3.0.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main qstylizer 0.1.10 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main qt 5.9.7 vc14h73c81de_0 https://repo.anaconda.com/pkgs/main qtawesome 1.0.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main qtconsole 5.2.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main qtpy 2.2.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main queuelib 1.5.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main ray 2.1.0 pypi_0 pypi regex 2022.7.9 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main requests 2.28.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main requests-file 1.5.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main requests-oauthlib 1.3.1 pypi_0 pypi rope 0.22.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main rsa 4.9 pypi_0 pypi rtree 0.9.7 py39h2eaa2aa_1 https://repo.anaconda.com/pkgs/main ruamel.yaml 0.17.21 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main ruamel.yaml.clib 0.2.6 py39h2bbff1b_1 https://repo.anaconda.com/pkgs/main ruamel_yaml 0.15.100 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main s3transfer 0.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main scikit-image 0.19.2 py39hf11a4ad_0 https://repo.anaconda.com/pkgs/main scikit-learn 1.0.2 py39hf11a4ad_1 https://repo.anaconda.com/pkgs/main scikit-learn-intelex 2021.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main scipy 1.9.1 py39he11b74f_0 https://repo.anaconda.com/pkgs/main scrapy 2.6.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main seaborn 0.11.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main send2trash 1.8.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main service_identity 18.1.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main setuptools 63.4.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main sip 4.19.13 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main six 1.16.0 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main smart_open 5.2.1 
py39haa95532_0 https://repo.anaconda.com/pkgs/main snappy 1.1.9 h6c2663c_0 https://repo.anaconda.com/pkgs/main sniffio 1.2.0 py39haa95532_1 https://repo.anaconda.com/pkgs/main snowballstemmer 2.2.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sortedcollections 2.1.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sortedcontainers 2.4.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main soupsieve 2.3.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinx 5.0.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-applehelp 1.0.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-devhelp 1.0.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-htmlhelp 2.0.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-jsmath 1.0.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-qthelp 1.0.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main sphinxcontrib-serializinghtml 1.1.5 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main spyder 5.2.2 py39haa95532_1 https://repo.anaconda.com/pkgs/main spyder-kernels 2.2.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main sqlalchemy 1.4.39 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main sqlite 3.39.3 h2bbff1b_0 https://repo.anaconda.com/pkgs/main statsmodels 0.13.2 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main sympy 1.10.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main tabulate 0.8.10 py39haa95532_0 https://repo.anaconda.com/pkgs/main tbb 2021.6.0 h59b6b97_0 https://repo.anaconda.com/pkgs/main tbb4py 2021.6.0 py39h59b6b97_0 https://repo.anaconda.com/pkgs/main tblib 1.7.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main tenacity 8.0.1 py39haa95532_1 https://repo.anaconda.com/pkgs/main tensorboard 2.12.3 pypi_0 pypi tensorboard-data-server 0.7.0 pypi_0 pypi tensorboardx 2.6 pypi_0 pypi tensorflow-estimator 2.12.0 pypi_0 pypi tensorflow-intel 2.12.0 pypi_0 pypi tensorflow-io-gcs-filesystem 0.31.0 pypi_0 pypi termcolor 2.3.0 pypi_0 pypi terminado 0.13.1 py39haa95532_0 
https://repo.anaconda.com/pkgs/main testpath 0.6.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main text-unidecode 1.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main textdistance 4.2.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main threadpoolctl 2.2.0 pyh0d69192_0 https://repo.anaconda.com/pkgs/main three-merge 0.1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main tifffile 2021.7.2 pyhd3eb1b0_2 https://repo.anaconda.com/pkgs/main tinycss 0.4 pyhd3eb1b0_1002 https://repo.anaconda.com/pkgs/main tk 8.6.12 h2bbff1b_0 https://repo.anaconda.com/pkgs/main tldextract 3.2.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main toml 0.10.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main tomli 2.0.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main tomlkit 0.11.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main toolz 0.11.2 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main torchvision 0.15.0 pypi_0 pypi tornado 6.1 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main tqdm 4.64.1 py39haa95532_0 https://repo.anaconda.com/pkgs/main traitlets 5.1.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main twisted 22.2.0 py39h2bbff1b_1 https://repo.anaconda.com/pkgs/main twisted-iocpsupport 1.0.2 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main typing-extensions 4.3.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main typing_extensions 4.3.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main tzdata 2022c h04d1e81_0 https://repo.anaconda.com/pkgs/main ujson 5.4.0 py39hd77b12b_0 https://repo.anaconda.com/pkgs/main unidecode 1.2.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main urllib3 1.26.11 py39haa95532_0 https://repo.anaconda.com/pkgs/main vc 14.2 h21ff451_1 https://repo.anaconda.com/pkgs/main virtualenv 20.23.0 pypi_0 pypi vs2015_runtime 14.27.29016 h5e58377_2 https://repo.anaconda.com/pkgs/main w3lib 1.21.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main watchdog 2.1.6 py39haa95532_0 https://repo.anaconda.com/pkgs/main wcwidth 0.2.5 pyhd3eb1b0_0 
https://repo.anaconda.com/pkgs/main webencodings 0.5.1 py39haa95532_1 https://repo.anaconda.com/pkgs/main websocket-client 0.58.0 py39haa95532_4 https://repo.anaconda.com/pkgs/main werkzeug 2.0.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main wheel 0.37.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main widgetsnbextension 3.5.2 py39haa95532_0 https://repo.anaconda.com/pkgs/main win_inet_pton 1.1.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main win_unicode_console 0.5 py39haa95532_0 https://repo.anaconda.com/pkgs/main wincertstore 0.2 py39haa95532_2 https://repo.anaconda.com/pkgs/main winpty 0.4.3 4 https://repo.anaconda.com/pkgs/main wrapt 1.14.1 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main xarray 0.20.1 pyhd3eb1b0_1 https://repo.anaconda.com/pkgs/main xlrd 2.0.1 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main xlsxwriter 3.0.3 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main xlwings 0.27.15 py39haa95532_0 https://repo.anaconda.com/pkgs/main xz 5.2.6 h8cc25b3_0 https://repo.anaconda.com/pkgs/main yaml 0.2.5 he774522_0 https://repo.anaconda.com/pkgs/main yapf 0.31.0 pyhd3eb1b0_0 https://repo.anaconda.com/pkgs/main zeromq 4.3.4 hd77b12b_0 https://repo.anaconda.com/pkgs/main zfp 0.5.5 hd77b12b_6 https://repo.anaconda.com/pkgs/main zict 2.1.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main zipp 3.8.0 py39haa95532_0 https://repo.anaconda.com/pkgs/main zlib 1.2.12 h8cc25b3_3 https://repo.anaconda.com/pkgs/main zope 1.0 py39haa95532_1 https://repo.anaconda.com/pkgs/main zope.interface 5.4.0 py39h2bbff1b_0 https://repo.anaconda.com/pkgs/main zstd 1.5.2 h19a0ad4_0 https://repo.anaconda.com/pkgs/main

reinforcement learning env

# packages in environment at C:\Users\Admin\anaconda3\envs\tf2:
#
# Name Version Build Channel
absl-py 1.4.0 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
ale-py 0.7.4 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
autorom 0.4.2 pypi_0 pypi
autorom-accept-rom-license 0.6.1 pypi_0 pypi
blas 1.0 mkl defaults
ca-certificates 2023.01.10 haa95532_0 defaults
cachetools 5.3.0 pypi_0 pypi
certifi 2023.5.7 py39haa95532_0 defaults
charset-normalizer 3.1.0 pypi_0 pypi
click 8.0.4 pypi_0 pypi
cloudpickle 2.2.1 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
contourpy 1.0.7 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
distlib 0.3.6 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
filelock 3.12.0 pypi_0 pypi
flatbuffers 23.5.9 pypi_0 pypi
fonttools 4.39.4 pypi_0 pypi
frozenlist 1.3.3 pypi_0 pypi
future 0.18.3 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.18.1 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.54.2 pypi_0 pypi
gym 0.23.1 pypi_0 pypi
gym-notices 0.0.8 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
imageio 2.29.0 pypi_0 pypi
importlib-metadata 6.6.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2023.1.0 h59b6b97_46319 defaults
jsonschema 4.17.3 pypi_0 pypi
keras 2.10.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
lazy-loader 0.2 pypi_0 pypi
libclang 16.0.0 pypi_0 pypi
libffi 3.4.4 hd77b12b_0 defaults
lz4 4.3.2 pypi_0 pypi
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.7.1 pypi_0 pypi
mkl 2023.1.0 h8bd8f75_46356 defaults
mkl-service 2.4.0 py39h2bbff1b_1 defaults
mkl_fft 1.3.6 py39hf11a4ad_1 defaults
mkl_random 1.2.2 py39hf11a4ad_1 defaults
msgpack 1.0.5 pypi_0 pypi
ncps 0.0.7 pypi_0 pypi
networkx 3.1 pypi_0 pypi
numpy 1.24.3 pypi_0 pypi
numpy-base 1.23.5 py39h46c4fa8_1 defaults
oauthlib 3.2.2 pypi_0 pypi
openssl 1.1.1t h2bbff1b_0 defaults
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 2.0.1 pypi_0 pypi
pillow 9.5.0 pypi_0 pypi
pip 23.0.1 py39haa95532_0 defaults
pkgutil-resolve-name 1.3.10 pypi_0 pypi
platformdirs 3.5.1 pypi_0 pypi
protobuf 3.19.6 pypi_0 pypi
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
python 3.9.16 h6244533_2 defaults
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
ray 2.1.0 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.20.0 pypi_0 pypi
scipy 1.9.1 pypi_0 pypi
setuptools 66.0.0 py39haa95532_0 defaults
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h2bbff1b_0 defaults
tabulate 0.9.0 pypi_0 pypi
tbb 2021.8.0 h59b6b97_0 defaults
tensorboard 2.10.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorboardx 2.6 pypi_0 pypi
tensorflow 2.10.0 pypi_0 pypi
tensorflow-estimator 2.10.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.31.0 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
tifffile 2023.4.12 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
typing-extensions 4.6.1 pypi_0 pypi
tzdata 2023.3 pypi_0 pypi
urllib3 1.26.16 pypi_0 pypi
vc 14.2 h21ff451_1 defaults
virtualenv 20.23.0 pypi_0 pypi
vs2015_runtime 14.27.29016 h5e58377_2 defaults
werkzeug 2.3.4 pypi_0 pypi
wheel 0.38.4 py39haa95532_0 defaults
wrapt 1.15.0 pypi_0 pypi
zipp 3.15.0 pypi_0 pypi

mlech26l commented 1 year ago

I have updated the conv layers in the examples by adding batch norms between them, as suggested here. With PyTorch behavior cloning, I now get a reward > 40 after a couple of minutes of training. The RL training probably benefits from these changes as well (note that RL also needs to train for 24h or so to collect enough experience).

loss=0.3713: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:43<00:00,  9.04it/s]
Epoch 1, val_loss=0.2822, val_acc=89.86%
Mean return 5.7 (n=10)
loss=0.2131: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 2, val_loss=0.215, val_acc=92.41%
Mean return 20.1 (n=10)
loss=0.1761: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.18it/s]
Epoch 3, val_loss=0.1937, val_acc=93.19%
Mean return 71.8 (n=10)
loss=0.1563: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.17it/s]
Epoch 4, val_loss=0.1684, val_acc=94.15%
Mean return 60.2 (n=10)
loss=0.1426: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.17it/s]
Epoch 5, val_loss=0.154, val_acc=94.53%
Mean return 31.6 (n=10)
loss=0.1311: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 6, val_loss=0.1506, val_acc=94.63%
Mean return 74.9 (n=10)
loss=0.1213: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 7, val_loss=0.1624, val_acc=94.10%
Mean return 65.2 (n=10)
loss=0.1135: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 8, val_loss=0.1443, val_acc=94.89%
Mean return 64.8 (n=10)
loss=0.1055: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 9, val_loss=0.1376, val_acc=95.14%
Mean return 64.1 (n=10)
loss=0.09882: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 10, val_loss=0.1359, val_acc=95.26%
Mean return 69.6 (n=10)
loss=0.09216: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.16it/s]
Epoch 11, val_loss=0.1377, val_acc=95.25%
Mean return 77.5 (n=10)
loss=0.08617: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.12it/s]
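
For readers who want to see what "batch norms between the conv layers" might look like, here is a minimal PyTorch sketch. The class name `ConvBlock`, the channel counts, the kernel sizes, and the 4x84x84 input shape are assumptions chosen for illustration (a common Atari layout); they are not taken from the updated example code in the repo.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Hypothetical conv stack with a BatchNorm2d after each convolution."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        # Layer sizes are illustrative assumptions, not the repo's values.
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.BatchNorm2d(32),  # normalizes activations per channel
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        return self.layers(x)


if __name__ == "__main__":
    block = ConvBlock()
    frames = torch.zeros(2, 4, 84, 84)  # two stacks of four 84x84 frames
    print(block(frames).shape)
```

Because each batch norm rescales the activations it receives, a stack like this is generally less sensitive to the scale of the input than a plain conv stack, although that alone did not resolve the problem reported in this thread.
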
Annihillusion commented 1 year ago

Many thanks for your reply! But my result remains the same using the new code with BN. I suppose there are some version issues in my Python environment. Could you kindly provide your conda environment info and the model's parameters (PyTorch behavior cloning; several epochs will suffice)? It would be really helpful for my reproduction work.

lungd commented 1 year ago

I had the same problem and could solve it by removing the division by 255 (https://github.com/mlech26l/ncps/blob/master/examples/atari_torch.py#L122). It seems the observations already get scaled inside some function of one of the dependencies.

Annihillusion commented 1 year ago

Indeed, I checked the observations returned by the env and found them already scaled to [0, 1), which does not match the range described in Gym's documentation ([0, 255]). However, this seems to happen only in Python 3.10. Removing the division by 255 solves the problem completely (for Python 3.10). Thanks for your solution!
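
Since the observation range appears to differ between setups, one defensive option is to inspect the frames before scaling them instead of always dividing by 255. The sketch below is only an illustration of that idea using NumPy; the helper name `to_unit_range` is hypothetical and is not part of `atari_torch.py` or of Gym.

```python
import numpy as np


def to_unit_range(obs):
    """Return obs as float32 in [0, 1], dividing by 255 only when needed.

    Hypothetical helper: it guesses the scale from the dtype and the values.
    """
    obs = np.asarray(obs)
    if np.issubdtype(obs.dtype, np.integer):
        # Raw integer frames (the [0, 255] case in Gym's docs): scale down.
        return obs.astype(np.float32) / 255.0
    obs = obs.astype(np.float32)
    if obs.size and obs.max() > 1.0:
        # Float frames that are still on the 0-255 scale.
        return obs / 255.0
    # Values already lie in [0, 1], e.g. scaled by a wrapper: leave as-is.
    return obs


if __name__ == "__main__":
    raw = np.array([[0, 128, 255]], dtype=np.uint8)
    pre_scaled = np.array([[0.0, 0.5, 0.99]], dtype=np.float32)
    print(to_unit_range(raw).max(), to_unit_range(pre_scaled).max())
```

The float branch is a heuristic: a very dark float frame on the 0-255 scale whose maximum is at most 1 would be left unscaled, so printing `obs.dtype`, `obs.min()`, and `obs.max()` once at startup remains the more reliable check.
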