rl-tools / rl-tools

A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
https://rl.tools
MIT License

Load compiled model error #4

Closed user-1701 closed 1 month ago

user-1701 commented 2 months ago

Hi, this looks like a stunning library. I am trying to run a project on Windows, but I am receiving a compile error:

Command: cl /LD /LD /std:c++17 /O2 /arch:AVX2 /fp:fast -DTINYRL_USE_PYTHON_ENVIRONMENT -DTINYRL_OBSERVATION_DIM=35 -DTINYRL_ACTION_DIM=8 -DTINYRL_ENABLE_EVALUATION -DTINYRL_MODULE_NAME=tinyrl_sac -IC:/Users/Agent/AppData/Local/Temp/tinyrl/template/tinyrl_sac -DTINYRL_USE_LOOP_CORE_CONFIG -IC:/msvc/tinyrl/pybind11/include -IC:/PROGRA~1/Python/Python38/include -IC:/msvc/tinyrl/tinyrl/src/../external/rl_tools/include C:/msvc/tinyrl/tinyrl/src/../interface/training/training.cpp -o C:/Users/Agent/AppData/Local/Temp/tinyrl/build/tinyrl_sac/module.so

OUTPUT
training.cpp
RL_TOOLS_COMMIT_HASH and RL_TOOLS_COMMIT_HASH_SHORT are not passed by the build system
RLtools: Using Generic Backend
C:\msvc\tinyrl\pybind11\include\pybind11\detail\internals.h(372): warning C4530: C++-Handler verwendet, aber Entladesemantik ist nicht aktiviert. Geben Sie /EHsc an. [English: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc]
Microsoft (R) Incremental Linker Version 14.39.33523.0
Copyright (C) Microsoft Corporation. All rights reserved.

/out:training.dll
/dll
/implib:C:/Users/Agent/AppData/Local/Temp/tinyrl/build/tinyrl_sac/module.lib
/out:C:/Users/Agent/AppData/Local/Temp/tinyrl/build/tinyrl_sac/module.so
training.obj
LINK : fatal error LNK1104: Datei "python38.lib" kann nicht geöffnet werden. [English: cannot open file "python38.lib"]

None


Strangely, the Python include seems to work. I put the lib file in multiple places (appended to the includes, environment path variables, copied into the include/rl_tools folder, ...) but without success. By the way, I had to change the include paths to contain no whitespace, but for the lib file I don't know where to investigate further. If you don't know a solution offhand, I can post more info about my configuration.
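
For context on the LNK1104 above: the `-I` flags only tell the compiler where the headers are, while `python38.lib` is looked up by the linker through `/LIBPATH` or the `LIB` environment variable. A minimal sketch, assuming a standard CPython layout on Windows where the import libraries sit in a `libs` folder under the base installation. `python_link_args` is a hypothetical helper, not part of tinyrl:

```python
import os
import sys


def python_link_args(base_prefix=None):
    """Return extra cl.exe arguments that point the linker at CPython's import libraries.

    Assumption: a standard Windows CPython install, where python3X.lib lives in
    <base_prefix>/libs. `base_prefix` defaults to the running interpreter's base
    installation (not the venv).
    """
    base_prefix = base_prefix or sys.base_prefix
    libs_dir = os.path.join(base_prefix, "libs")
    # Everything after /link on a cl command line is forwarded to the linker.
    return ["/link", "/LIBPATH:" + libs_dir]


if __name__ == "__main__":
    print(python_link_args())
```

Passing the result as separate list elements to `subprocess` also keeps a `libs` path with whitespace in one piece.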

user-1701 commented 2 months ago

To add: I am using the VS2022 cl compiler. To get the compiler registered properly, I used vcvarsall.bat and appended pycharm.exe there, since adding cl to the path variables manually was strongly discouraged. I then changed all includes in compile.py to contain no whitespace (reinstalled Visual Studio, used short path names to avoid C:/Program Files, and installed tinyrl and pybind11 outside of the venv, whose path contained a whitespace). This works now, except that python38.lib is not found. It IS still in a path with a whitespace, but as mentioned, I could not get it recognized by adding a path variable or by copying it directly.

jonas-eschmann commented 2 months ago

Hi @user-1701!

The python interface is currently only tested under linux and macOS. RLtools itself (C++) is tested under Windows as well: https://github.com/rl-tools/rl-tools/blob/master/examples/windows/README.MD

If you run into an error, you can also check the github workflow: https://github.com/rl-tools/rl-tools/actions/runs/8755604175/job/24030113589 https://github.com/rl-tools/rl-tools/blob/a7e38f26a2524610c4b85746dde7f797a57d1906/.github/workflows/tests-minimal.yml#L90

You could also run it under WSL, but I would recommend running it natively (with Intel MKL).

user-1701 commented 2 months ago

Thanks @jonas-eschmann

I don't know how the workflow could help, as I am running locally and did not make any changes to the base code, only to the imports.

Besides that, I moved the Python installation to another path and it works now! The model 'seems' to compile, but the module cannot be loaded afterwards:

  File "C:\msvc\tinyrl\tinyrl\src\sac.py", line 97, in SAC
    module = load_module(module_name, output_path)
  File "C:\msvc\tinyrl\tinyrl\src\load_module.py", line 17, in load_module
    assert spec is not None
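
One possible cause of `spec` being `None` (an assumption, since `load_module.py` is not shown here): `importlib.util.spec_from_file_location` returns `None` when the file name does not end in a suffix the interpreter has a loader for, and on Windows compiled extension modules are expected to end in `.pyd` rather than `.so`. A small check, with `can_build_spec` as a hypothetical helper:

```python
import importlib.machinery
import importlib.util

# Suffixes this interpreter accepts for compiled extension modules
# (".pyd" variants on Windows, ".so" variants on Linux/macOS).
print(importlib.machinery.EXTENSION_SUFFIXES)


def can_build_spec(path, name="tinyrl_sac"):
    """Return True if importlib finds a loader for `path` based on its suffix."""
    return importlib.util.spec_from_file_location(name, path) is not None


# A suffix no loader is registered for yields no spec; the file need not exist.
print(can_build_spec("module.unknown_suffix"))  # False
# A plain ".py" source file always has a loader.
print(can_build_spec("module.py"))  # True
```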


I think when compiling 'fresh', then this error occurs:

  File "C:\msvc\Python\Python38\lib\subprocess.py", line 1370, in _readerthread
    buffer.append(fh.read())
  File "C:\msvc\Python\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 62: character maps to <undefined>
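
The byte value hints at what is going on. A quick check, assuming the compiler writes its localized (German) messages in an OEM console code page such as cp850, which this thread does not confirm: 0x81, 0x84 and 0x94 are `ü`, `ä` and `ö` there, while cp1252 has no character at 0x81.

```python
raw = bytes([0x81, 0x84, 0x94])

# In the OEM code page cp850 these bytes are German umlauts.
umlauts = raw.decode("cp850")
print(umlauts)  # üäö

# cp1252 leaves 0x81 undefined, which is the 'charmap' error from the traceback.
try:
    raw.decode("cp1252")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)  # character maps to <undefined>
```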

Update:

import sys, locale, os
print(sys.stdout.encoding)              # utf-8
print(sys.stdout.isatty())              # False
print(locale.getpreferredencoding())    # cp1252
print(sys.getfilesystemencoding())      # utf-8
print(os.environ["PYTHONIOENCODING"])   # UTF-8

My current encodings..

Update 2: This argument 'might' be responsible: (screenshot)

(screenshot) But maybe this argument is wrong?

jonas-eschmann commented 2 months ago

Hi, I modified the just-in-time compilation pipeline to also work under Windows (but without MKL acceleration for now). You should be able to run it on Windows by following the instructions I added here. Make sure to use version 0.4.0. Let me know how it goes!

user-1701 commented 2 months ago

Hi @jonas-eschmann

(I just realized that I should have posted in the tinyrl repo, sorry for the mistake)

Just a note: whitespaces are still not supported, as shown here (screenshot)

So unfortunately things got worse. Previously it at least reached the 'Compilation finished' flag and only failed to load the module; now I have to switch to the other compiler. (screenshot)

So I installed mingw2, vscode and all the toolchains instead, and I get another reference error: (screenshot)

jonas-eschmann commented 2 months ago

No problem at all, this repo is fine!

Oh sorry, I didn't encounter whitespaces when porting it to Windows. I added quotes to all arguments (note that the same issue might arise on Unix-like OSes as well). I tested PPO and SAC with whitespace paths and they seem to work now. TD3 doesn't compile for unrelated reasons that I'm working on right now (0.4.2).
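
Whitespace in paths can also be handled without manual quoting by handing the command to `subprocess` as a list, so each argument stays intact. A minimal sketch that is not taken from tinyrl's compile.py; the include path is made up and the child process only echoes its first argument:

```python
import subprocess
import sys

# A made-up include path containing a space.
include_dir = "C:/Program Files/Python38/include"

# Each list element reaches the child process as exactly one argument,
# so no quoting is needed. The child here just prints its first argument.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", "-I" + include_dir]
result = subprocess.run(child, capture_output=True, text=True, check=True)

echoed = result.stdout.strip()
print(echoed == "-I" + include_dir)  # True
```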

I didn't test it with mingw but I think MSVC is more standard, right?

I'm sorry for all the error messages, but in my defense the Python interface was mostly experimental until recently and hence not yet targeted at Windows. Let me know if it works now.

user-1701 commented 2 months ago

no problem at all! don't worry about tinyrl being experimental, I am just very glad that it exists

I think cl.exe equals MSVC with my default VS2022 install. GCC = mingw/msys2 and I have never encountered these on windows.

I'll let you know how it goes

user-1701 commented 2 months ago

So the paths are fixed now and I am testing both compilers.

The cl.exe error is back to the usual one^^ (screenshot) This looks like a typical encoding annoyance on Windows, since the cp1252 encoding seems to be requested. As far as I know, Python 3 does not have a default encoding anymore, so the encoding is typically specified when calling a .read() operation (however, trying fh.read(encoding="UTF-8") didn't work).

The g++ compiler error does not seem to have changed. Not sure if the problems are related. (screenshot)

Haha, as quoted from here:

universal_newlines. This keyword basically says "just use whatever encoding is default on my system" (so basically UTF-8 on anything reasonably modern except on Windows, where you get "some Cthulhu atrocity from the abyss", i.e. the system's default code page).

Where this is suggested:

response = subprocess.check_output([...], universal_newlines=True)
response = subprocess.check_output([...], text=True)
response = subprocess.check_output([...], encoding='utf-8')
response = subprocess.check_output([...]).decode('utf-8')
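
Of these variants, the ones with an explicit encoding avoid depending on `locale.getpreferredencoding()`. A hedged sketch of a more forgiving capture: the child process below merely simulates a tool that emits the byte 0x81, and cp850 is an assumed code page, not something confirmed for this setup. `errors="replace"` keeps undecodable bytes from raising.

```python
import subprocess
import sys

# Stand-in for a compiler that writes a non-ASCII byte (0x81) to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'gr\\x81n')"]

# Capture raw bytes first; decoding is then an explicit, separate step.
raw = subprocess.run(child, capture_output=True, check=True).stdout

as_cp850 = raw.decode("cp850")                    # 0x81 becomes "ü"
as_utf8 = raw.decode("utf-8", errors="replace")   # 0x81 becomes U+FFFD, no exception

# The same idea in one call: subprocess decodes with the given codec and error handler.
text = subprocess.run(child, capture_output=True, check=True,
                      encoding="utf-8", errors="replace").stdout
print(text == as_utf8)  # True
```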

Hm, could this also be a pybind11 encoding issue between string and unicode? https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html

user-1701 commented 1 month ago

Not sure if this occurs on its own or because of my changes to the code: (screenshot)

I switched the locale preferred system format to utf-8, which resulted in:

  File "C:\msvc\Python\Python38\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 62: invalid start byte

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 2080: invalid start byte

I see that chcp 65001 switches to utf-8, so as a last attempt on my side I will set my Windows default to UTF-8, which is a beta feature, and restart real quick.

No, I am completely on UTF-8 now, but the error is the same.

jonas-eschmann commented 1 month ago

I can only speculate because I can't reproduce these errors, but did you try using the official Python installer (https://www.python.org/downloads/) and the latest version? On Linux all of 3.7-3.12 work, but maybe on Windows there are some quirks.

Here is a video on how it looks on my setup https://youtu.be/I781CVBSMoU maybe that helps. I tried to capture the versions of the os, python and the library.

user-1701 commented 1 month ago

It works smoothly!! The last issues were very much my bad: I am using a gym/gymnasium environment that I loaded as a function and not with make(), so the environment specs weren't available (at least max_episode_steps). I am so happy to start experimenting tomorrow! Thank you so much, also for your effort and for the impressive, versatile and clean work.
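
For anyone hitting the same last hurdle: an environment created through make() carries a `spec` with `max_episode_steps`, whereas one constructed directly may have `spec` set to `None`. A defensive lookup could look like the sketch below. `episode_limit` and the stand-in objects are hypothetical and only mimic the attribute layout; they are not gym or tinyrl code:

```python
from types import SimpleNamespace


def episode_limit(env, default=None):
    """Return env.spec.max_episode_steps if present, else `default`."""
    spec = getattr(env, "spec", None)
    limit = getattr(spec, "max_episode_steps", None)
    return default if limit is None else limit


# Stand-ins mimicking an env built via make() and one instantiated by hand.
made_env = SimpleNamespace(spec=SimpleNamespace(max_episode_steps=200))
raw_env = SimpleNamespace(spec=None)

print(episode_limit(made_env))               # 200
print(episode_limit(raw_env, default=1000))  # 1000
```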