MarWaltz closed this issue 3 years ago.
Which torch version and OS platform are you using?
My torch version is 1.8.1+cu111 and my OS is Windows 10, but I have already tried several different torch versions.
Yes, torch on Windows does not support `rpc_sync`, so any distributed model that uses this function (IMPALA, A3C, etc.) will not work there either.
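A quick way to see whether an installed torch build exposes the RPC pieces mentioned in this thread is to probe `torch.distributed.rpc` directly. This is a minimal sketch: the helper `missing_rpc_features` is hypothetical (not part of torch or machin), and it treats a failed import of the module as "everything missing".

```python
from typing import List


# Hypothetical helper (not part of torch or machin): report which RPC
# attributes referenced in this thread are absent from the installed torch.
def missing_rpc_features(names=("rpc_sync", "functions")) -> List[str]:
    try:
        import torch.distributed.rpc as rpc
    except Exception:
        # torch is not installed, or torch.distributed.rpc cannot be imported
        return list(names)
    # On builds without RPC support the module imports, but these names are absent
    return [name for name in names if not hasattr(rpc, name)]


if __name__ == "__main__":
    missing = missing_rpc_features()
    if missing:
        print("torch.distributed.rpc lacks:", ", ".join(missing))
    else:
        print("torch.distributed.rpc looks usable")
```

If the list is non-empty, algorithms built on these calls (IMPALA, A3C, etc.) cannot run on that install, regardless of which machin version is used.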
So far I don't have a Windows platform to test on, so there might be some import errors. Could you please post the detailed Python error stack?
Of course, see below:
Traceback (most recent call last):
File "c:
Oh, that error is easy to fix. As a temporary fix, make the following changes in https://github.com/iffiX/machin/blob/master/machin/parallel/__init__.py (C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py on your local system): remove the line `from . import distributed`, and remove `"distributed"` from `__all__`.
The wrapper you are using does not depend on the rpc functions. Please let me know if any other import errors persist.
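If you are not sure where your installed copy of machin lives, you can ask Python for the package directory without importing (and thus without triggering) the failing package. This is a sketch under the assumption that machin is installed as a regular package; the helper `locate_package_file` is hypothetical. For a top-level name, `importlib.util.find_spec` locates the package without executing its `__init__.py`.

```python
import importlib.util
import os
from typing import Optional


# Hypothetical helper: return the path of a file inside an installed package,
# or None if the package or the file cannot be found. The package itself is
# not imported, so a broken __init__.py does not get in the way.
def locate_package_file(package: str, *parts: str) -> Optional[str]:
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    package_dir = list(spec.submodule_search_locations)[0]
    path = os.path.join(package_dir, *parts)
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    # e.g. the machin/parallel/__init__.py file that needs editing
    print(locate_package_file("machin", "parallel", "__init__.py"))
```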
I did make these changes, but unfortunately I still run into the following:
Traceback (most recent call last):
File "c:\
Oh, I forgot `"server"`; you need to remove that as well. Sorry for the inconvenience.
No worries. But still:
Traceback (most recent call last):
File "c:\
OK, for these errors you need to change `except ImportError` to `except Exception` in these two files:
https://github.com/iffiX/machin/blob/master/machin/frame/algorithms/__init__.py
https://github.com/iffiX/machin/blob/master/machin/frame/buffers/__init__.py
This is because an `AttributeError` is not caught by `except ImportError`.
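The reason the change matters can be shown in a few lines: `AttributeError` and `ImportError` are sibling subclasses of `Exception`, so a handler for one does not catch the other. In this sketch, `risky_import` is a hypothetical stand-in that raises the same error seen on Windows.

```python
# Hypothetical stand-in for an import that loads fine but then touches a
# missing attribute, as happens with torch.distributed.rpc on Windows builds.
def risky_import():
    raise AttributeError("module 'torch.distributed.rpc' has no attribute 'rpc_sync'")


def guarded(catch):
    """Run risky_import(), returning "fallback" if `catch` matches the error."""
    try:
        risky_import()
        return "imported"
    except catch:
        return "fallback"


if __name__ == "__main__":
    # AttributeError is not a subclass of ImportError, so it escapes here.
    try:
        guarded(ImportError)
    except AttributeError as err:
        print("escaped:", err)

    # Exception is a common base class of both, so the fallback path runs.
    print(guarded(Exception))  # prints "fallback"
```

Catching `Exception` is deliberately broad; a narrower alternative with the same effect here would be `except (ImportError, AttributeError)`.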
Okay, thanks. I will look into it and get back to you tomorrow.
No problem. I will fix these problems in my code now and try to find a Windows testing environment.
Hello again, see below:
Traceback (most recent call last):
File "c:..\Desktop\Forschung\RL\Implementations\PyTorch Templates\machin\CartPole-DQN.py", line 1, in
OK, now move `from .a3c import A3C` into that try/except block:
https://github.com/iffiX/machin/blob/baa093d85cfc578815e0adc85084f14abdbbd87d/machin/frame/algorithms/__init__.py#L23
like this:
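```python
import warnings

# These algorithms rely on torch.distributed.rpc, which Windows builds of
# torch do not provide; fall back to None instead of failing the import.
try:
    from .a3c import A3C
    from .apex import DQNApex, DDPGApex
    from .impala import IMPALA
    from .ars import ARS
except Exception as _:
    warnings.warn(
        "Failed to import algorithms relying on torch.distributed." " Set them to None."
    )
    A3C = None
    DQNApex = None
    DDPGApex = None
    IMPALA = None
    ARS = None
```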
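The same fallback can also be written per name, so that one failing algorithm does not disable all the others. This is a sketch, not machin's code: the helper `optional_import` is hypothetical, and the module path in the usage example is inferred from `from .a3c import A3C` in the snippet above.

```python
import importlib
import warnings
from typing import Any, Optional


# Hypothetical helper (not in machin): import `attr` from `module`, returning
# None with a warning if anything goes wrong. Catching Exception covers both
# ImportError and AttributeError, like the try/except block above.
def optional_import(module: str, attr: str) -> Optional[Any]:
    try:
        return getattr(importlib.import_module(module), attr)
    except Exception as err:
        warnings.warn(f"Failed to import {attr} from {module}: {err}. Set it to None.")
        return None


if __name__ == "__main__":
    A3C = optional_import("machin.frame.algorithms.a3c", "A3C")
    if A3C is None:
        print("A3C is unavailable on this platform")
```

Callers then check for `None` before using an algorithm, which is the same contract the grouped try/except block gives, just with finer granularity.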
Great job, this example works fine now! I will close this issue and open a new one if any further problems should occur.
Thanks again.
OK, in the meantime I will add a quick fix for this once I get CircleCI working. :)
After searching for a while, I could not find a platform that runs my automated tests in a reasonable time, and since maintaining a hybrid Jenkins + Windows VM setup is too difficult, I will not consider Windows CI in the near future.
Instead, I will do a one-time manual test on Windows for future versions on request.
Can you help me with the error below?
@rpc.functions.async_execution
AttributeError: module 'torch.distributed.rpc' has no attribute 'functions'
Hello, whenever I try to run a tutorial script, e.g. the your_first_program example, I encounter this AttributeError during the imports:
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
However, I meet the listed requirements. Is there anything I am missing, or how can I solve this?