salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

ERROR:root:module 'warp_drive.numba_includes.env_runner' has no attribute 'NumbaCustomEnvStep' #71

Closed Finebouche closed 1 year ago

Finebouche commented 1 year ago

Hi, thank you for your help setting this up.

Following the same tutorial and these examples, I have set up the CPU env in custom_env.py, which contains the CustomEnv class, and the Numba env in custom_env_step_numba.py, which contains the NumbaCustomEnvStep function.

The environment I register is the first one, custom_env.py, this way:

env_registrar.add_cuda_env_src_path(CustomEnv.name, "custom_env", env_backend="numba")

So I have been able to load my CPU environment without any problem.

env_wrapper = EnvWrapper(
    env_obj=CustomEnv(**run_config["env"]),
    env_name=CustomEnv.name,
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="cpu",
    env_registrar=env_registrar
)

However, when I try to set env_backend="numba", I get the following error: ERROR:root:module 'warp_drive.numba_includes.env_runner' has no attribute 'NumbaCustomEnvStep'

I'm not sure where this error is coming from. WarpDrive should have found the NumbaCustomEnvStep function in custom_env_step_numba.py, but it obviously did not.

What am I missing again?

Finebouche commented 1 year ago

I should clarify that this error message appears when self.env.initialize_step_function_context is called in env_wrapper.py.

Emerald01 commented 1 year ago

You can simply test from <<YOUR_ENV_NUMBA_LIB>> import * and see whether your NumbaCustomEnvStep can be found in an importable Python module in your running environment. This <<YOUR_ENV_NUMBA_LIB>> is what you should use here: env_registrar.add_cuda_env_src_path(CustomEnvironment.name, "<<YOUR_ENV_NUMBA_LIB>>", env_backend="numba")

There is no magic here: we just quietly integrate your Numba module into the entire ecosystem, and essentially, if your from <<YOUR_ENV_NUMBA_LIB>> import * works, then the loading will work. If not, please report it to me. Other users have loaded multiple custom CUDA environments, but not Numba ones, so I can imagine there being some hiccup.
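The same check can be scripted with the standard library. This is a minimal sketch: `has_step_function` is a hypothetical helper name, not part of WarpDrive, and it only tells you whether the module imports and exposes the name.

```python
import importlib


def has_step_function(module_name, function_name):
    """Return True if `module_name` imports and exposes `function_name`.

    Hypothetical helper for illustration; not part of WarpDrive.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not importable from the current sys.path,
        # or one of its own imports is missing.
        return False
    return hasattr(module, function_name)
```

With the names from this thread, the call would be `has_step_function("custom_env_step_numba", "NumbaCustomEnvStep")`. Note that `import *` additionally skips names starting with an underscore and respects `__all__` if the module defines it, which `hasattr` does not check.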

Finebouche commented 1 year ago

So I guess I wasn't using the proper <<YOUR_ENV_NUMBA_LIB>>. I was using the custom_env file (the CPU version, with the class that calls the Numba step version) instead of custom_env_step_numba (the Numba file with only functions). This wasn't too clear from the tutorials, I guess, especially because the tutorial says that you need to register the environment. One question remains, though: how does WarpDrive know about that CPU file 'custom_env' and the CustomEnv class? Is this done secretly as well?

Anyway, after changing that, I tried what you said:

from custom_env_step_numba import * 
NumbaCustomEnvStep

This loads custom_env_step_numba and correctly outputs

CUDADispatcher(<function NumbaCustomEnvStep at 0x7fe0a451ee50>)

Edit: It seems to work now. I will do the EnvironmentCPUvsGPU test now.

Finebouche commented 1 year ago

As a remark, the tutorial should refer more carefully to Your_Env_Class and Your_Dual_Mode_Env_Class in the section on how to register the custom environment.

Edit: second remark, one thing that wasn't obvious to debug is that my code in custom_env_step_numba was buggy, and this was preventing the NumbaCustomEnvStep function from loading. I wouldn't have realized that without your advice to test:

from custom_env_step_numba import * 
NumbaCustomEnvStep

Maybe there is a way to indicate what went wrong when loading the file?
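Until something like that exists, one workaround is to import the step module yourself and print the full traceback before handing its name to the registrar. A minimal sketch, assuming the failure happens at import time; `import_report` is a hypothetical helper, not a WarpDrive function.

```python
import importlib
import traceback


def import_report(module_name):
    """Try to import `module_name` and return (module, error_text).

    On success error_text is None; on failure module is None and
    error_text holds the formatted traceback of the original exception.
    Hypothetical helper for illustration; not part of WarpDrive.
    """
    try:
        return importlib.import_module(module_name), None
    except Exception:
        return None, traceback.format_exc()
```

Calling `import_report("custom_env_step_numba")` and printing the second element shows the underlying exception instead of the later "has no attribute" message. It only surfaces errors raised while the file is imported; errors that Numba raises when a kernel is first compiled or launched may still appear later.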

Emerald01 commented 1 year ago

The CPU model class is available when you call import custom_env. It is just a Python class; this is not magic, just a Python import as you would do for any Python code.

The special part is that you have a Numba step(), and this step() is actually outside of your Python class, so WarpDrive needs to know where the source code of the Numba step() is and then integrate the compiled Numba step() into your Python class. This is done by the EnvWrapper, which is why you see the error from there: you did not provide the path of the Numba step().
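As a rough illustration of that lookup, here is a toy sketch. It is not WarpDrive's actual implementation: `ToyRegistrar`, `add_step_module`, and `load_step` are made-up names, and standard-library modules stand in for the environment files.

```python
import importlib


class ToyRegistrar:
    """Maps an environment name to the module holding its step function.

    Toy stand-in for illustration; not WarpDrive's EnvironmentRegistrar.
    """

    def __init__(self):
        self._step_modules = {}

    def add_step_module(self, env_name, module_name):
        # Remember which module should be imported for this environment.
        self._step_modules[env_name] = module_name

    def load_step(self, env_name, function_name):
        # Import the registered module and look the function up by name.
        module = importlib.import_module(self._step_modules[env_name])
        return getattr(module, function_name)


registrar = ToyRegistrar()
registrar.add_step_module("demo", "math")    # module that defines the function
step = registrar.load_step("demo", "sqrt")   # found: math.sqrt
```

If the wrong module is registered, for example `"json"` instead of `"math"`, the `getattr` call raises an AttributeError saying the module has no attribute `sqrt`, which is the same kind of message as the one reported in this issue.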

Finebouche commented 1 year ago

It seems there is still something wrong with my implementation, as I get the following stack trace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 trainer.train()

File /project_ghent/warp-drive/warp_drive/training/trainer.py:422, in Trainer.train(self)
    419 start_time = time.time()
    421 # Generate a batched rollout for every CUDA environment.
--> 422 self._generate_rollout_batch()
    424 # Train / update model parameters.
    425 metrics = self._update_model_params(iteration)

File /project_ghent/warp-drive/warp_drive/training/trainer.py:469, in Trainer._generate_rollout_batch(self)
    467 # Step through all the environments
    468 start_event.record()
--> 469 self.cuda_envs.step_all_envs()
    471 # Bookkeeping rewards and done flags
    472 _, done_flags = self._bookkeep_rewards_and_done_flags(batch_index=batch_index)

File /project_ghent/warp-drive/warp_drive/env_wrapper.py:355, in EnvWrapper.step_all_envs(self, actions)
    351 """
    352 Step through all the environments
    353 """
    354 if not self.env_backend == "cpu":
--> 355     self.env.step()
    356     result = None  # Do not return anything
    357 else:

File /project_ghent/collective_MARL/custom_env.py:598, in CustomEnv.step(self, actions)
    595     if self.env_backend == "numba":
    596         print("Try calling numba step function")
    597         self.cuda_step[self.cuda_function_manager.grid, self.cuda_function_manager.block](
--> 598             *self.cuda_step_function_feed(args)
    599         )
    600     result = None  # do not return anything
    602 # CPU version of step()
    603 else:

File /project_ghent/warp-drive/warp_drive/managers/function_manager.py:120, in CUDAFunctionFeed.__call__(self, arguments)
    118 for arg in arguments:
    119     if isinstance(arg, str):
--> 120         data_pointers.append(self.data_manager.device_data(arg))
    121     elif isinstance(arg, tuple):
    122         key = arg[0]

File /project_ghent/warp-drive/warp_drive/managers/data_manager.py:395, in CUDADataManager.device_data(self, name)
    393     assert name in self._host_data
    394     return self._host_data[name]
--> 395 assert name in self._device_data_pointer
    396 return self._device_data_pointer[name]

AssertionError: 

Here self.cuda_step_function_feed(args) is called in my custom_env.py file, but I don't understand what fails after that.

Emerald01 commented 1 year ago

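You have a data array that is not registered in the data manager. One of the names in args is missing, which is why the assertion name in self._device_data_pointer fails.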
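To find which name is missing, you can compare the argument list against what has been registered. The sketch below uses a toy class rather than WarpDrive's CUDADataManager; `ToyDataManager`, `register`, `missing`, and the array names are all hypothetical. Treating the first element of a tuple as the key is an assumption based on the traceback above.

```python
class ToyDataManager:
    """Toy stand-in for a device data manager; not WarpDrive's CUDADataManager."""

    def __init__(self):
        self._device_data = {}

    def register(self, name, array):
        self._device_data[name] = array

    def missing(self, arguments):
        """Return the argument names that were never registered."""
        # Assumption: a tuple argument carries its key as the first element.
        names = [a[0] if isinstance(a, tuple) else a for a in arguments]
        return [n for n in names if n not in self._device_data]


manager = ToyDataManager()
manager.register("loc_x", [0.0, 1.0])
manager.register("rewards", [0.0, 0.0])

# Hypothetical argument list in the style of the `args` given to the step function feed.
args = ["loc_x", "loc_y", ("rewards", "meta")]
print(manager.missing(args))  # ['loc_y']
```

A check like this, run before the first step, reports the offending name directly instead of a bare AssertionError.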