AlexandreBrown opened 2 weeks ago
> Is this your own custom environment? What environment is this exactly? And are you planning to use the GPU sim + rendering?
Hi @StoneT2000, I am using SimplerEnv and TorchRL.
The code below is a TorchRL env that wraps the SimplerEnv environment so it can be used through TorchRL's unified interface.
TorchRL env wrapper:
```python
import torch
import numpy as np
from tensordict import TensorDict, TensorDictBase
from torchrl.envs import EnvBase
from torchrl.data import Composite, Unbounded, Bounded
from sapien.pysapien.render import RenderTexture2D
import sapien


class SimplerEnvWrapper(EnvBase):
    def __init__(self, base_env, **kwargs):
        super().__init__(**kwargs)
        self._device = torch.device(kwargs.get("device", "cpu"))
        self.base_env = base_env
        # Mapping from numpy dtypes to torch dtypes, used to build the action spec.
        self.numpy_to_torch_dtype_dict = {
            bool: torch.bool,
            np.uint8: torch.uint8,
            np.int8: torch.int8,
            np.int16: torch.int16,
            np.int32: torch.int32,
            np.int64: torch.int64,
            np.float16: torch.float16,
            np.float32: torch.float32,
            np.float64: torch.float64,
        }
        self._make_specs()

    def _make_specs(self):
        # Derive the observation spec from the camera image space of the base env.
        raw_observation_spec = self.get_image_from_maniskill3_obs_dict(
            self.base_env, self.base_env.observation_space.spaces
        )
        height = raw_observation_spec.shape[-3]
        width = raw_observation_spec.shape[-2]
        self.channels = raw_observation_spec.shape[-1]
        shape = (height, width, self.channels)
        observation_spec = {
            "pixels": Bounded(
                low=torch.from_numpy(
                    raw_observation_spec.low[0, :, :, : self.channels]
                ).to(self._device),
                high=torch.from_numpy(
                    raw_observation_spec.high[0, :, :, : self.channels]
                ).to(self._device),
                shape=shape,
                dtype=torch.uint8,
                device=self._device,
            )
        }
        self.observation_spec = Composite(**observation_spec)
        action_space = self.base_env.action_space
        self.action_spec = Bounded(
            low=torch.from_numpy(action_space.low).to(self._device),
            high=torch.from_numpy(action_space.high).to(self._device),
            shape=action_space.shape,
            dtype=self.numpy_to_torch_dtype_dict[action_space.dtype.type],
            device=self._device,
        )
        self.reward_spec = Unbounded(
            shape=(1,), dtype=torch.float32, device=self._device
        )
        self.done_spec = Unbounded(shape=(1,), dtype=torch.bool, device=self._device)

    def get_image_from_maniskill3_obs_dict(self, env, obs, camera_name=None):
        # Pick the camera matching the robot embodiment, then return its RGB data.
        if camera_name is None:
            if "google_robot" in env.unwrapped.robot_uids.uid:
                camera_name = "overhead_camera"
            elif "widowx" in env.unwrapped.robot_uids.uid:
                camera_name = "3rd_view_camera"
            else:
                raise NotImplementedError()
        img = obs["sensor_data"][camera_name]["rgb"]
        return img

    def _reset(self, tensordict: TensorDictBase = None):
        # Attempt to swap the base color texture of every actor before resetting.
        base_color_texture = RenderTexture2D(
            "/home/user/Downloads/cliff_side_4k.blend/textures/cliff_side_diff_4k.jpg"
        )
        for actor_name in self.base_env.unwrapped.scene.actors.keys():
            for part in self.base_env.unwrapped.scene.actors[actor_name]._objs:
                for triangle in (
                    part.find_component_by_type(sapien.render.RenderBodyComponent)
                    .render_shapes[0]
                    .parts
                ):
                    # triangle.material.set_base_color([0.8, 0.1, 0.1, 1.0])
                    triangle.material.set_base_color_texture(base_color_texture)
        obs_dict, _ = self.base_env.reset()
        rgb_obs = (
            self.get_image_from_maniskill3_obs_dict(self.base_env, obs_dict)[
                0, :, :, : self.channels
            ]
            .to(torch.uint8)
            .squeeze(0)
        )
        text_instruction = self.base_env.unwrapped.get_language_instruction()
        done = torch.tensor(False, dtype=torch.bool, device=self._device)
        terminated = torch.tensor(False, dtype=torch.bool, device=self._device)
        return TensorDict(
            {
                "pixels": rgb_obs,
                "text_instruction": text_instruction,
                "done": done,
                "terminated": terminated,
            },
            batch_size=[],
            device=self._device,
        )

    def _step(self, tensordict: TensorDictBase):
        action = tensordict["action"]
        obs_dict, reward, done, _, info = self.base_env.step(action)
        rgb_obs = (
            self.get_image_from_maniskill3_obs_dict(self.base_env, obs_dict)[
                0, :, :, : self.channels
            ]
            .to(torch.uint8)
            .squeeze(0)
        )
        text_instruction = self.base_env.unwrapped.get_language_instruction()
        return TensorDict(
            {
                "pixels": rgb_obs,
                "text_instruction": text_instruction,
                "reward": reward,
                "done": done,
            },
            batch_size=[],
            device=self._device,
        )

    def _set_seed(self, seed: int):
        self.base_env.seed(seed)
```
PS: I am not sure I am doing this right; should I apply the changes before the environment reset?
PS #2: Are there specific file requirements for the texture file? Do you have a test sample I can use as well? Or does any texture from publicly available texture websites work?
where `base_env` is obtained using the ManiSkill3 gym integration:
```python
import gymnasium as gym
from mani_skill.envs.sapien_env import BaseEnv

...

env_name = cfg["env"]["name"]
sensor_configs = dict()
sensor_configs["shader_pack"] = "default"
base_env: BaseEnv = gym.make(
    env_name,
    max_episode_steps=max_episode_steps,
    obs_mode="rgb+segmentation",
    num_envs=1,
    sensor_configs=sensor_configs,
    render_mode="rgb_array",
    sim_backend=cfg["env"]["device"],
)
```
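For completeness, this is roughly how I then plug the wrapper into TorchRL (a minimal, untested sketch using the same `base_env` and `cfg` as above):

```python
# Minimal usage sketch (assumes base_env and cfg are defined as above).
env = SimplerEnvWrapper(base_env, device=cfg["env"]["device"])

td = env.reset()
print(td["pixels"].shape, td["text_instruction"])

# rand_step samples a random action from the action spec and steps the env.
td = env.rand_step(td)
```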
I am testing the following existing environments from ManiSkill3 (using SimplerEnv):

- PutCarrotOnPlateInScene-v1
- PutSpoonOnTableClothInScene-v1
- StackGreenCubeOnYellowCubeBakedTexInScene-v1
(Since currently only these are supported by the SimplerEnv/ManiSkill3 integration.) My goal is to leverage the flexibility of ManiSkill3/SimplerEnv and be able to apply visual randomizations such as random textures, camera FOV, lighting, added objects, and video overlays. The more of this I can achieve, the better.
Note that I am not familiar with ManiSkill3, so I have not tried to create anything custom yet.
Ideally I would like to apply these randomizations at the start of each episode.
I assume a video overlay would require a per-step update (if we treat a video as a sequence of frames, where at each step we update the overlaid frame).
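Something like this is what I have in mind for the per-step overlay (a rough sketch; I am assuming the segmentation map is exposed at `obs["sensor_data"][camera_name]["segmentation"]` with `obs_mode="rgb+segmentation"`, and that background pixels map to ID 0):

```python
import torch


def overlay_video_frame(rgb, seg, video_frame, background_ids=(0,)):
    """Replace background pixels of the rendered RGB frame with a video frame.

    rgb:         (H, W, 3) uint8 rendered observation
    seg:         (H, W, 1) integer segmentation map (assumed layout)
    video_frame: (H, W, 3) uint8 frame from the overlay video
    """
    background_mask = torch.zeros_like(seg, dtype=torch.bool)
    for bg_id in background_ids:
        background_mask |= seg == bg_id
    # Broadcast the (H, W, 1) mask over the 3 color channels.
    return torch.where(background_mask, video_frame, rgb)
```

In `_step` I would then call this on the current frame of the overlay video before building the TensorDict.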
I understand that GPU vectorization probably makes these use cases much harder, in which case I would prefer to go for the low-hanging fruit first (e.g. randomizations that are applied only at the start of the episode, if that's easier).
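Concretely, for the start-of-episode case, this is the kind of helper I would call from `_reset` before `self.base_env.reset()` (a sketch based on the material API used in the wrapper above; `texture_paths` is a hypothetical list of image files I would download beforehand):

```python
import random

import sapien
from sapien.pysapien.render import RenderTexture2D


def randomize_actor_appearance(base_env, texture_paths):
    # Walk every render shape part of every actor, as in the wrapper above,
    # and either swap its base color texture or assign a random base color.
    for actor in base_env.unwrapped.scene.actors.values():
        for part in actor._objs:
            body = part.find_component_by_type(sapien.render.RenderBodyComponent)
            for shape_part in body.render_shapes[0].parts:
                if texture_paths and random.random() < 0.5:
                    texture = RenderTexture2D(random.choice(texture_paths))
                    shape_part.material.set_base_color_texture(texture)
                else:
                    shape_part.material.set_base_color(
                        [random.random(), random.random(), random.random(), 1.0]
                    )
```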
Yes, I plan on using the GPU to improve simulation performance (FPS). I assume that `sim_backend='cuda'` is what is needed for this, but please feel free to tell me more about it. GPU vectorization is a strong motivation for me to use ManiSkill3 with SimplerEnv (via their maniskill3 branch) instead of the existing ManiSkill2/SimplerEnv.
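For example, I assume the switch would look roughly like this (the `num_envs` value is just a placeholder on my side):

```python
# Same gym.make call as above, but with the GPU simulation backend and
# multiple parallel environments (assumed to be the right knobs for this).
base_env: BaseEnv = gym.make(
    env_name,
    max_episode_steps=max_episode_steps,
    obs_mode="rgb+segmentation",
    num_envs=16,
    sensor_configs=sensor_configs,
    render_mode="rgb_array",
    sim_backend="cuda",
)
```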
Thanks for the extensive notes; all of what you suggest is possible, but it depends a little bit on what models you actually want to evaluate.
There are two ways forward. The easiest option is to build a new table-top environment (take one of the templates or e.g. the pick cube environment) and add the parallelizations / randomizations you want for a custom environment. Only choose this option if you don't need to verify real2sim alignment and simply want a controllable robot and objects.
Alternatively, you can copy the code for the bridge dataset digital twins and modify the attributes there to change the default RGB overlays, swap the overlay at each timestep when using video, modify the scene loader to add distractor objects, etc.
Let me know which option you think is needed and I can suggest the relevant docs/code to do what you want.
Thanks a lot @StoneT2000 for the amazing reply!
> do you plan to train a model and evaluate it? Or evaluate off the shelf models?
I plan on training and evaluating models (training from scratch).
> How realistic do you want the environment to look? Are you planning to try vision based sim2real or just do real2sim evaluation of a model trained on real world data?

> Are you planning to try vision based sim2real
Yes.
I want to train in simulation using an environment that is as visually realistic as possible. If that hinders training time, I'm open to a hybrid approach: keep the training environments realistic but slightly simplified (e.g. without ray tracing) to boost collection speed, and make the visual generalization benchmark more realistic and slower.
Basically, I will need to train agents from scratch in simulation, and once trained, I will evaluate the approach using aggressive visual domain randomization (aggressive sim2real visual changes like random camera FOV, random object colors, random textures, random lighting, and random objects if feasible). The model will only depend on image observations (RGB pixels) and will be trained in an online RL fashion.
I am focused on an approach that shows generalization over visual distractions, so the more visual distractions I can showcase, the better.
> The easiest option actually is to build a new table-top environment (take one of the templates or e.g. the pick cube environment) and add the parallelizations / randomizations you want for a custom environment.
This sounds interesting, as I want not just one environment but at least 2-3 that show increasing levels of difficulty (e.g. easy to hard).
Is it easier to create an environment from scratch or to start from an existing one? Context: I have very little experience in environment design. Where can I find a template and documentation for this?
When you say "add the parallelizations", what do you mean exactly?
@StoneT2000 After looking at the docs for ManiSkill3, I'm tempted to use ManiSkill3 directly instead of SimplerEnv. Would it be feasible to use ManiSkill3 directly while still being able to add the visual distractions?
Any help is appreciated!
Hi,
I would like to change the textures (randomly or via a PNG file) of the various objects in the scene (e.g. before every new episode).
I managed to change the `base_color`, but when I change the textures, nothing happens. Any pointers are appreciated.
The objective is to change textures, camera FOV, and lighting, and if possible add new objects, in order to evaluate methods for visual generalization.
PS: I do not know much about textures; I downloaded a sample file from https://polyhaven.com/a/cliff_side