Thanks for asking. Could you be more specific about what kind of custom environment you want to use? As long as the environment follows the standard BulletSafetyGym or SafetyGymnasium API, it can be integrated with our package.
More specifically, we recommend that your environment follow the [gymnasium](https://gymnasium.farama.org/index.html) API (or the `gym>0.26` API), whose typical usage looks like this:
```python
import gymnasium as gym

env = gym.make("Your env")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
```
Then you should include the constraint-violation cost signal inside the `info` dictionary, i.e., each returned `info` dict should contain an `info["cost"]` entry.
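For example, here is a minimal sketch of a custom environment that reports this cost signal. The observation/action spaces, the dynamics, and the threshold-based cost rule below are placeholders for illustration only, not part of our package:

```python
import gymnasium as gym
import numpy as np


class MyCostEnv(gym.Env):
    """Toy environment whose info dict carries the safety cost signal."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.observation_space.sample()
        return obs, {"cost": 0.0}

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder dynamics
        reward = float(np.sum(action))         # placeholder reward
        # Constraint-violation signal that the safe-RL algorithms read from info.
        cost = 1.0 if abs(obs[0]) > 0.8 else 0.0
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {"cost": cost}
```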
I sincerely appreciate your prompt and patient response. I am a beginner in safe-RL, and I have set up an environment for unmanned drone wireless communication. My goal is to quickly adapt this environment to the API of this project and be able to run your excellent reproductions of safe-RL algorithms on this environment.
Here is a brief description of the environment I have set up:
`environment.py` builds a simulated environment for UAV communication with the following elements:

The main state space includes:
- Location coordinates of two UAVs: one operator UAV (o_uav) and one receiver UAV (r_uav)
- Distance between o_uav and r_uav
- Distance between r_uav and the base station
Actions include:
- Transmit power of the UAVs
- Computing resource allocation
- Flying direction
These actions affect the environment states.
The environment model computes:
- New location coordinates of the UAVs based on the current state and action
- Communication channel capacity
- Video quality
- Energy consumption
The reward is defined as video quality / energy consumption, which considers both efficiency and quality.
The energy consumption model calculates UAV energy usage based on flying speed.
In summary, the environment:
- Initializes UAV locations, communication range, etc.
- Defines the action space (e.g., transmit power, resource allocation)
- Uses a motion model to calculate new positions
- Uses a communication model to compute channel capacity and energy consumption
- Provides immediate rewards through the reward mechanism
- Includes helper functions for reset, data storage, etc.
Both the action space and state space are discrete.
Here are the main functions I have already implemented in `environment.py`:
- `__init__()`: The constructor initializes UAV positions, speeds, and other parameters.
- `get_action()`: Generates all possible action combinations to construct the action space.
- `reset()`: Resets the environment and returns the initial state.
- `step()`: Executes an action and returns the next state, reward, and energy consumption.
These functions define the state/action spaces, transition dynamics, rewards, etc.
How should I modify them to quickly get safe-RL running on this repository's framework?
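For reference, here is a rough sketch of how I imagine wrapping these functions in a Gymnasium-style interface; the class name `UAVCommEnv`, the observation-space shape, and the energy-budget cost rule are only placeholders and assumptions on my side:

```python
import gymnasium as gym
import numpy as np

from environment import Environment  # placeholder for my actual environment class

ENERGY_BUDGET = 10.0  # placeholder: energy threshold treated as a constraint violation


class UAVCommEnv(gym.Env):
    """Gymnasium-style wrapper around my environment.py (sketch only)."""

    def __init__(self):
        self._env = Environment()
        self._actions = self._env.get_action()  # assumed to return a list of all action combinations
        self.action_space = gym.spaces.Discrete(len(self._actions))
        # Placeholder: replace with the real shape/bounds of my state vector.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        state = self._env.reset()
        return np.asarray(state, dtype=np.float32), {"cost": 0.0}

    def step(self, action_idx):
        # My current step() returns (next_state, reward, energy); map it to the 5-tuple API.
        next_state, reward, energy = self._env.step(self._actions[action_idx])
        cost = 1.0 if energy > ENERGY_BUDGET else 0.0  # example safety constraint
        terminated, truncated = False, False
        return np.asarray(next_state, dtype=np.float32), float(reward), terminated, truncated, {"cost": cost}
```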
Thank you once again for your reply, and I look forward to your response.
Hi, does FSRL support discrete action spaces?
Unfortunately, no. There are not many safety-related gym environments, so we did not consider discrete action spaces in this repo. Note that some algorithms, such as DDPG and TD3, naturally don't support discrete action spaces. Other methods, such as SAC, CVPO, PPO, and TRPO, support discrete action spaces in theory, but we haven't implemented that in this repo. If you are interested, you may take a look at tianshou's implementations of SAC, PPO, and TRPO and make the corresponding modifications to adapt them to discrete action spaces.
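To illustrate the kind of modification involved (this is a generic PyTorch sketch, not code from this repo or from tianshou): a continuous-control actor outputs the mean and standard deviation of a Gaussian, whereas a discrete-action actor outputs a categorical distribution over action indices, which PPO/TRPO/SAC-style losses then consume:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class DiscreteActor(nn.Module):
    """Categorical policy head for discrete actions (replaces the Gaussian
    mean/std head used in continuous control)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        # Logits over the discrete action set define the policy distribution.
        return Categorical(logits=self.net(obs))


# Example: sample an action index and its log-probability for a policy-gradient update.
actor = DiscreteActor(obs_dim=5, num_actions=9)
dist = actor(torch.randn(1, 5))
action = dist.sample()            # integer action index
log_prob = dist.log_prob(action)  # used by PPO/TRPO/SAC-style objectives
```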
Thank you once again for your patient response.
Required prerequisites
Questions
How can I use a custom environment?