liuzuxin / FSRL

🚀 A fast safe reinforcement learning library in PyTorch
https://fsrl.readthedocs.io
MIT License

About custom environments #2

Closed Royalvice closed 1 year ago

Royalvice commented 1 year ago

Required prerequisites

Questions

How can I use a custom environment?

liuzuxin commented 1 year ago

Thanks for asking. Could you be more specific about what kind of customized environment you want to use? As long as the environment follows the standard BulletSafetyGym or SafetyGymnasium API, it can be integrated with our package.

More specifically, we recommend that your environment follow the [gymnasium](https://gymnasium.farama.org/index.html) API or the gym >= 0.26 API, whose typical usage looks like this:

import gymnasium as gym

env = gym.make("Your env")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

Then you should include the constraint-violation cost signal in the info dictionary, i.e., each returned info dict should contain an info["cost"] entry.
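For example, if your environment does not produce a cost yet, a thin gymnasium wrapper along the following lines can attach one. This is only a minimal sketch: the wrapper name, the cost_fn argument, and the example constraint are placeholders you would replace with your own constraint logic.

import gymnasium as gym


class AddCostWrapper(gym.Wrapper):
    """Hypothetical wrapper that attaches a constraint-violation cost to info."""

    def __init__(self, env, cost_fn):
        super().__init__(env)
        self.cost_fn = cost_fn  # user-supplied function: (observation, action) -> float

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        info["cost"] = float(self.cost_fn(observation, action))
        return observation, reward, terminated, truncated, info


# Usage sketch: cost of 1.0 whenever the first observation dimension exceeds a limit.
env = AddCostWrapper(gym.make("Pendulum-v1"), cost_fn=lambda obs, act: float(obs[0] > 0.9))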

Royalvice commented 1 year ago


I sincerely appreciate your prompt and patient response. I am a beginner in safe RL, and I have built an environment for UAV wireless communication. My goal is to quickly adapt this environment to this project's API so that I can run your excellent implementations of safe-RL algorithms on it.

Here is a brief description of the environment I have set up:

environment.py builds a simulated environment for UAV communication with the following elements:

State Space

The main state space includes:

Action Space

Actions include:

These actions affect the environment states.

Environment Model

The environment model computes:

Reward Function

The reward is defined as video quality / energy consumption, which considers both efficiency and quality.

Helper Functions

Physics Model

The energy consumption model calculates UAV energy usage based on flying speed.

In summary, the environment:

Both the action space and state space are discrete.

Below are the main functions I have already implemented in environment.py:

These functions define the state/action spaces, transition dynamics, rewards, etc.
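To make the description more concrete, here is a heavily simplified, hypothetical skeleton of environment.py. All class names, space sizes, and formulas below are placeholders for illustration, not my real implementation:

import gymnasium as gym
from gymnasium import spaces


class UAVCommEnv(gym.Env):
    """Simplified placeholder for the UAV wireless-communication environment."""

    def __init__(self):
        # Discrete state and action spaces (sizes are placeholders).
        self.observation_space = spaces.MultiDiscrete([10, 10, 5])  # e.g. grid position, battery level
        self.action_space = spaces.Discrete(5)                      # e.g. hover or move in 4 directions

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.observation_space.sample()
        return self._state, {}

    def step(self, action):
        # Placeholder transition dynamics and physics/energy model.
        self._state = self.observation_space.sample()
        video_quality = float(self.np_random.uniform(0.5, 1.0))
        energy = float(self.np_random.uniform(1.0, 2.0))
        reward = video_quality / energy  # reward = video quality / energy consumption
        return self._state, reward, False, False, {}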

How should I modify them to quickly get safe-RL running on this repository's framework?

Thank you once again for your reply, and I look forward to your response.

Royalvice commented 1 year ago

Hi, does FSRL support discrete action spaces?

liuzuxin commented 1 year ago

Unfortunately, no. There are not many safety-related gym environments, so we didn't consider discrete action spaces in this repo. Note that some algorithms, such as DDPG and TD3, inherently don't support discrete action spaces. Other methods, such as SAC, CVPO, PPO, and TRPO, support discrete action spaces in theory, but we haven't implemented that in this repo. If you are interested, you may take a look at tianshou's implementations of SAC, PPO, and TRPO and make the corresponding modifications to adapt them to a discrete action space.
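To give a rough idea of the kind of modification involved (an illustrative sketch only, not FSRL's or tianshou's actual API): a continuous-action actor outputs the parameters of a Gaussian distribution over actions, while a discrete-action actor instead outputs logits for a categorical distribution, e.g.:

import torch
from torch import nn
from torch.distributions import Categorical


class DiscreteActor(nn.Module):
    """Illustrative categorical policy head for a Discrete(n_actions) space."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # logits instead of a Gaussian mean/std
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))


# Sampling and log-probabilities work the same way as in the continuous case,
# so most of the surrounding actor-critic update logic can stay unchanged.
dist = DiscreteActor(obs_dim=8, n_actions=4)(torch.zeros(1, 8))
action = dist.sample()
log_prob = dist.log_prob(action)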

Royalvice commented 1 year ago

Thank you once again for your patient response.