liuzuxin / FSRL

🚀 A fast safe reinforcement learning library in PyTorch
https://fsrl.readthedocs.io
MIT License

About custom environments #2

Closed Royalvice closed 1 year ago

Royalvice commented 1 year ago

Required prerequisites

Questions

How can I use a custom environment?

liuzuxin commented 1 year ago

Thanks for asking. Could you be more specific about what kind of customized environment you want to use? As long as the environment follows the standard BulletSafetyGym or SafetyGymnasium API, it can be integrated with our package.

More specifically, we recommend that your environment follow the [gymnasium](https://gymnasium.farama.org/index.html) API or the gym >= 0.26 API, whose typical usage looks like this:

import gymnasium as gym

env = gym.make("Your env")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

Then you should include the constraint-violation cost signal in the info dictionary, i.e., each returned info dict should contain an info["cost"] entry.
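For example, if your environment does not produce a cost yet, a thin gymnasium wrapper along the following lines can attach one. This is only a minimal sketch: the wrapper name, the cost_fn argument, and the example constraint are placeholders you would replace with your own constraint logic.

import gymnasium as gym


class AddCostWrapper(gym.Wrapper):
    """Hypothetical wrapper that attaches a constraint-violation cost to info."""

    def __init__(self, env, cost_fn):
        super().__init__(env)
        self.cost_fn = cost_fn  # user-supplied function: (observation, action) -> float

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        info["cost"] = float(self.cost_fn(observation, action))
        return observation, reward, terminated, truncated, info


# Usage sketch: cost of 1.0 whenever the first observation dimension exceeds a limit.
env = AddCostWrapper(gym.make("Pendulum-v1"), cost_fn=lambda obs, act: float(obs[0] > 0.9))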

Royalvice commented 1 year ago


I sincerely appreciate your prompt and patient response. I am a beginner in safe RL, and I have built an environment for UAV wireless communication. My goal is to quickly adapt this environment to this project's API so that I can run your excellent implementations of safe-RL algorithms on it.

Here is a brief description of the environment I have set up:

environment.py builds a simulated environment for UAV communication with the following elements:

State Space

The main state space includes:

Action Space

Actions include:

These actions affect the environment states.

Environment Model

The environment model computes:

Reward Function

The reward is defined as video quality / energy consumption, which considers both efficiency and quality.

Helper Functions

Physics Model

The energy consumption model calculates UAV energy usage based on flying speed.

In summary, the environment:

Both the action space and state space are discrete.

Below are the main functions I have already implemented in environment.py:

These functions define the state/action spaces, transition dynamics, rewards, etc.
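To make the description more concrete, here is a heavily simplified, hypothetical skeleton of environment.py. All class names, space sizes, and formulas below are placeholders for illustration, not my real implementation:

import gymnasium as gym
from gymnasium import spaces


class UAVCommEnv(gym.Env):
    """Simplified placeholder for the UAV wireless-communication environment."""

    def __init__(self):
        # Discrete state and action spaces (sizes are placeholders).
        self.observation_space = spaces.MultiDiscrete([10, 10, 5])  # e.g. grid position, battery level
        self.action_space = spaces.Discrete(5)                      # e.g. hover or move in 4 directions

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.observation_space.sample()
        return self._state, {}

    def step(self, action):
        # Placeholder transition dynamics and physics/energy model.
        self._state = self.observation_space.sample()
        video_quality = float(self.np_random.uniform(0.5, 1.0))
        energy = float(self.np_random.uniform(1.0, 2.0))
        reward = video_quality / energy  # reward = video quality / energy consumption
        return self._state, reward, False, False, {}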

How should I modify them to quickly get safe-RL running on this repository's framework?

Thank you once again for your reply, and I look forward to your response.

Royalvice commented 1 year ago

Hi, does FSRL support discrete action spaces?

liuzuxin commented 1 year ago

Unfortunately, no. There are not many safety-related gym environments, so we didn't consider discrete action spaces in this repo. Note that some algorithms, such as DDPG and TD3, inherently don't support discrete action spaces. Other methods, such as SAC, CVPO, PPO, and TRPO, support discrete action spaces in theory, but we haven't implemented that in this repo. If you are interested, you may take a look at tianshou's implementations of SAC, PPO, and TRPO and make the corresponding modifications to adapt them to a discrete action space.
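To give a rough idea of the kind of modification involved (an illustrative sketch only, not FSRL's or tianshou's actual API): a continuous-action actor outputs the parameters of a Gaussian distribution over actions, while a discrete-action actor instead outputs logits for a categorical distribution, e.g.:

import torch
from torch import nn
from torch.distributions import Categorical


class DiscreteActor(nn.Module):
    """Illustrative categorical policy head for a Discrete(n_actions) space."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # logits instead of a Gaussian mean/std
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))


# Sampling and log-probabilities work the same way as in the continuous case,
# so most of the surrounding actor-critic update logic can stay unchanged.
dist = DiscreteActor(obs_dim=8, n_actions=4)(torch.zeros(1, 8))
action = dist.sample()
log_prob = dist.log_prob(action)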

Royalvice commented 1 year ago

Thank you once again for your patient response.