hill-a stable-baselines issues

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License

4.16k stars, 725 forks

issues


Get probability distribution over actions for discrete action space!

#1047 hiraphor closed 3 years ago
4
Episode rewards not updated before being used by callback.on_step()

#1046 calerc opened 3 years ago
3
Support for input space of Dict format

#1045 qwedaq closed 3 years ago
1
[question] Issue with multiple instances for DDPG-MPI from stable-baselines[mpi]

#1044 UtkarshMishra04 opened 3 years ago
5
[feature request] add recurrent (sequential) feature extractor

#1043 zhaopansong closed 3 years ago
2
Improved sac version

#1042 xiao-hua-sheng closed 4 years ago
2
ModuleNotFoundError: No module named 'tensorflow'

#1041 nbro closed 4 years ago
3
What would be a good library that implements tabular RL algorithms?

#1040 nbro closed 4 years ago
3
What would be the easiest way to initialise the value or policy networks differently?

#1039 nbro opened 4 years ago
1
What is a DQN policy?

#1038 nbro closed 4 years ago
15
[question] Custom callback for logging action, observation, reward, info and done at each timestep

#1037 PierreExeter closed 4 years ago
2
Possible to run a full episode and collate results? For training on real-time hardware.

#1036 crobarcro opened 4 years ago
3
How to interpret Stable Baselines logs?

#1035 sophiagu closed 4 years ago
6
MultiBinary as Action Space not working

#1034 Eslsamu closed 4 years ago
3
A2C evaluation error (parallel environments n_envs)

#1033 nicola-pesavento closed 4 years ago
3
[question] make_vec_env --> how to include environment kwargs

#1032 langan7 closed 4 years ago
4
Question about layer normalize and VecNormalize

#1031 z6833 closed 3 years ago
4
Question: How to use DQN with a Straight line, Sin Wave price changes? [Questions]

#1030 toksis closed 4 years ago
2
SAC with net_arch = [dict(vf=layers, pi=layers)]

#1029 pstansell closed 4 years ago
2
How to fix an initialization for PPO2

#1028 sophiagu closed 4 years ago
3
How to get integer actions in PPO2 algorithm

#1027 yunTerry closed 4 years ago
8
Add minimal TF2 support

#1026 Miffyli opened 4 years ago
7
[question] Example for Custom Policy for SAC with combined image and box type observation

#1025 ajishbabu closed 4 years ago
3
[question] Trained networks: Linux vs Windows

#1024 langan7 closed 4 years ago
11
[question] retrieve rewards before the episode is done.

#1023 JessicaBorja closed 4 years ago
1
[question] Does the LSTM state get reset to all zeros for each episode, mini-batch, etc.?

#1022 denyHell closed 4 years ago
2
[question] DQN: How to control number of samples we collect from the environment before doing an update?

#1021 JessicaBorja closed 4 years ago
3
[question] Role of rewards in env outside training

#1020 Gistix closed 4 years ago
2
PPO2 TensorBoard log: Tag was already used

#1019 riz0410 closed 4 years ago
6
[question] unstable actions in PPO

#1018 HJ-TANG opened 4 years ago
2
Make EvalCallback work for recurrent policies

#1017 mily20001 closed 4 years ago
1
[Question] minimize memorization in ACKTR?

#1016 jarlva closed 4 years ago
2
Cannot evaluate if trained using more than 1 env [Custom env (Unity)]

#1015 mily20001 closed 4 years ago
5
[question] Using wrappers in EvalCallback

#1014 MijnheerD closed 4 years ago
2
Unable to install `stable-baselines[mpi]` on Mac

#1013 iirekm closed 4 years ago
1
Upgrade to Tensorflow 2

#1012 iirekm opened 4 years ago
6
Decouple Agent and Environment Interactions

#1011 hifazibm closed 4 years ago
10
Custom policy with modified cnn feature extraction [question]

#1010 C-monC closed 4 years ago
2
[Question] GAIL generator batch size

#1009 prabhasak opened 4 years ago
5
Fixed step used to log SAC summary

#1008 krishpop opened 4 years ago
0
Tensorboard Logging Issue after multiple consecutive calls to self.learn(..., reset_num_timesteps=False)

#1007 krishpop closed 3 years ago
1
[Question] SAC and PPO2: log loss and episode info

#1006 prabhasak closed 4 years ago
3
[question] Remove bias from neural network model.

#1005 wilsonsamarques closed 4 years ago
10
[question] Enabling agents to keep bootstrapping in the last step per episode

#1004 guoyangqin closed 2 years ago
10
Retrain from updated environment

#1003 anguyenbus closed 4 years ago
4
How to design an actor-critic network with two non-shared LSTMs that take separate inputs?

#1002 baiydaavi closed 2 years ago
3
ValueError: Tried to convert 'input' to a tensor and failed. Error: None values not supported.[DQN]

#1001 dx2919717227 closed 4 years ago
2
Fix a typo in check_env assertion (issue #999)

#1000 OGordon100 closed 4 years ago
1
Typo in check_env assertion

#999 OGordon100 closed 4 years ago
0
[question] Agent not "getting it" for custom gym

#998 AbdullahGheith closed 4 years ago
1
