hill-a/stable-baselines
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.16k stars, 725 forks
issues
Get probability distribution over actions for discrete action space!
#1047
hiraphor
closed
3 years ago
4
Episode rewards not updated before being used by callback.on_step()
#1046
calerc
opened
3 years ago
3
Support for input space of Dict format
#1045
qwedaq
closed
3 years ago
1
[question] Issue with multiple instances for DDPG-MPI from stable-baselines[mpi]
#1044
UtkarshMishra04
opened
3 years ago
5
[feature request] add recurrent (sequential) feature extractor
#1043
zhaopansong
closed
3 years ago
2
Improved sac version
#1042
xiao-hua-sheng
closed
4 years ago
2
ModuleNotFoundError: No module named 'tensorflow'
#1041
nbro
closed
4 years ago
3
What would be a good library that implements tabular RL algorithms?
#1040
nbro
closed
4 years ago
3
What would be the easiest way to initialise the value or policy networks differently?
#1039
nbro
opened
4 years ago
1
What is a DQN policy?
#1038
nbro
closed
4 years ago
15
[question] Custom callback for logging action, observation, reward, info and done at each timestep
#1037
PierreExeter
closed
4 years ago
2
Possible to run a full episode and collate results? For training on real-time hardware.
#1036
crobarcro
opened
4 years ago
3
How to interpret Stable Baselines logs?
#1035
sophiagu
closed
4 years ago
6
MultiBinary as Action Space not working
#1034
Eslsamu
closed
4 years ago
3
A2C evaluation error (parallel environments n_envs)
#1033
nicola-pesavento
closed
4 years ago
3
[question] make_vec_env --> how to include environment kwargs
#1032
langan7
closed
4 years ago
4
Question about layer normalize and VecNormalize
#1031
z6833
closed
3 years ago
4
Question: How to use DQN with a Straight line, Sin Wave price changes? [Questions]
#1030
toksis
closed
4 years ago
2
SAC with net_arch = [dict(vf=layers, pi=layers)]
#1029
pstansell
closed
4 years ago
2
How to fix an initialization for PPO2
#1028
sophiagu
closed
4 years ago
3
How to get integer actions in PPO2 algorithm
#1027
yunTerry
closed
4 years ago
8
Add minimal TF2 support
#1026
Miffyli
opened
4 years ago
7
[question] Example for Custom Policy for SAC with combined image and box type observation
#1025
ajishbabu
closed
4 years ago
3
[question] Trained networks: Linux vs Windows
#1024
langan7
closed
4 years ago
11
[question] retrieve rewards before the episode is done.
#1023
JessicaBorja
closed
4 years ago
1
[question] Does LSTM state get reset to all zeros for each episode, mini-batch, etc.?
#1022
denyHell
closed
4 years ago
2
[question] DQN: How to control number of samples we collect from the environment before doing an update?
#1021
JessicaBorja
closed
4 years ago
3
[question] Role of rewards in env outside training
#1020
Gistix
closed
4 years ago
2
PPO2 TensorBoard log: Tag was already used
#1019
riz0410
closed
4 years ago
6
[question] unstable actions in PPO
#1018
HJ-TANG
opened
4 years ago
2
Make EvalCallback work for recurrent policies
#1017
mily20001
closed
4 years ago
1
[Question] minimize memorization in ACKTR?
#1016
jarlva
closed
4 years ago
2
Cannot evaluate if trained using more than 1 env [Custom env (Unity)]
#1015
mily20001
closed
4 years ago
5
[question] Using wrappers in EvalCallback
#1014
MijnheerD
closed
4 years ago
2
Unable to install `stable-baselines[mpi]` on Mac
#1013
iirekm
closed
4 years ago
1
Upgrade to Tensorflow 2
#1012
iirekm
opened
4 years ago
6
Decouple Agent and Environment Interactions
#1011
hifazibm
closed
4 years ago
10
Custom policy with modified cnn feature extraction [question]
#1010
C-monC
closed
4 years ago
2
[Question] GAIL generator batch size
#1009
prabhasak
opened
4 years ago
5
Fixed step used to log SAC summary
#1008
krishpop
opened
4 years ago
0
Tensorboard Logging Issue after multiple consecutive calls to self.learn(..., reset_num_timesteps=False)
#1007
krishpop
closed
3 years ago
1
[Question] SAC and PPO2: log loss and episode info
#1006
prabhasak
closed
4 years ago
3
[question] Remove bias from neural network model.
#1005
wilsonsamarques
closed
4 years ago
10
[question] Enabling agents to keep bootstrapping in the last step per episode
#1004
guoyangqin
closed
2 years ago
10
Retrain from updated environment
#1003
anguyenbus
closed
4 years ago
4
How to design an actor-critic network with two non-shared LSTMs that take separate inputs?
#1002
baiydaavi
closed
2 years ago
3
ValueError: Tried to convert 'input' to a tensor and failed. Error: None values not supported.[DQN]
#1001
dx2919717227
closed
4 years ago
2
Fix a typo in check_env assertion (issue #999)
#1000
OGordon100
closed
4 years ago
1
Typo in check_env assertion
#999
OGordon100
closed
4 years ago
0
[question] Agent not "getting it" for custom gym
#998
AbdullahGheith
closed
4 years ago
1