Closed OhJeongwoo closed 3 years ago
Hi @OhJeongwoo,
To answer your questions:
1) How much time does it take to train the PPO and NDP algorithms on the example environment?: It should take around 3M steps to see good results, which should be around 4-6 hours (but this could vary drastically based on your machine).
2) How can I know or visualize the result plot?: The logs are saved in epoch_data.npy; you can use any visualization package you want. I will update the repo with some visualization code shortly.
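A minimal sketch of loading those logs, assuming `epoch_data.npy` holds a 1-D array of per-epoch returns (the actual layout in the repo may differ; the dummy file written here is only to keep the example self-contained):

```python
import numpy as np

# Write a dummy log file so this sketch runs standalone -- in practice
# epoch_data.npy is produced by the training run.
np.save("epoch_data.npy", np.array([1.0, 2.5, 3.0]))

# Load the logged data; allow_pickle covers object arrays, if used.
data = np.load("epoch_data.npy", allow_pickle=True)
print(data.shape)   # number of logged epochs
print(data.max())   # best epoch value

# From here any plotting package works, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(data); plt.xlabel("epoch"); plt.show()
```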
3) I am wondering about dmp_train.py, line 45 to line 72: Actually, args.T is the length of the NDP rollout. Every T steps, the NDP outputs DMP parameters and executes the resulting trajectory for T steps, so we should take the kth action for k = 0, ..., T-1. More details can be found in: https://arxiv.org/pdf/2012.02788.pdf
I have to disagree with your answer regarding 3. While it is true that you generate a new set of weights every T timesteps, I agree with @OhJeongwoo that the step action execution is incorrect. From the resulting trajectory of length T (here `actions`), you only ever use the first action for all T steps, as `action` is never updated before the next DMP weights are sampled.
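To make the disagreement concrete, here is a standalone sketch of the two loop patterns being discussed. The names (`T`, `actions`, `action`) mirror the discussion, but this is an illustration, not the repo's actual rollout code:

```python
T = 4  # hypothetical NDP rollout length (args.T in the discussion)

# Pretend DMP output: one distinct action per sub-step of the rollout.
actions = [f"a{k}" for k in range(T)]

# Pattern being criticized: `action` is assigned once, so the same
# (first) action is executed for all T environment steps.
buggy = []
action = actions[0]
for step in range(T):
    buggy.append(action)  # never re-indexed before new weights arrive

# Suggested fix: index into the rollout at every step, taking the
# k-th action for k = 0, ..., T-1.
fixed = []
for step in range(T):
    fixed.append(actions[step % T])

print(buggy)  # ['a0', 'a0', 'a0', 'a0']
print(fixed)  # ['a0', 'a1', 'a2', 'a3']
```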
Hello, I have some questions while running your NDP algorithm.
I finished the environment setup and ran the code for about 8 hours with
$ sh run_rl.sh faucet dmp 2 1
however, it does not seem to work well.
My questions are
In this code, you input the same action for steps Ts to T(s+1)-1 (T = args.T), but I think it would be correct to set the action at each step to `actions[step % args.N]`, since the DMP actor outputs N steps of actions (N = args.N). Could you explain this part in more detail?
Thank you!