change goal format to block-gripper-informed goal

ropiens / project-sandwich-man

A project for researching a complex and long-horizon manipulation task especially focused on hierarchically stacking blocks.

MIT License

change goal format to block-gripper-informed goal #19

Closed: CUN-bjy closed this 3 years ago

CUN-bjy commented 3 years ago

Description

[Screen recording: Peek 2021-10-20 00-01]

Feature

- Changed the goal format to block-gripper-informed.
  - final_goal -> block-informed
  - sub_goal -> block-gripper-informed
  - (This is why the code is messy right now. It needs to be cleaned up later.)
- Visualize subgoals
  - solid boxes (red and green) -> current states of the objects
  - ghost boxes (red and green) -> targets of the objects
  - ghost spheres
    - red and green -> subgoals of the objects
    - black -> subgoal of the end-effector (ee) position
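
To make the two goal formats concrete, here is a minimal sketch. It assumes each block and the end-effector are described by an (x, y, z) position and that a goal is a flat list of floats. The function names `make_final_goal` and `make_sub_goal` and the example coordinates are hypothetical and are not taken from this repository.

```python
from typing import List, Sequence

Position = Sequence[float]  # assumed layout: (x, y, z)


def make_final_goal(block_targets: Sequence[Position]) -> List[float]:
    """Block-informed goal: target positions of the blocks only."""
    goal: List[float] = []
    for pos in block_targets:
        goal.extend(float(v) for v in pos)
    return goal


def make_sub_goal(block_subgoals: Sequence[Position],
                  ee_subgoal: Position) -> List[float]:
    """Block-gripper-informed goal: block subgoals followed by the ee subgoal."""
    goal = make_final_goal(block_subgoals)
    goal.extend(float(v) for v in ee_subgoal)
    return goal


# Two blocks (red, green) stacked at the target; values are illustrative.
final_goal = make_final_goal([[1.3, 0.7, 0.42], [1.3, 0.7, 0.47]])
# Intermediate block positions plus where the gripper should be.
sub_goal = make_sub_goal([[1.2, 0.6, 0.42], [1.4, 0.8, 0.42]], [1.2, 0.6, 0.55])

print(len(final_goal), len(sub_goal))  # 6 9
```

With two blocks, the block-informed final goal has 6 entries and the block-gripper-informed subgoal has 9, the last 3 being the end-effector position (the black ghost sphere in the visualization).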

Checklist

[x] this code is auto-formatted using make format
[x] I updated the READMEs and the documentation, if necessary.

CUN-bjy commented 3 years ago

@benthebear93 Ready to merge! Please check it and give feedback. It will be time to start experiments after the next update (adding a logger).

CUN-bjy commented 3 years ago

We have some problems now, and they interfere with the convergence of the models.

The critic value is huge. Maybe we need to normalize it.

Sometimes the env gives us a zero reward (which would mean perfect performance).

Episode: 15  Reward: -400.0
Episode: 16  Reward: 0.0
Episode: 17  Reward: -400.0
Episode: 18  Reward: -398.0
Episode: 19  Reward: -400.0
Episode: 20  Reward: -400.0
Episode: 21  Reward: -400.0
Episode: 22  Reward: -400.0
Episode: 23  Reward: -400.0
Episode: 24  Reward: -400.0
Episode: 25  Reward: -400.0
Episode: 26  Reward: -400.0
Episode: 27  Reward: -400.0
Episode: 28  Reward: -400.0
Episode: 29  Reward: -400.0
Episode: 30  Reward: -400.0
Episode: 31  Reward: -400.0
Episode: 32  Reward: -400.0
Episode: 33  Reward: 0.0
Episode: 34  Reward: -400.0
Episode: 35  Reward: -400.0
Episode: 36  Reward: -400.0

Something is wrong. We need to look into this more.

Don't merge yet!
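
One way to keep track of the odd episodes is to parse the training log and list the ones with a zero return, so they can be replayed and inspected. The sketch below assumes the log lines keep the `Episode: N  Reward: R` format shown above; the names `parse_returns` and `zero_return_episodes` are hypothetical helpers, not part of this repository.

```python
import re

# A short excerpt of the log above, copied verbatim.
LOG = """\
Episode: 15  Reward: -400.0
Episode: 16  Reward: 0.0
Episode: 17  Reward: -400.0
Episode: 18  Reward: -398.0
"""

PATTERN = re.compile(r"Episode:\s*(\d+)\s+Reward:\s*(-?\d+(?:\.\d+)?)")


def parse_returns(text):
    """Return a list of (episode, reward) pairs found in the log text."""
    return [(int(ep), float(r)) for ep, r in PATTERN.findall(text)]


def zero_return_episodes(returns):
    """Return the episode numbers whose total reward is exactly zero."""
    return [ep for ep, r in returns if r == 0.0]


returns = parse_returns(LOG)
print(zero_return_episodes(returns))  # [16]
```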

CUN-bjy commented 3 years ago

This is why we need to normalize the inputs.
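
A minimal sketch of what input normalization could look like: a running mean/std normalizer (Welford's online algorithm) applied to observation or goal vectors before they reach the actor and critic. The class name `RunningNormalizer`, the clipping range, and the example values are assumptions for illustration; this is not the implementation used in this project.

```python
import math


class RunningNormalizer:
    """Tracks a per-dimension running mean/std and standardizes inputs."""

    def __init__(self, size, eps=1e-8, clip=5.0):
        self.size = size
        self.eps = eps      # avoids division by zero for constant dimensions
        self.clip = clip    # assumed clipping range for normalized values
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations

    def update(self, x):
        """Fold one sample into the running statistics (Welford update)."""
        self.count += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (v - self.mean[i])

    def std(self):
        """Population standard deviation per dimension (1.0 until 2 samples)."""
        if self.count < 2:
            return [1.0] * self.size
        return [math.sqrt(m / self.count) + self.eps for m in self.m2]

    def normalize(self, x):
        """Standardize x and clip each entry to [-clip, clip]."""
        out = []
        for v, m, s in zip(x, self.mean, self.std()):
            z = (v - m) / s
            out.append(max(-self.clip, min(self.clip, z)))
        return out


norm = RunningNormalizer(size=2)
for obs in [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]:
    norm.update(obs)

print(norm.normalize([2.0, 200.0]))  # [0.0, 0.0]
```

Both dimensions end up on the same scale even though the raw values differ by a factor of 100, which is the property the networks benefit from.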

CUN-bjy commented 3 years ago

@benthebear93 Could you run an experiment with the current version of the models? I don't have enough time for development these days because of personal commitments. I'll develop more once those are done! And please merge it after the experiments.

benthebear93 commented 3 years ago

@CUN-bjy I will do it today!

benthebear93 commented 3 years ago

[Image: somethings_not_right] [Image: Screenshot from 2021-11-03 12-47-05]

@CUN-bjy Something is definitely wrong, but I will merge it now.

CUN-bjy commented 3 years ago

T.T

CUN-bjy commented 3 years ago

Note-keeping: this is why we have to use normalized inputs, although the results are not caused by this alone. https://nhigham.com/2020/08/04/what-is-numerical-stability/