ropiens / project-sandwich-man

A project for researching a complex and long-horizon manipulation task especially focused on hierarchically stacking blocks.
MIT License

change goal format to block-gripper-informed goal #19

Closed CUN-bjy closed 3 years ago

CUN-bjy commented 3 years ago

Description

[Screen recording: Peek 2021-10-20 00-01]

Feature

  1. Changed the goal format to block-gripper-informed.
    • final_goal -> block-informed
    • sub_goal -> block-gripper-informed
    • (This is why the code is messy right now; it needs to be cleaned up later.)
  2. Visualize subgoals.
    • solid boxes (red and green) -> current states of the objects
    • ghost boxes (red and green) -> targets of the objects
    • ghost spheres
      • red and green -> subgoals of the objects
      • black -> subgoal of the ee position
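
To make the two goal formats concrete, here is a minimal sketch of how they might be represented. The class names, field names, and the flattening order are assumptions for illustration only and are not taken from the repository: the final goal carries only block targets, while a subgoal carries block targets plus an end-effector (gripper) target.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FinalGoal:
    """Block-informed goal: target positions of the blocks only (hypothetical layout)."""
    block_targets: List[Vec3]


@dataclass
class SubGoal:
    """Block-gripper-informed goal: block targets plus an ee target (hypothetical layout)."""
    block_targets: List[Vec3]
    gripper_target: Vec3

    def flatten(self) -> List[float]:
        # Concatenate all block positions, then append the gripper position.
        flat = [c for pos in self.block_targets for c in pos]
        flat.extend(self.gripper_target)
        return flat


sub = SubGoal(
    block_targets=[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)],
    gripper_target=(0.0, 0.0, 0.5),
)
vec = sub.flatten()
print(len(vec))  # two blocks * 3 + gripper * 3
```

Keeping the gripper target as the last three entries is just one possible convention; what matters is that the policy and the visualizer agree on the layout.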


CUN-bjy commented 3 years ago

@benthebear93 Ready to merge! Please check and give feedback. Once the next update (adding a logger) is in, it's time to start experiments.

CUN-bjy commented 3 years ago

We have some problems now, and they interfere with the convergence of the models.

  1. The critic value is huge; maybe we need to normalize it. [Screenshot from 2021-10-23 15-19-59]

  2. Sometimes the env gives us a reward of zero (which would mean perfect performance):

    Episode: 15  Reward: -400.0
    Episode: 16  Reward: 0.0
    Episode: 17  Reward: -400.0
    Episode: 18  Reward: -398.0
    Episode: 19  Reward: -400.0
    Episode: 20  Reward: -400.0
    Episode: 21  Reward: -400.0
    Episode: 22  Reward: -400.0
    Episode: 23  Reward: -400.0
    Episode: 24  Reward: -400.0
    Episode: 25  Reward: -400.0
    Episode: 26  Reward: -400.0
    Episode: 27  Reward: -400.0
    Episode: 28  Reward: -400.0
    Episode: 29  Reward: -400.0
    Episode: 30  Reward: -400.0
    Episode: 31  Reward: -400.0
    Episode: 32  Reward: -400.0
    Episode: 33  Reward: 0.0
    Episode: 34  Reward: -400.0
    Episode: 35  Reward: -400.0
    Episode: 36  Reward: -400.0

    Something is wrong here; we need to look into this more.
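
A quick way to find these episodes for closer inspection is to parse the training log. The snippet below is only a sketch: it assumes the log lines look exactly like the ones above, and the function names and the threshold of -1.0 are made up for illustration.

```python
import re

# Matches lines such as "Episode: 16  Reward: 0.0".
LOG_LINE = re.compile(r"Episode:\s*(\d+)\s+Reward:\s*(-?\d+(?:\.\d+)?)")


def parse_rewards(log_text):
    """Return a list of (episode, reward) pairs found in the log text."""
    return [(int(ep), float(r)) for ep, r in LOG_LINE.findall(log_text)]


def suspicious_episodes(pairs, threshold=-1.0):
    """Return episodes whose reward is above `threshold`, i.e. close to zero."""
    return [ep for ep, r in pairs if r > threshold]


sample_log = """
Episode: 15  Reward: -400.0
Episode: 16  Reward: 0.0
Episode: 17  Reward: -400.0
Episode: 18  Reward: -398.0
"""

pairs = parse_rewards(sample_log)
flagged = suspicious_episodes(pairs)
print(flagged)
```

Replaying the flagged episodes (for example with a fixed seed, if the env supports it) would show whether the zero reward is a real success or an artifact of how the goal is sampled.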

Don't merge yet!

CUN-bjy commented 3 years ago

This is why we need to normalize the inputs.
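
One common way to do this is a running mean/std normalizer applied to observations and goals before they reach the actor and critic. The sketch below is an assumption about how it could be done, not the code used in this project; the class name, `eps`, and the clip range are illustrative. It uses Welford's online algorithm in pure Python.

```python
import math


class RunningNormalizer:
    """Tracks a per-dimension running mean and variance (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8, clip=5.0):
        self.n = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.eps = eps
        self.clip = clip

    def update(self, x):
        """Fold one sample (a sequence of floats) into the statistics."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return (x - mean) / std, clipped to [-clip, clip]."""
        out = []
        for i, xi in enumerate(x):
            # Population variance; fall back to 1.0 before any update.
            var = self.m2[i] / self.n if self.n > 0 else 1.0
            z = (xi - self.mean[i]) / (math.sqrt(var) + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


norm = RunningNormalizer(size=2)
norm.update([0.0, 100.0])
norm.update([2.0, 300.0])
z = norm.normalize([2.0, 300.0])
print(z)
```

After normalization both dimensions end up on a similar scale even though the raw values differ by two orders of magnitude.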

CUN-bjy commented 3 years ago

@benthebear93 Could you run an experiment with the current version of the models? I don't have enough time for development these days because of a personal mission. I'll develop more once that is done. Please merge it after the experiments.

benthebear93 commented 3 years ago

@CUN-bjy I will do it today!

benthebear93 commented 3 years ago

[Images: somethings_not_right, Screenshot from 2021-11-03 12-47-05, Screenshot from 2021-11-03 12-48-15]

@CUN-bjy Something is definitely wrong, but I will merge it now.

CUN-bjy commented 3 years ago

T.T

CUN-bjy commented 3 years ago

Note-keeping: this is why we have to use normalized inputs, although the results are not caused by this alone. https://nhigham.com/2020/08/04/what-is-numerical-stability/
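
A tiny illustration of the general floating-point effect behind this note, not a reproduction of the actual training issue: when a value has a very large magnitude, a small update added to it can be lost entirely. The magnitudes below are chosen only to make the effect visible in double precision.

```python
# Double-precision floats carry roughly 16 significant decimal digits.
large = 1e16
absorbed = (large + 1.0) - large  # the update of 1.0 is rounded away

unit = 1.0
preserved = (unit + 1e-8) - unit  # a small update survives at unit scale

print(absorbed)   # 0.0
print(preserved)
```

Keeping inputs and targets near unit scale leaves more of the available precision for the small changes that gradient updates make.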