generate_default_config training does not work well.

Hello. I read your paper and am experimenting with path optimisation using a UR robot. First of all, thank you for making your code available. However, training does not go well when I use parameters similar to those in your paper. When I set the control mode to inverse kinematics, the reward never rises above -6 and the success rate is very low (0.1). Also, when I visualise the agent with the evaluation code, it clearly has not learned the task: the robot moves slowly and keeps repeating the same motion. Can you tell me what the problem is?

Below are the name and contents of the config file I used. I first followed the demo code, but the agent didn't learn well, so I switched to the hyperparameters from the paper and added a sensor for `ee_link` to the observation, but the result is the same. I think your code is fantastic, and I would like to reach the performance reported in the paper. I really hope you can help me with this.

[Screenshot: ir-drl_doesnt_work]

`generated_default_config.yaml` (config that builds the env using the World Generator):

    run:
      algorithm:
        load_model: True
        model_path: "models/weights/PPO_generated_default/model_12960000_steps"  
        # load_model: False  
        # model_path: ""
        type: "PPO"
        config:
          gamma: 0.99
          learning_rate: 0.0003
          n_steps : 1024
          batch_size : 1024
          device: "cuda"

        custom_policy:
          activation_function: "tanh"
          value_function:
            - 128
            - 64
            - 64

          policy_function:
            - 128
            - 64
            - 64

      train:
        num_envs : 16
        logging: 0
        # timesteps: 15000000 
        timesteps: 51200000
        save_freq: 30000
        save_folder: "./models/weights"  
        save_name: "generated_default"
      eval:
        max_episodes: -1  
        logging: 1  
        display_delay: 0.00416666666 
        show_world_aux: True
        show_goal_aux: True
        show_sensor_aux: False

    env:
      max_steps_per_episode: 1024  
      stat_buffer_size: 25  
      normalise_observations: False
      normalise_rewards: False
      use_physics_sim: True
      gravity: [0, 0, -9.8]
      sim_step: 0.00416666666 
      sim_steps_per_env_step: 1 
      robots:
        - type: "UR5" 
          config:
            name: "ur5_1"
            base_position: [0, 0, 0]
            base_orientation: [0, 0, 0]
            resting_angles: [-180, -45, -90, -135, 90, 0]
            control_mode: 0
            self_collision: True

          sensors:
            - type: "PositionRotation" 
              config:
                update_steps: 1
                add_to_observation_space: True
                add_to_logging: True
                link_id: 'ee_link'
                quaternion: False

            - type: "LidarSensorGeneric"
              config:
                update_steps: 1
                add_to_observation_space: True
                add_to_logging: True
                indicator_buckets: 10
                ray_start: 0
                ray_end: 0.4
                indicator: True
                ray_setup:
                  ee_link: 24
                  wrist_1_link: 80
                  forearm_link: 25

          goal:
            type: "PositionCollision"
            config:
              add_to_logging: True
              continue_after_success: True
              reward_success: 15
              reward_collision: -15
              reward_distance_mult: -0.01
              dist_threshold_start: 0.2
              dist_threshold_end: 0.01
              dist_threshold_increment_start: 0.01
              dist_threshold_increment_end: 0.001
              dist_threshold_overwrite: "None"

      world:
        type: "Generated"
        config:
          workspace_boundaries: [-2, 2, -2, 2, 0, 5]
          obstacles:
          - type: "GroundPlate"
            position: [0, 0, 0]
            rotation: [0, 0, 0]
            scale: 1
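
For reference, the PPO settings above imply the following training budget. This is a minimal arithmetic sketch, not IR-DRL code. It assumes the `config:` block is passed unchanged to Stable-Baselines3's `PPO`, where `n_steps` is counted per environment and `n_epochs` (not set in this file) keeps its default of 10. `PPOBudget` and `episode_seconds` are hypothetical helpers, and the values are copied by hand from the YAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOBudget:
    """Hypothetical helper: PPO rollout arithmetic, assuming Stable-Baselines3 semantics."""

    n_steps: int          # rollout length collected per environment
    num_envs: int         # parallel environments (train.num_envs)
    batch_size: int       # minibatch size used for each gradient step
    total_timesteps: int  # train.timesteps, summed over all environments
    n_epochs: int = 10    # assumed SB3 PPO default; not set in this config

    @property
    def rollout_size(self) -> int:
        # Transitions collected across all environments before each policy update.
        return self.n_steps * self.num_envs

    @property
    def minibatches_per_epoch(self) -> int:
        return self.rollout_size // self.batch_size

    @property
    def gradient_steps_per_update(self) -> int:
        return self.n_epochs * self.minibatches_per_epoch

    @property
    def num_updates(self) -> int:
        return self.total_timesteps // self.rollout_size


def episode_seconds(max_steps: int, sim_step: float, sim_steps_per_env_step: int) -> float:
    # Simulated time covered by an episode that runs the full max_steps_per_episode.
    return max_steps * sim_step * sim_steps_per_env_step


# Values copied by hand from generated_default_config.yaml.
budget = PPOBudget(n_steps=1024, num_envs=16, batch_size=1024, total_timesteps=51_200_000)

print(budget.rollout_size)               # 16384
print(budget.minibatches_per_epoch)      # 16
print(budget.gradient_steps_per_update)  # 160
print(budget.num_updates)                # 3125
print(round(episode_seconds(1024, 0.00416666666, 1), 2))  # 4.27
# Position of the loaded checkpoint (model_12960000_steps) within the budget.
print(round(12_960_000 / budget.total_timesteps, 3))      # 0.253
```

Under these assumptions, each policy update trains on 16,384 transitions (16 minibatches of 1,024, repeated for 10 epochs). The full run amounts to 3,125 updates, one full-length episode covers about 4.3 s of simulated time, and the checkpoint loaded for evaluation sits roughly a quarter of the way through the budget.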

ignc-research / IR-DRL, issue #4
