chenfeng-huang closed this issue 1 year ago
Evaluating the provided baseline model checkpoint yields a success rate (SR) of only 23%
Yeah, we train with a large effective batch size of 64 * 8 GPUs = 512, so the models usually converge between epochs 20 and 30.
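For anyone retraining on different hardware, here is a minimal sketch of the batch-size arithmetic above. The helper names are hypothetical, and using gradient accumulation to match the effective batch size on fewer GPUs is my assumption rather than something the maintainers described; it is not guaranteed to reproduce the reported convergence.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step."""
    return per_gpu_batch * num_gpus * accum_steps


def accumulation_steps_needed(target: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach `target` (hypothetical helper)."""
    per_step = per_gpu_batch * num_gpus
    if target % per_step != 0:
        raise ValueError("target is not a multiple of per_gpu_batch * num_gpus")
    return target // per_step


# Setup quoted in the thread: batch size 64 on each of 8 GPUs.
reference = effective_batch_size(per_gpu_batch=64, num_gpus=8)
print(reference)  # 512

# Assumed single-GPU setup that keeps the same per-GPU batch size.
print(accumulation_steps_needed(reference, per_gpu_batch=64, num_gpus=1))  # 8
```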
I also evaluated D_D_static_rgb_baseline, and the SR is only about 11%, as shown below. What could be the problem? I have not changed any code; I just ran evaluate_policy.py.
Average successful sequence length: 0.117
Success rates for i instructions in a row:
1: 11.3%
2: 0.4%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 49 / 61 | SR: 80.3%
push_red_block_left: 4 / 41 | SR: 9.8%
move_slider_right: 1 / 98 | SR: 1.0%
rotate_red_block_right: 7 / 30 | SR: 23.3%
turn_off_led: 14 / 55 | SR: 25.5%
open_drawer: 17 / 131 | SR: 13.0%
push_pink_block_left: 5 / 35 | SR: 14.3%
rotate_blue_block_right: 3 / 33 | SR: 9.1%
push_pink_block_right: 6 / 34 | SR: 17.6%
push_blue_block_left: 5 / 35 | SR: 14.3%
rotate_pink_block_right: 1 / 33 | SR: 3.0%
push_blue_block_right: 2 / 26 | SR: 7.7%
move_slider_left: 1 / 73 | SR: 1.4%
push_red_block_right: 2 / 32 | SR: 6.2%
lift_pink_block_slider: 0 / 28 | SR: 0.0%
rotate_blue_block_left: 0 / 26 | SR: 0.0%
close_drawer: 0 / 48 | SR: 0.0%
lift_red_block_slider: 0 / 28 | SR: 0.0%
turn_on_lightbulb: 0 / 43 | SR: 0.0%
lift_blue_block_slider: 0 / 31 | SR: 0.0%
turn_off_lightbulb: 0 / 36 | SR: 0.0%
lift_red_block_table: 0 / 33 | SR: 0.0%
lift_blue_block_table: 0 / 30 | SR: 0.0%
lift_pink_block_table: 0 / 28 | SR: 0.0%
rotate_pink_block_left: 0 / 25 | SR: 0.0%
push_into_drawer: 0 / 20 | SR: 0.0%
rotate_red_block_left: 0 / 24 | SR: 0.0%
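As a sanity check on how these numbers relate, here is a rough sketch. It assumes that "success rate for i instructions in a row" means the fraction of evaluation sequences that completed at least i consecutive tasks; under that assumption the average successful sequence length is the sum of those rates. The function names are illustrative and not taken from evaluate_policy.py.

```python
def average_sequence_length(chain_success_rates):
    """Expected number of consecutive successes, given P(at least i successes)."""
    return sum(chain_success_rates)


def success_rate(successes: int, attempts: int) -> float:
    """Per-task success rate in percent."""
    return 100.0 * successes / attempts


# Chain success rates copied from the log above (as fractions).
chain_rates = [0.113, 0.004, 0.0, 0.0, 0.0]
avg_len = average_sequence_length(chain_rates)
print(f"Average successful sequence length: {avg_len:.3f}")  # 0.117

# A few per-task counts copied from the log above: (successes, attempts).
task_counts = {
    "turn_on_led": (49, 61),
    "push_red_block_left": (4, 41),
    "open_drawer": (17, 131),
}
for task, (ok, total) in task_counts.items():
    print(f"{task}: {ok} / {total} | SR: {success_rate(ok, total):.1f}%")
```

Both the reported average (0.117) and the per-task percentages match this arithmetic, so the log is internally consistent; the low numbers come from the rollouts themselves rather than from how the metrics are aggregated.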
I downloaded the pretrained D_D_static_rgb_baseline model checkpoint provided on the main page, but found that it is only at epoch 23. Is this checkpoint already fully trained (i.e., has the loss already converged)?