kvablack / susie

Code for subgoal synthesis via image editing
https://rail-berkeley.github.io/susie
MIT License

Bad performance in evaluation #9

Closed by houyaokun 10 months ago

houyaokun commented 11 months ago

I downloaded the diffusion model and goal-conditioned policy checkpoints from https://huggingface.co/patreya/susie-calvin-checkpoints and set the environment variables in eval_susie.sh, but the results are poor:

Average successful sequence length: 0.4666666666666667
Success rates for i instructions in a row:
1: 33.3%
2: 13.3%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 2 / 2 | SR: 100.0%
open_drawer: 4 / 4 | SR: 100.0%
turn_on_lightbulb: 1 / 1 | SR: 100.0%
push_blue_block_right: 0 / 1 | SR: 0.0%
rotate_blue_block_right: 0 / 1 | SR: 0.0%
lift_blue_block_slider: 0 / 1 | SR: 0.0%
lift_blue_block_table: 0 / 1 | SR: 0.0%
push_pink_block_left: 0 / 2 | SR: 0.0%
move_slider_left: 0 / 3 | SR: 0.0%
push_blue_block_left: 0 / 2 | SR: 0.0%
lift_red_block_slider: 0 / 1 | SR: 0.0%
push_red_block_left: 0 / 1 | SR: 0.0%
rotate_red_block_left: 0 / 1 | SR: 0.0%
lift_red_block_table: 0 / 1 | SR: 0.0%

What could be causing this? Thank you for your time.

houyaokun commented 10 months ago

The robotic arm's actions look strange in some tasks, and I suspect the issue may lie with GCBC. In some cases the arm even moves outside the field of view.

Sometimes the arm moves at peculiar angles. However, it performs relatively better on the task of pulling the handle to open the drawer.

houyaokun commented 10 months ago

During the evaluation, the following messages appeared. Are these messages the cause of the problem?

2023-11-24 14:33:49.090058: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-24 14:33:49.113779: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-24 14:33:50.157969: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-24 14:33:50.173970: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-24 14:33:50.174105: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
pybullet build time: Nov 9 2023 10:51:31
Global seed set to 0
Initialized persistent compilation cache at /home/zs/.jax_compilation_cache
The config attributes {'pretrained': 'instruct-pix2pix'} were passed to FlaxUNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
WARNING:absl:FlaxUNet2DConditionModel unused kwargs: {'pretrained': 'instruct-pix2pix'}
/home/zs/anaconda3/envs/susie/lib/python3.10/site-packages/diffusers/configuration_utils.py:217: FutureWarning: It is deprecated to pass a pretrained model name or path to from_config.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
WARNING:absl:Extra kwargs passed to CalvinDataset: {'num_devices': 1, 'prefetch_num_batches': 20}
2023-11-24 14:34:05.136133: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
2023-11-24 14:34:05.541715: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -14 } dim { size: -15 } dim { size: -16 } dim { size: -17 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -44 } dim { size: -45 } dim { size: -17 } } }
2023-11-24 14:34:05.541775: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -19 } dim { size: -20 } dim { size: -21 } } } inputs { dtype: DT_FLOAT shape { dim { size: -3 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -3 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -3 } dim { size: -46 } dim { size: -47 } dim { size: -21 } } }
2023-11-24 14:34:05.541811: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -22 } dim { size: -23 } dim { size: -24 } dim { size: -25 } } } inputs { dtype: DT_FLOAT shape { dim { size: -4 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -4 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -4 } dim { size: -48 } dim { size: -49 } dim { size: -25 } } }
2023-11-24 14:34:05.542088: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -32 } dim { size: -33 } dim { size: -34 } dim { size: -35 } } } inputs { dtype: DT_FLOAT shape { dim { size: -8 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -8 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -8 } dim { size: -53 } dim { size: -54 } dim { size: -35 } } }
2023-11-24 14:34:05.542120: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -36 } dim { size: -37 } dim { size: -38 } dim { size: -39 } } } inputs { dtype: DT_FLOAT shape { dim { size: -10 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -10 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -10 } dim { size: -55 } dim { size: -56 } dim { size: -39 } } }
2023-11-24 14:34:05.542142: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -40 } dim { size: -41 } dim { size: -42 } dim { size: -43 } } } inputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -12 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2112 num_cores: 20 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 26214400 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: -57 } dim { size: -58 } dim { size: -43 } } }
Loading checkpoint...
Checkpoint successfully loaded
argv[0]=
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=3
argv[0] = --unused
argv[1] =
argv[2] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Intel
GL_RENDERER=Mesa Intel(R) Graphics (ADL-S GT1)
GL_VERSION=4.6 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.60
pthread_getconcurrency()=0
Version = 4.6 (Core Profile) Mesa 21.2.6
Vendor = Intel
Renderer = Mesa Intel(R) Graphics (ADL-S GT1)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = Intel
Workaround for some crash in the Intel OpenGL driver on Linux/Ubuntu
ven = Intel
Workaround for some crash in the Intel OpenGL driver on Linux/Ubuntu
/home/zs/anaconda3/envs/susie/lib/python3.10/site-packages/urdfpy/urdf.py:2169: RuntimeWarning: invalid value encountered in divide
  value = value / np.linalg.norm(value)
logging to /tmp/evaluation
100% | 360/360 [05:59<00:00, 1.00it/s]
100% | 360/360 [05:53<00:00, 1.02it/s]
100% | 360/360 [05:50<00:00, 1.03it/s]
100% | 360/360 [05:55<00:00, 1.01it/s]
100% | 360/360 [05:56<00:00, 1.01it/s]
100% | 360/360 [05:56<00:00, 1.01it/s]
13% | 47/360 [00:57<06:25, 1.23s/it]
100% | 360/360 [05:56<00:00, 1.01it/s]
100% | 360/360 [05:55<00:00, 1.01it/s]
22% | 79/360 [01:19<04:41, 1.00s/it]
100% | 360/360 [05:57<00:00, 1.01it/s]
100% | 360/360 [05:56<00:00, 1.01it/s]
25% | 91/360 [01:37<04:49, 1.08s/it]
100% | 360/360 [05:57<00:00, 1.01it/s]
100% | 360/360 [05:57<00:00, 1.01it/s]
23% | 84/360 [01:37<05:19, 1.16s/it]
54% | 193/360 [03:17<02:51, 1.03s/it]
100% | 360/360 [05:55<00:00, 1.01it/s]
23% | 84/360 [01:36<05:16, 1.15s/it]
32% | 117/360 [01:57<04:03, 1.00s/it]
100% | 360/360 [05:56<00:00, 1.01it/s]
1/5 : 33.3% | 2/5 : 13.3% | 3/5 : 0.0% | 4/5 : 0.0% | 5/5 : 0.0% ||: 100% | 15/15 [1:41:24<00:00, 405.63s/it]
Results for Epoch 0:
Average successful sequence length: 0.4666666666666667
Success rates for i instructions in a row:
1: 33.3%
2: 13.3%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 2 / 2 | SR: 100.0%
open_drawer: 4 / 4 | SR: 100.0%
turn_on_lightbulb: 1 / 1 | SR: 100.0%
push_blue_block_right: 0 / 1 | SR: 0.0%
rotate_blue_block_right: 0 / 1 | SR: 0.0%
lift_blue_block_slider: 0 / 1 | SR: 0.0%
lift_blue_block_table: 0 / 1 | SR: 0.0%
push_pink_block_left: 0 / 2 | SR: 0.0%
move_slider_left: 0 / 3 | SR: 0.0%
push_blue_block_left: 0 / 2 | SR: 0.0%
lift_red_block_slider: 0 / 1 | SR: 0.0%
push_red_block_left: 0 / 1 | SR: 0.0%
rotate_red_block_left: 0 / 1 | SR: 0.0%
lift_red_block_table: 0 / 1 | SR: 0.0%

Best model: epoch 0 with average sequences length of 0.4666666666666667
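One way to triage a log like this is to count TensorFlow's C++ log lines by their severity letter (I = info, W = warning, E = error, F = fatal), which appears right after the timestamp. The sketch below is only an illustration: the regular expression and the `severity_counts` helper are assumptions written for this thread, not part of the susie code, and the sample lines are shortened copies of the messages above. A low severity does not by itself prove a message is unrelated to the poor results.

```python
import re
from collections import Counter

# Matches TensorFlow C++ log prefixes such as
# "2023-11-24 14:33:49.090058: I tensorflow/core/util/port.cc:110] ..."
TF_LOG_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+?):(\d+)\]"
)


def severity_counts(lines):
    """Count log lines per severity letter; lines without the prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = TF_LOG_PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample lines taken from the log above.
sample = [
    "2023-11-24 14:33:49.090058: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on.",
    "2023-11-24 14:34:05.136133: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization",
    "2023-11-24 14:34:05.541715: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op",
    "Loading checkpoint...",
]

counts = severity_counts(sample)
print(dict(counts))
```

In the pasted log, every TensorFlow line carries either I or W; none is marked E or F.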

houyaokun commented 10 months ago

Also, my Python version is 3.10, which differs from the requested version of 3.8. I also tried testing with Python 3.8, and the results were equally bad. My TensorFlow version is 2.13.0, and my JAX version is 0.4.11.

Because I have insufficient VRAM, I set TensorFlow to use the CPU and JAX to use the GPU. Will this have an impact on the results?
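For reference, a CPU-for-TensorFlow, GPU-for-JAX split like the one described is commonly set up as below. This is a minimal sketch, not the configuration used in eval_susie.sh: `configure_devices` is a hypothetical helper name, and it assumes it runs at the very start of the process, before TensorFlow or JAX has touched the GPU.

```python
import os


def configure_devices():
    """Keep TensorFlow off the GPU and stop JAX from preallocating GPU memory.

    Returns True if TensorFlow was found and configured, False if it is not installed.
    """
    # JAX reads this variable when its GPU backend initializes; "false" disables
    # the default behaviour of reserving most of the GPU memory up front.
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
    try:
        import tensorflow as tf
    except ImportError:
        return False
    # Hide all GPUs from TensorFlow so its ops and tf.data pipelines run on the CPU.
    tf.config.set_visible_devices([], "GPU")
    return True


tf_configured = configure_devices()
print("TensorFlow configured:", tf_configured)
```

Whether this split affects the evaluation results is not something the sketch can answer; it only shows one way to reproduce the setup.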

houyaokun commented 10 months ago

When NUM_EVAL_SEQUENCES=150:

Average successful sequence length: 0.22
Success rates for i instructions in a row:
1: 19.3%
2: 2.7%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 10 / 10 | SR: 100.0%
rotate_red_block_right: 2 / 5 | SR: 40.0%
open_drawer: 11 / 21 | SR: 52.4%
push_red_block_left: 2 / 11 | SR: 18.2%
move_slider_right: 1 / 14 | SR: 7.1%
turn_off_led: 4 / 8 | SR: 50.0%
push_blue_block_right: 1 / 5 | SR: 20.0%
move_slider_left: 1 / 13 | SR: 7.7%
rotate_blue_block_left: 1 / 3 | SR: 33.3%
close_drawer: 0 / 6 | SR: 0.0%
lift_pink_block_table: 0 / 7 | SR: 0.0%
turn_on_lightbulb: 0 / 8 | SR: 0.0%
push_blue_block_left: 0 / 8 | SR: 0.0%
rotate_red_block_left: 0 / 3 | SR: 0.0%
turn_off_lightbulb: 0 / 5 | SR: 0.0%
push_pink_block_right: 0 / 4 | SR: 0.0%
lift_pink_block_slider: 0 / 2 | SR: 0.0%
lift_red_block_table: 0 / 9 | SR: 0.0%
rotate_pink_block_right: 0 / 6 | SR: 0.0%
lift_blue_block_table: 0 / 5 | SR: 0.0%
lift_blue_block_slider: 0 / 5 | SR: 0.0%
lift_red_block_slider: 0 / 5 | SR: 0.0%
push_pink_block_left: 0 / 6 | SR: 0.0%
rotate_pink_block_left: 0 / 2 | SR: 0.0%
push_red_block_right: 0 / 3 | SR: 0.0%
rotate_blue_block_right: 0 / 3 | SR: 0.0%
push_into_drawer: 0 / 6 | SR: 0.0%

Best model: epoch 0 with average sequences length of 0.4
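As a sanity check on how the two reported metrics relate: the average successful sequence length is the expected number of instructions completed in a row, which equals the sum over i of the fraction of sequences that completed at least i instructions. The sketch below reproduces both reported averages. The per-run counts (5 and 2 out of 15; 29 and 4 out of 150) are not printed in the thread; they are inferred here from the rounded percentages and should be read as assumptions, and `average_sequence_length` is a hypothetical helper, not a function from the evaluation code.

```python
def average_sequence_length(at_least_counts, num_sequences):
    """Expected chain length: sum over i of P(at least i instructions succeeded in a row).

    at_least_counts[i - 1] is the number of sequences with i or more consecutive successes.
    """
    return sum(at_least_counts) / num_sequences


# 15-sequence run: 33.3% -> 5 of 15 and 13.3% -> 2 of 15 (counts inferred from the percentages).
avg_15 = average_sequence_length([5, 2, 0, 0, 0], 15)

# 150-sequence run: 19.3% -> 29 of 150 and 2.7% -> 4 of 150 (counts inferred from the percentages).
avg_150 = average_sequence_length([29, 4, 0, 0, 0], 150)

print(avg_15, avg_150)
```

Under these inferred counts the sums agree with the reported averages of 0.4666666666666667 and 0.22.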

MozhganPourKeshavarz commented 2 months ago

Hi @houyaokun, did you find the problem?