xukechun / Vision-Language-Grasping

[ICRA 2023] A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

KeyError: 'input_resolution' #16

Open ubless607 opened 4 months ago

ubless607 commented 4 months ago
pybullet build time: May 16 2024 23:57:18
WARNING - 2024-05-17 02:28:39,256 - rigid_transformations - Failed to import geometry msgs in rigid_transformations.py.
WARNING - 2024-05-17 02:28:39,256 - rigid_transformations - Failed to import ros dependencies in rigid_transforms.py
WARNING - 2024-05-17 02:28:39,256 - rigid_transformations - autolab_core not installed as catkin package, RigidTransform ros methods will be unavailable
startThreads creating 1 threads.
starting thread 0
started thread 0 
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.5 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0 
MotionThreadFunc thread started
Creating data logging session: /home/commonsense/data/VLG/Vision-Language-Grasping/logs/2024-05-17-02-28-39-train
ven = Mesa/X.org
ven = Mesa/X.org
-> loaded checkpoint models/graspnet/logs/log_rs/checkpoint.tar (epoch: 18)
/home/commonsense/anaconda3/envs/vilg/lib/python3.8/site-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "train.py", line 92, in <module>
    agent = ViLG(grasp_dim=7, args=args)
  File "/home/commonsense/data/VLG/Vision-Language-Grasping/models/sac.py", line 15, in __init__
    self.vilg_fusion = CLIPGraspFusion(grasp_dim, args.width, args.layers, args.heads, self.device).to(device=self.device)
  File "/home/commonsense/data/VLG/Vision-Language-Grasping/models/networks.py", line 208, in __init__
    self._load_clip()
  File "/home/commonsense/data/VLG/Vision-Language-Grasping/models/networks.py", line 233, in _load_clip
    self.clip = build_model(model.state_dict()).to(self.device)
  File "/home/commonsense/data/VLG/Vision-Language-Grasping/models/core/clip.py", line 491, in build_model
    del state_dict[key]
KeyError: 'input_resolution'
Exception ignored in: <function Environment.__del__ at 0x7fc2e748a550>
Traceback (most recent call last):
  File "/home/commonsense/data/VLG/Vision-Language-Grasping/environment_sim.py", line 425, in __del__
TypeError: 'NoneType' object is not callable
numActiveThreads = 0
stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed

I ran the evaluation code with the pre-trained weights. Can you help me?

ubless607 commented 4 months ago

I had set jit=False when loading CLIP ("ViT-B/32") because of https://github.com/xukechun/Vision-Language-Grasping/issues/10.

xukechun commented 4 months ago

Hi, it seems that the pretrained CLIP model was not loaded correctly. Did you download the CLIP ViT-B/32 model offline? By the way, we have reorganized our code and fixed some bugs. You can re-clone the repo and run setup.py again; no new installation is needed.

ubless607 commented 4 months ago

No, I used the integrated script.

ubless607 commented 4 months ago

Hi @xukechun,

I have a temporary fix for the issue. Adding the following line at line 489 of models/core/clip.py makes the code run. Let me know if there's anything else I can do to help resolve this.

state_dict["input_resolution"], state_dict["context_length"], state_dict["vocab_size"] = None, None, None

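For reference, here is a sketch of a more defensive variant of that deletion step (the helper name `strip_metadata` is illustrative, not from the repo). Using `dict.pop(key, None)` instead of `del` tolerates both cases: a JIT-loaded state_dict, which carries these metadata keys, and a jit=False one, which apparently does not — hence the KeyError.

```python
def strip_metadata(state_dict):
    """Remove CLIP metadata keys if present, instead of assuming they exist.

    A state_dict taken from a TorchScript (jit=True) CLIP model contains
    'input_resolution', 'context_length', and 'vocab_size'; one taken from a
    model loaded with jit=False may not, so an unconditional `del` raises
    KeyError.
    """
    for key in ("input_resolution", "context_length", "vocab_size"):
        state_dict.pop(key, None)  # no-op when the key is absent
    return state_dict

# Illustration with toy dicts standing in for the two loading modes:
jit_sd = {"input_resolution": 224, "context_length": 77,
          "vocab_size": 49408, "visual.conv1.weight": "..."}
plain_sd = {"visual.conv1.weight": "..."}

print(strip_metadata(dict(jit_sd)))    # metadata keys stripped
print(strip_metadata(dict(plain_sd)))  # unchanged, and no KeyError
```

This avoids having to insert placeholder None values before the deletion loop.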
xukechun commented 4 months ago

Hi, if you use the integrated script, you don't need to set jit=False.