real-stanford / cow

[CVPR 2023] CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
https://arxiv.org/abs/2203.10421

queue.Empty #4

Open Southyang opened 1 year ago

Southyang commented 1 year ago

I deployed this project on a GPU. After running this command:

python pasture_runner.py -a src.models.agent_fbe_owl -n 1 --arch B32 --center

the following traceback appeared:

Traceback (most recent call last):
  File "pasture_runner.py", line 278, in <module>
    main()
  File "pasture_runner.py", line 273, in main
    test=False
  File "/home/southyang/southyang/code/cow/robothor_challenge.py", line 470, in inference
    timeout=1000)
  File "/home/southyang/anaconda3/envs/cow/lib/python3.7/multiprocessing/queues.py", line 105, in get
    raise Empty
_queue.Empty

I read the corresponding part of the code but could not find the problem. How can I solve it?

sagadre commented 1 year ago

While running this command, can you also run watch -n 0.5 nvidia-smi to check whether the processes are running on the GPU? You should see GPU power utilization going up for the GPU you are running on.

Unfortunately, I am not able to reproduce this issue on my end. Maybe you can also post the exact conda environment that you are using (conda env export > environment.yml), so I can investigate further.
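The GPU check suggested above can also be scripted. The following is only a minimal sketch: it assumes nvidia-smi is on the PATH, and the names QUERY, parse_gpu_stats and query_gpu_stats are hypothetical helpers, not part of the cow repository.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU: index, utilization (%), power draw (W).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _to_float(text):
    """Convert a field to float, or None when the driver reports e.g. [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse the CSV rows produced by QUERY into a list of dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, power = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "util_percent": _to_float(util),
            "power_watts": _to_float(power),
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_stats(output)


if __name__ == "__main__":
    for gpu in query_gpu_stats():
        print(gpu)
```

Calling query_gpu_stats() a few times while pasture_runner.py is running shows whether utilization on the chosen GPU rises at all.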

Bailey-24 commented 1 year ago

After waiting one hour: [image]

Output of running watch -n 0.5 nvidia-smi alongside python pasture_runner.py -a src.models.agent_fbe_owl -n 4 --arch B32 --center: [image: bug]

Output of conda env export > environment.yml:

name: cow
channels:
  - aihabitat
  - pytorch
  - defaults
  - conda-forge
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - attrs=21.4.0=pyhd3eb1b0_0
  - brotli=1.0.9=he6710b0_2
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.18.1=h7f8727e_0
  - ca-certificates=2022.6.15=ha878542_0
  - certifi=2022.6.15=py37h89c1867_0
  - cmake=3.14.0=h52cb24c_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - dbus=1.13.18=hb2f20db_0
  - expat=2.4.4=h295c915_0
  - ffmpeg=4.3=hf484d3e_0
  - fontconfig=2.13.1=h6c09931_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.11.0=h70c0345_0
  - giflib=5.2.1=h7b6447c_0
  - gitdb=4.0.9=pyhd8ed1ab_0
  - gitpython=3.1.27=pyhd8ed1ab_0
  - glib=2.69.1=h4ff587b_1
  - gmp=6.2.1=h295c915_3
  - gnutls=3.6.15=he1e5248_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=h28cd5cc_2
  - habitat-sim-mutex=1.0=headless_nobullet
  - headless=2.0=0
  - icu=58.2=he6710b0_3
  - imageio=2.19.3=pyhcf75d05_0
  - imageio-ffmpeg=0.4.7=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jpeg=9e=h166bdaf_1
  - kiwisolver=1.4.2=py37h7cecad7_1
  - krb5=1.19.2=hac12032_0
  - lame=3.100=h7f98852_1001
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - lerc=2.2.1=h2531618_0
  - libblas=3.9.0=15_linux64_openblas
  - libcblas=3.9.0=15_linux64_openblas
  - libcurl=7.82.0=h0b77cf5_0
  - libdeflate=1.7=h27cfd23_5
  - libedit=3.1.20210910=h7f8727e_0
  - libev=4.33=h7f8727e_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgfortran-ng=12.1.0=h69a702a_16
  - libgfortran5=12.1.0=hdcd56e2_16
  - libgomp=11.2.0=h1234567_1
  - libiconv=1.16=h7f8727e_2
  - libidn2=2.3.2=h7f8727e_0
  - liblapack=3.9.0=15_linux64_openblas
  - libllvm11=11.1.0=h3826bc1_1
  - libnghttp2=1.46.0=hce63b2e_0
  - libopenblas=0.3.20=h043d6bf_1
  - libpng=1.6.37=h21135ba_2
  - libssh2=1.10.0=h8f2d780_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.16.0=h27cfd23_0
  - libtiff=4.3.0=hf544144_1
  - libunistring=0.9.10=h27cfd23_0
  - libuuid=1.0.3=h7f8727e_2
  - libwebp=1.2.2=h55f646e_0
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.13=h7f98852_1004
  - libxml2=2.9.14=h74e7548_0
  - llvmlite=0.38.0=py37h4ff587b_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - matplotlib=3.5.1=py37h06a4308_1
  - matplotlib-base=3.5.1=py37ha18d171_1
  - munkres=1.1.4=py_0
  - ncurses=6.3=h7f8727e_2
  - nettle=3.7.3=hbbd107a_1
  - numba=0.55.1=py37h51133e4_0
  - numpy=1.21.6=py37h976b520_0
  - olefile=0.46=pyh9f0ad1d_1
  - openh264=2.1.1=h4ff587b_0
  - openjpeg=2.4.0=hb52868f_1
  - openssl=1.1.1o=h7f8727e_0
  - packaging=21.3=pyhd3eb1b0_0
  - pcre=8.45=h295c915_0
  - pip=21.2.2=py37h06a4308_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - pyqt=5.9.2=py37h05f1152_2
  - python=3.7.13=h12debd9_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python_abi=3.7=2_cp37m
  - qt=5.9.7=h5867ecd_1
  - quaternion=2022.4.1=py37h540881e_0
  - readline=8.1.2=h7f8727e_1
  - rhash=1.4.1=h3c74f83_1
  - scipy=1.7.3=py37hf2a6cf1_0
  - setuptools=61.2.0=py37h06a4308_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.16.0=pyhd3eb1b0_1
  - smmap=3.0.5=pyhd3eb1b0_0
  - sqlite=3.38.5=hc218d9a_0
  - tbb=2021.5.0=hd09550d_0
  - tk=8.6.12=h1ccaba5_0
  - tornado=6.1=py37h540881e_3
  - tqdm=4.64.0=py37h06a4308_0
  - typing-extensions=4.2.0=hd8ed1ab_1
  - typing_extensions=4.2.0=pyha770c72_1
  - wheel=0.37.1=pyhd3eb1b0_0
  - xorg-fixesproto=5.0=h7f98852_1002
  - xorg-inputproto=2.3.2=h7f98852_1002
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libx11=1.7.2=h7f98852_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxcursor=1.2.0=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h7f98852_1
  - xorg-libxfixes=5.0.3=h7f98852_1004
  - xorg-libxi=1.7.10=h7f98852_0
  - xorg-libxinerama=1.1.4=h9c3ff4c_1001
  - xorg-libxrandr=1.5.2=h7f98852_1
  - xorg-libxrender=0.9.10=h7f98852_1003
  - xorg-randrproto=1.5.0=h7f98852_1001
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h7f98852_1002
  - xorg-xproto=7.0.31=h27cfd23_1007
  - xz=5.2.5=h7f8727e_1
  - zlib=1.2.12=h7f8727e_2
  - zstd=1.5.2=ha4553b6_0
  - pip:
    - absl-py==1.1.0
    - ai2thor==4.3.0
    - aiohttp==3.8.1
    - aiosignal==1.2.0
    - allenact==0.5.1
    - allenact-plugins==0.5.1
    - astunparse==1.6.3
    - async-timeout==4.0.2
    - asynctest==0.13.0
    - aws-requests-auth==0.4.3
    - botocore==1.27.18
    - box2d-py==2.3.8
    - cachetools==5.2.0
    - charset-normalizer==2.0.12
    - click==8.1.3
    - cloudpickle==1.6.0
    - colour==0.1.5
    - datasets==2.3.2
    - decorator==4.4.2
    - dill==0.3.5.1
    - docker-pycreds==0.4.0
    - filelock==3.7.1
    - flask==2.1.2
    - flatbuffers==1.12
    - frozenlist==1.3.0
    - fsspec==2022.5.0
    - ftfy==6.1.1
    - gast==0.4.0
    - google-auth==2.8.0
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - grpcio==1.47.0
    - gym==0.19.0
    - gym-minigrid==1.0.3
    - gym-notices==0.0.8
    - h5py==3.7.0
    - habitat-sim==0.2.1
    - huggingface-hub==0.8.1
    - idna==3.3
    - importlib-metadata==4.12.0
    - itsdangerous==2.1.2
    - jinja2==3.1.2
    - jmespath==1.0.1
    - joblib==1.1.0
    - keras==2.9.0
    - keras-preprocessing==1.1.2
    - libclang==14.0.1
    - markdown==3.3.7
    - markupsafe==2.1.1
    - moviepy==1.0.3
    - msgpack==1.0.4
    - multidict==6.0.2
    - multiprocess==0.70.13
    - networkx==2.6.3
    - oauthlib==3.2.0
    - opencv-python==4.6.0.66
    - opt-einsum==3.3.0
    - pandas==1.3.5
    - pathtools==0.1.2
    - patsy==0.5.2
    - pickle5==0.0.12
    - pillow==8.4.0
    - proglog==0.1.10
    - progressbar2==4.0.0
    - promise==2.3
    - protobuf==3.19.4
    - psutil==5.9.1
    - pyarrow==8.0.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pyglet==1.5.26
    - pyquaternion==0.9.9
    - python-utils==3.3.3
    - python-xlib==0.31
    - pytz==2022.1
    - pyyaml==6.0
    - regex==2022.6.2
    - requests==2.28.0
    - requests-oauthlib==1.3.1
    - responses==0.18.0
    - rsa==4.8
    - scikit-learn==1.0.2
    - sentry-sdk==1.9.0
    - setproctitle==1.2.3
    - shortuuid==1.0.9
    - tensorboard==2.9.1
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorboardx==2.5.1
    - tensorflow==2.9.1
    - tensorflow-estimator==2.9.0
    - tensorflow-io-gcs-filesystem==0.26.0
    - termcolor==1.1.0
    - threadpoolctl==3.1.0
    - timm==0.6.7
    - tokenizers==0.12.1
    - torch==1.11.0
    - torchaudio==0.11.0
    - torchvision==0.12.0
    - transformers==4.21.1
    - trimesh==3.14.0
    - urllib3==1.26.9
    - wandb==0.13.2
    - wcwidth==0.2.5
    - werkzeug==2.1.2
    - wrapt==1.14.1
    - xxhash==3.0.0
    - yacs==0.1.8
    - yarl==1.7.2
    - zipp==3.8.0
prefix: /home/pi/anaconda3/envs/cow

tyz1030 commented 1 year ago

I'm having the same issue.

Southyang commented 1 year ago

The program gets stuck at line 267 of robothor_challenge.py. Is it a problem with ai2thor? [image: ai2thor]

Southyang commented 1 year ago

As I continued to debug, I found that it gets stuck here. I would like to know: does this file exist? ai2thor/fifo_server.py

[image: fifo]

sagadre commented 1 year ago

@Southyang are you still running into issues?

Southyang commented 1 year ago

Yes, I still have this issue, and I have run into another problem. I want to run the Grad-CAM localization strategy on its own, so I wrote the following code:

import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image

# ClipGrad, EnvTypes, ClassTypes and get_env_class_vars come from the cow
# repository; import them from the modules that define them in your checkout.


def main():
    prompts_path = "./prompt_templates/simple_template.json"
    env_type = EnvTypes.ROBOTHOR
    class_type = ClassTypes.REGULAR
    classes, classes_clip, agent_height, floor_tolerance, negate_action, templates = get_env_class_vars(prompts_path, env_type, class_type)

    clip_model_name = "ViT-B/32"
    threshold = 0.625  # clip weight

    # Use the first GPU when available, otherwise fall back to the CPU.
    device_number = 0
    device = torch.device("cpu")
    if torch.cuda.is_available():
        device = torch.device("cuda:{0}".format(device_number))

    center_only = False

    print(clip_model_name, classes, classes_clip, templates, threshold, device, center_only)

    grad_model = ClipGrad(clip_model_name, classes, classes_clip,
                          templates, threshold, device,
                          center_only=center_only)
    # print(grad_model.class_to_language_feature['HousePlant'])
    pic = Image.open("./scene2.png")
    image = grad_model.preprocess(pic).unsqueeze(0).to(device)
    image_relevance = grad_model.forward(image, 'HousePlant')
    print(image_relevance.shape)
    bg_img = plt.imread('./scene2.png')

    # Move the relevance map to the CPU and upsample it to the image size.
    # (np.resize tiles values instead of interpolating and cannot read CUDA tensors.)
    relevance = image_relevance.detach().cpu().numpy().astype(np.float32)
    height, width = bg_img.shape[:2]
    adjusted = np.asarray(Image.fromarray(relevance).resize((width, height), Image.BILINEAR))

    # Min-max normalize to [0, 1]; leave a constant map unchanged.
    denominator = np.max(adjusted) - np.min(adjusted)
    if denominator != 0:
        normalized = (adjusted - np.min(adjusted)) / denominator
    else:
        normalized = adjusted

    # print(normalized)
    # Overlay the heat map on the original image.
    plt.imshow(bg_img)
    plt.imshow(normalized, alpha=0.2, cmap='hot')
    plt.title('Grad-CAM Blended')
    plt.show()


if __name__ == '__main__':
    main()

But the output looks like this:

logits_per_image: tensor([[19.7161]], device='cuda:0', grad_fn=<MmBackward0>)
image_relevance:
tensor([[8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        [8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        [8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        ...,
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05],
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05],
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05]], device='cuda:0')
image_relevance * self.gradient_scalar > self.threshold:
tensor([[False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        ...,
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False]], device='cuda:0')
torch.Size([224, 224])

After entering the interpret_vit function (clip_grad.py, line 84), the probability values become very small, so the heat map cannot be drawn. [Screenshot: 2023-06-12 10-22-42]

OrmosiaCui commented 1 year ago

When I run the command python scripts/startx.py, it always shows:

_XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
_XSERVTransMakeAllCOTSServerListeners: server already running
(EE) Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
(EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.

Is this the cause of the queue.Empty problem? Has anyone solved the queue.Empty problem?

CatLiZi commented 1 month ago

I'm having the same issue, too. I would like to know whether anyone has found a solution yet.

LinqingZhong commented 1 month ago

@CatLiZi @Southyang I have encountered the same issue. May I ask whether you have solved this problem?

anotheryia commented 1 month ago

I think the problem is in line 475. receive_queue.get(timeout=1000) in line 470 does not raise a TimeoutError on timeout; it raises queue.Empty. So I think lines 475 and 484 just need to be changed to except queue.Empty: so that the Empty exception is caught, as is already done in line 274. The author may have noticed the issue and fixed it in line 274 but forgotten the other two places.
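The exception behavior described above can be checked in isolation. This is a minimal sketch, not the code from robothor_challenge.py; get_result is a hypothetical helper introduced only for illustration.

```python
import multiprocessing as mp
import queue


def get_result(receive_queue, timeout):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return receive_queue.get(timeout=timeout)
    except queue.Empty:
        # multiprocessing.Queue.get signals a timeout with queue.Empty,
        # so an `except TimeoutError:` clause would not catch it.
        return None


if __name__ == "__main__":
    q = mp.Queue()
    print(get_result(q, timeout=0.1))  # prints None: nothing was ever put on the queue
```

Note that catching queue.Empty only changes how the timeout is reported. If the worker process never puts a result on the queue, for example because it is blocked while starting the simulator, the underlying hang still has to be fixed separately.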

CatLiZi commented 1 month ago

@LinqingZhong @Southyang As I debugged further, I found that the queue is empty for the same reason an earlier commenter described. Execution gets stuck at the statement shown below and the thread blocks, so the subsequent inference code never runs.

[image]

Has anyone found a solution? I cannot dig any deeper into this code myself.

anotheryia commented 1 month ago

For me, the problem was the x_display defined in line 439. I run the project over ssh on a server. First, if no graphical desktop session is logged in on the server, initializing the Controller in line 267 (probably) raises an error. Second, the Controller only initializes successfully when I set x_display to :2; values such as :0.0, :1.0 and :2.1 do not work. cow/issues/7 should help you find which values can be set. On a headless server (one without a display), CloudRendering might be needed.
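One way to see which display numbers exist on a Linux server is to look at the X server sockets, which are conventionally created as /tmp/.X11-unix/X<n>. The sketch below assumes that convention; list_x_displays is a hypothetical helper and is not part of cow or ai2thor.

```python
import os
import re


def list_x_displays(socket_dir="/tmp/.X11-unix"):
    """Return display strings such as ':0' for each X server socket found."""
    if not os.path.isdir(socket_dir):
        return []
    numbers = []
    for name in os.listdir(socket_dir):
        # X server sockets are named X0, X1, ...; ignore anything else.
        match = re.fullmatch(r"X(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    return [":{}".format(n) for n in sorted(numbers)]


if __name__ == "__main__":
    print(list_x_displays())
```

The returned strings are only candidates for the x_display value. Whether the Controller can actually open a given display still depends on how the X server on that machine is configured.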