soyeonm / FILM

Official repository of ICLR 2022 paper FILM: Following Instructions in Language with Modular Methods

Stuck at Resetting ThorEnv #18

Open dada-h-h opened 2 years ago

dada-h-h commented 2 years ago

Hi! I'm dealing with issues similar to those of @biubiuisacat and @JinyeonKim. I'm stuck at "Resetting ThorEnv", and I double-checked the dependencies (pytorch==1.6.0, torchvision==0.7.0, cudatoolkit=10.2), so I don't think they are the reason the code isn't working...

Also, I ran the code on a desktop with a 2080 Ti, so the hardware probably isn't causing the problem either.

So I looked through the ai2thor code and found that execution stops when ~/FILM/alfred_utils/env/thor_env_code.py calls super().step() (line 278). That function looks like this:

(ai2thor/controller.py, line 615)

def step(self, action, raise_for_failure=False):
        if self.headless:
            action["renderImage"] = False
        # prevent changes to the action from leaking
        action = copy.deepcopy(action)
        # XXX should be able to get rid of this with some sort of deprecation warning
        if 'AI2THOR_VISIBILITY_DISTANCE' in os.environ:
            action['visibilityDistance'] = float(os.environ['AI2THOR_VISIBILITY_DISTANCE'])

        should_fail = False
        self.last_action = action

        if ('objectId' in action and (action['action'] == 'OpenObject' or action['action'] == 'CloseObject')):

            force_visible = action.get('forceVisible', False)
            if not force_visible and self.last_event.instance_detections2D and action['objectId'] not in self.last_event.instance_detections2D:
                should_fail = True

            obj_metadata = self.last_event.get_object(action['objectId'])
            if obj_metadata is None or obj_metadata['isOpen'] == (action['action'] == 'OpenObject'):
                should_fail = True

        rotation = action.get('rotation')
        if rotation is not None and type(rotation) != dict:
            action['rotation'] = {}
            action['rotation']['y'] = rotation

        if should_fail:
            new_event = copy.deepcopy(self.last_event)
            new_event.metadata['lastActionSuccess'] = False
            self.last_event = new_event
            return new_event

        assert self.request_queue.empty(), 'request_queue is not empty' # continues if request_queue is empty.

        self.response_queue.put_nowait(action) #put action. nonblocking queue

        # code stops at this point.
        self.last_event = queue_get(self.request_queue)

        if not self.last_event.metadata['lastActionSuccess'] and self.last_event.metadata['errorCode'] == 'InvalidAction':
            raise ValueError(self.last_event.metadata['errorMessage'])

        if raise_for_failure:
            assert self.last_event.metadata['lastActionSuccess']

        return self.last_event
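To see why an empty request_queue means a hang, here is a simplified, hypothetical model of the handshake in step(): the controller puts an action on response_queue and then waits on request_queue for the simulator's reply. As far as I understand, in ai2thor 2.x that reply comes back through a local server that the Unity process talks to; the fake_simulator thread below stands in for that whole path and is purely illustrative. If nothing on the simulator side ever answers (for example because Unity cannot open a display), request_queue stays empty and the line marked "code stops at this point" waits forever. This model uses a timeout instead so it can show both cases.

```python
import queue
import threading

# Simplified, hypothetical model of the two-queue handshake in Controller.step().
# response_queue: actions going from the controller to the simulator side.
# request_queue:  events (replies) coming back from the simulator to the controller.


def fake_simulator(response_queue, request_queue):
    """Stand-in for the Unity process plus its server: answer exactly one action."""
    action = response_queue.get()
    request_queue.put({"lastActionSuccess": True, "action": action["action"]})


def step(action, response_queue, request_queue, timeout=2.0):
    """Send one action and wait for the reply, like step() but with a timeout."""
    assert request_queue.empty(), "request_queue is not empty"
    response_queue.put_nowait(action)
    # The real code polls here forever; this model gives up after `timeout` seconds.
    try:
        return request_queue.get(timeout=timeout)
    except queue.Empty:
        return None  # no reply: the "simulator" never answered


if __name__ == "__main__":
    # Case 1: a simulator thread is running, so a reply arrives.
    resp_q, req_q = queue.Queue(), queue.Queue()
    threading.Thread(target=fake_simulator, args=(resp_q, req_q), daemon=True).start()
    print(step({"action": "Initialize"}, resp_q, req_q))

    # Case 2: nothing answers, so request_queue stays empty.
    print(step({"action": "Initialize"}, queue.Queue(), queue.Queue(), timeout=0.5))
```

In this model the first call returns the fake event and the second returns None; in the real controller the second case never returns at all.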

I then found that execution stops when queue_get(self.request_queue) is called (marked with a comment above). queue_get runs a while loop that should break as soon as it gets an item from request_queue, but it never gets one because the queue stays empty, so the program is stuck in that loop.

from queue import Empty, Queue

def queue_get(que:Queue):
    res = None

    # Poll every 0.5 s; the loop only exits once an item arrives.
    while True:
        try:
            res = que.get(block=True, timeout=0.5)
            print("que.get result: ", res)
            break

        except Empty:
            pass

    return res
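For debugging, one option is a bounded variant of queue_get that gives up after a deadline instead of looping forever, so the hang turns into an explicit error. queue_get_with_deadline and its 60-second default are hypothetical choices, not part of ai2thor; this is a sketch of the idea, not a patch for the real controller.

```python
import time
from queue import Empty, Queue


def queue_get_with_deadline(que: Queue, deadline_s: float = 60.0, poll_s: float = 0.5):
    """Like queue_get, but raise TimeoutError instead of waiting forever.

    Hypothetical debugging helper; not part of ai2thor.
    """
    start = time.monotonic()
    while True:
        try:
            # Same polling pattern as queue_get: wait up to poll_s per attempt.
            return que.get(block=True, timeout=poll_s)
        except Empty:
            waited = time.monotonic() - start
            if waited >= deadline_s:
                raise TimeoutError(
                    f"no event from the simulator after {waited:.1f}s; "
                    "the Unity process may not be running or reachable"
                )
```

One way to try it is to call it in place of queue_get(self.request_queue) in a local copy of ai2thor/controller.py; a TimeoutError then at least confirms that the simulator side never answered.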

Could anyone advise why this happens and how to solve it? I've been stuck on this for weeks...😭😭

Thanks!

Roadsong commented 2 years ago

@dada-h-h Exactly the same here. Could you try a minimal example (https://allenai.github.io/ai2thor-v2.1.0-documentation/examples)?

You can also try to set

controller = ai2thor.controller.Controller(headless=True)

to see if there is any difference.

soyeonm commented 2 years ago

Hello, I think if you can't run the reset here, it's likely that you can't run the one in ALFRED either:

https://github.com/askforalfred/alfred/blob/master/env/thor_env.py#L47

If it's a headless computer, it's likely an X server problem (the simulator not recognizing the X server). You should check whether ALFRED's scripts/check_thor.py works (https://github.com/askforalfred/alfred/blob/master/scripts/check_thor.py).
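Before running check_thor.py, a rough first check on a Linux machine is whether DISPLAY is set and whether the matching local X socket exists under /tmp/.X11-unix. The helpers below (local_x_socket, describe_display) are a hypothetical heuristic: they only understand local displays such as ":0" or ":1.0", and a present socket does not guarantee that the simulator can actually use that server, so check_thor.py remains the real test.

```python
import os
import re


def local_x_socket(display):
    """Map a DISPLAY value like ':0' or ':1.0' to its Unix socket path.

    Returns None for empty or remote displays (e.g. 'host:0'),
    which this heuristic does not handle.
    """
    if not display:
        return None
    match = re.fullmatch(r":(\d+)(?:\.\d+)?", display)
    if match is None:
        return None
    return f"/tmp/.X11-unix/X{match.group(1)}"


def describe_display(env=None):
    """Return a one-line, human-readable guess about the X server setup."""
    env = os.environ if env is None else env
    display = env.get("DISPLAY")
    if not display:
        return "DISPLAY is not set; start an X server and export DISPLAY"
    path = local_x_socket(display)
    if path is None:
        return f"DISPLAY={display!r} is not a local display; check it manually"
    if os.path.exists(path):
        return f"DISPLAY={display!r}: found X socket {path}"
    return f"DISPLAY={display!r}: no socket at {path}; the X server may not be running"


if __name__ == "__main__":
    print(describe_display())
```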

Roadsong commented 2 years ago

Hi @soyeonm, is the code expected to work on a macOS machine? I noticed that you included some macOS instructions in the README, but I'm facing a similar hanging issue. I can't even run a minimal ai2thor example (version 2.1.0).

I should probably raise this issue in the ALFRED repo, by the way.

soyeonm commented 2 years ago

Hello, thanks for your question. Yes, it ran on my Mac; I will check again later today.

dada-h-h commented 2 years ago

@soyeonm @Roadsong Thank you very much for your answers!

It seems like it was a dependency issue. I made a new conda environment (Python 3.8.5) and installed all the packages using the package versions from the Docker container, and then it worked!

The specific versions are:

numpy==1.20.2
pandas==1.2.4 
opencv-python==4.5.1.48 
networkx==2.5.1
h5py==3.2.1
tqdm==4.64.0
vocab==0.0.5
revtok==0.0.3
Pillow==9.0.2
torch==1.6.0
torchvision==0.7.0
tensorboardX==1.8
ai2thor==2.1.0
matplotlib==3.5.1
tensorboard==2.9.1
seaborn==0.9.0
imageio==2.6.0
scikit-fmm==2019.1.30
scikit-image==0.15.0
scikit-learn==0.22.2.post1
ifcfg==0.21
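To compare an existing environment against the versions above, one option is a small script based on importlib.metadata (Python 3.8+). PINS copies only a few entries from the list as an example, and find_mismatches is a hypothetical helper name. Names are looked up as installed distribution names (so OpenCV is opencv-python), and this plain string comparison treats local version suffixes such as +cu102 as mismatches.

```python
from importlib.metadata import PackageNotFoundError, version

# A few of the pins listed above; add the rest as needed.
PINS = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "ai2thor": "2.1.0",
    "numpy": "1.20.2",
    "opencv-python": "4.5.1.48",
}


def find_mismatches(pins, lookup=version):
    """Return {name: (expected, installed_or_None)} for every pin that differs."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```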

I'm still not sure exactly which packages were causing the issue, though...

Also, when installing the packages, I used this file, which I generated with pip freeze inside the Docker container: film_docker_requirements.txt

Here is what I did. First, I installed PyTorch:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Then I ran conda install for each requirement (this takes some time):

while read requirement; do conda install --yes $requirement; done < film_docker_requirements.txt
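The shell loop above installs each line separately, so one package that conda cannot find does not abort the whole batch. Below is a Python sketch of the same idea that also collects the failures so they can be retried with pip or conda-forge afterwards. read_requirements, conda_install and install_each are hypothetical helper names, and the script assumes conda is on PATH.

```python
import subprocess
import sys


def read_requirements(path):
    """Return the non-empty, non-comment lines of a pip-freeze style file."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    return [line for line in lines if line and not line.startswith("#")]


def conda_install(requirement):
    """Install one requirement with conda; return True on success."""
    result = subprocess.run(["conda", "install", "--yes", requirement])
    return result.returncode == 0


def install_each(requirements, install=conda_install):
    """Try each requirement separately and return the ones that failed."""
    return [req for req in requirements if not install(req)]


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = install_each(read_requirements(sys.argv[1]))
    # Candidates for a retry with pip or conda-forge.
    print("failed:", *failed, sep="\n  ")
```

For example, `python install_each.py film_docker_requirements.txt` (the script name is arbitrary) would run conda once per line and list whatever did not install.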

Then I used pip or conda-forge to install the missing packages. I also checked whether check_thor.py worked every time I installed a new package.

VoHoangAnh commented 1 month ago

Hello, in my case I solved this issue with PyTorch 2.1 by reinstalling Werkzeug and Flask:

pip install Werkzeug==2.0.3 Flask==2.1.1