zeyun-zhong / AFFT

Code for the paper: Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation.
Apache License 2.0

Issues with training #5

Closed by lucas-ps 1 year ago

lucas-ps commented 1 year ago

Hi, I'm trying to reproduce your results, but I'm running into some issues with training.

When trying to run the TSN ek100 training, I get this:

python run.py -c expts/00_RGB_TSN_ek100_train.txt --mode train -n 1 

hydra.errors.MissingConfigException: In 'config': Could not find 'model/fuser/cmfuser'

Available options in 'model/fuser':
        CA-Fuser
        MATT
        SA-Fuser
        SA-Fuser_wo_token
        T-SA-Fuser
Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///media/lucas/Linux SSD/AFFT/conf
        provider=schema, path=structured://
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1921100) of binary: /home/lucas/anaconda3/envs/afft/bin/python

When trying to run any of the other options I get an error similar to this:

python run.py -c expts/01_SA-Fuser_ek100_train.txt --mode train --nproc_per_node 2

    overrides=overrides,
hydra.errors.MissingConfigException: In 'config': Could not find 'model/backbone/identity'

Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///media/lucas/Linux SSD/AFFT/conf
        provider=schema, path=structured://
  File "/home/lucas/anaconda3/envs/afft/lib/python3.7/site-packages/hydra/_internal/defaults_list.py", line 485, in _create_defaults_tree_impl
    config_not_found_error(repo=repo, tree=root)
  File "/home/lucas/anaconda3/envs/afft/lib/python3.7/site-packages/hydra/_internal/defaults_list.py", line 804, in config_not_found_error
    options=options,
hydra.errors.MissingConfigException: In 'config': Could not find 'model/backbone/identity'

Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///media/lucas/Linux SSD/AFFT/conf
        provider=schema, path=structured://
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1923711) of binary: /home/lucas/anaconda3/envs/afft/bin/python

Do you know why this might be happening? Apologies if I'm missing something obvious, and thanks in advance.

zeyun-zhong commented 1 year ago

Hi, thank you for your interest in our work. Sorry for the delayed response.

I have added identity.yaml under model/backbone and updated the fuser name in the affected experiment files. This should resolve the issue.
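
For anyone who hits a similar MissingConfigException, here is a rough pre-flight check that can be run before launching training. It is only a sketch, not part of this repository: the helper names `available_options` and `find_missing` are made up for illustration, and it assumes that overrides are written as `group=option` and that each option is a `.yaml` file under the `conf` directory shown in the config search path above.

```python
from pathlib import Path
from typing import Dict, List


def available_options(conf_dir: Path, group: str) -> List[str]:
    """Return the option names found for a config group (stems of its *.yaml files)."""
    group_dir = conf_dir / group
    if not group_dir.is_dir():
        return []
    return sorted(p.stem for p in group_dir.glob("*.yaml"))


def find_missing(conf_dir: Path, overrides: List[str]) -> Dict[str, List[str]]:
    """Map each unresolved 'group=option' override to the options that do exist.

    Hypothetical helper: overrides whose key is not a directory under conf_dir
    (for example plain value overrides) are skipped rather than reported.
    """
    missing: Dict[str, List[str]] = {}
    for override in overrides:
        key, sep, option = override.partition("=")
        group = key.lstrip("+")  # tolerate Hydra's '+group=option' append syntax
        if not sep or not (conf_dir / group).is_dir():
            continue  # not a config group selection, nothing to check
        options = available_options(conf_dir, group)
        if option not in options:
            missing[override] = options
    return missing


def report(conf_dir: Path, overrides: List[str]) -> None:
    """Print one line per override that points at a non-existent config file."""
    for override, options in find_missing(conf_dir, overrides).items():
        print(f"{override}: not found, available options: {options}")
```

For example, `report(Path("conf"), ["model/fuser=cmfuser", "model/backbone=identity"])` would list both overrides from the errors above if the corresponding files were absent. How the overrides are read out of the files in `expts/` is left open here, since that depends on the format of those files.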

If you have any further questions, please feel free to reopen the issue.