mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project
Apache License 2.0

Running demo.py in colab results in an error #10

Closed · nastia-lado closed this issue 10 months ago

nastia-lado commented 10 months ago

Hi!

Thanks a lot for the model. I'm trying to run the AVSR model in Colab using demo.py. I'm using asr_trlrwlrs2lrs3vox2avsp_base.pth and I've specified the modality as 'audiovisual'. I'm getting this error:

Error executing job with overrides: ['data.modality=[audiovisual]', 'pretrained_model_path=[/content/asr_trlrwlrs2lrs3vox2avsp_base.pth]', 'file_path=[/content/de0fe3b3380fcc9575a8193b43226e51.mp4]']
Traceback (most recent call last):
  File "/content/auto_avsr/demo.py", line 77, in main
    pipeline = InferencePipeline(cfg)
  File "/content/auto_avsr/demo.py", line 30, in __init__
    self.modelmodule = ModelModule(cfg)
  File "/content/auto_avsr/lightning.py", line 29, in __init__
    self.model = E2E(len(self.token_list), self.backbone_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ModelModule' object has no attribute 'backbone_args'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I also tried specifying ['audio', 'video'], but that doesn't seem right.

mpc001 commented 10 months ago

Hi @nastia-lado, can you please share the Colab code with me so I can reproduce the error you're seeing?

nastia-lado commented 10 months ago

Hi @mpc001! Here is the link to colab: https://colab.research.google.com/drive/1SScL2-8IwnGtjqNnwmr3jSDqWc-Hx4-t?usp=sharing

But I didn't change anything in the lines you recommend for the demo. I just prepended ! to run them from Colab.

mpc001 commented 10 months ago

Hi @nastia-lado, thank you for sharing. I can reproduce the issue. It can be fixed by replacing data.modality=['audio'] with data.modality='audio'. Please let me know if there are any further issues.
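
For reference, a rough sketch of the corrected Colab cell (paths copied from the error message above; data.modality='audio' follows the fix, on the assumption that the asr_* checkpoint is the audio-only model — swap in the modality that matches your checkpoint). Hydra parses a bracketed override such as data.modality=[audiovisual] as a list rather than a string, so the string comparison that selects backbone_args presumably never matches, hence the AttributeError:

    !python demo.py data.modality='audio' pretrained_model_path=/content/asr_trlrwlrs2lrs3vox2avsp_base.pth file_path=/content/de0fe3b3380fcc9575a8193b43226e51.mp4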

nastia-lado commented 10 months ago

Thanks a lot! I read that there is a plan to release audiovisual models, and I saw the option to select one in the code. But as far as I understand, the model is still not available, right?

mpc001 commented 10 months ago

Hi @nastia-lado, you are right that it is not included in this repository, since the training code has not yet been released; however, the pre-trained model has been released in Visual_Speech_Recognition_for_Multiple_Languages. Feel free to check it out.

nastia-lado commented 10 months ago

Thanks a lot!

zhan-xu commented 6 months ago

Hello @mpc001, a small typo: in https://github.com/mpc001/auto_avsr/blob/main/demo.py#L30, you may want to change visual to video. Thanks!
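
For anyone landing here, a hypothetical sketch of the kind of check being fixed (not the exact repo code; see the linked line). The modality values seen in this thread are 'audio', 'video', and 'audiovisual', so a comparison against 'visual' would never match:

    # hypothetical sketch of the demo.py modality check; not the exact repo code
    modality = "video"  # values seen in this thread: 'audio', 'video', 'audiovisual'
    if modality in ("video", "audiovisual"):  # the typo compared against "visual"
        print("run the video front-end")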