microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Can't run bash commands in /DeBERTa/experiments/glue/ #110

heya5 closed this issue 1 year ago.

heya5 commented 1 year ago

I first ran `./run_docker.sh`, then ran `cola.sh base` in `experiments/glue/`, but got the following error:

```
/usr/bin/python: Error while finding module specification for 'DeBERTa.apps.run' (ModuleNotFoundError: No module named 'DeBERTa')
```
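This message means `python -m DeBERTa.apps.run` could not locate the `DeBERTa` package on the interpreter's `sys.path`. The sketch below is not part of the DeBERTa repository. It reproduces that lookup with the standard-library `importlib.util.find_spec`, so the check can be run inside the container. The helper name `module_is_importable` is hypothetical.

```python
import importlib.util
import sys


def module_is_importable(name: str) -> bool:
    """Return True if `name` can be located from the current sys.path.

    `python -m <name>` performs a similar lookup, so False here corresponds
    to the "No module named ..." failure shown above.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package first and
        # raises (instead of returning None) when that parent is missing.
        return False


if __name__ == "__main__":
    target = "DeBERTa.apps.run"
    print(f"{target} importable: {module_is_importable(target)}")
    # The first sys.path entry shows where Python looks before site-packages.
    print("sys.path[0] =", sys.path[0])
```

If this prints `False` from the directory where `cola.sh` is launched, the local checkout is not on the search path for that interpreter.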

Then I installed DeBERTa with `pip install DeBERTa`, but still got an error:

```
07/30/2022 04:35:25|ERROR|CoLA|00| Uncatched exception happened during execution.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/DeBERTa/apps/run.py", line 389, in <module>
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/DeBERTa/apps/run.py", line 248, in main
    device = initialize_distributed(args)
  File "/usr/local/lib/python3.6/dist-packages/LASER/training/dist_launcher.py", line 110, in initialize_distributed
    return _setup_distributed_group(args)
  File "/usr/local/lib/python3.6/dist-packages/LASER/training/dist_launcher.py", line 64, in _setup_distributed_group
    init_method=init_method)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8
```
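The traceback shows the failure happening inside NCCL during the `barrier()` that `init_process_group` runs, before any DeBERTa training code executes. NCCL's generic "unhandled system error" does not name a cause by itself; setting the environment variable `NCCL_DEBUG=INFO` makes NCCL print more detailed diagnostics. One commonly reported cause inside Docker containers is an undersized `/dev/shm` (Docker's default is 64 MiB), which Docker's `--shm-size` or `--ipc=host` options enlarge. Whether that applies here is an assumption; the issue does not say. The standard-library-only sketch below, using a hypothetical helper `shm_size_bytes`, reports the shared-memory size from inside the container.

```python
import os
import shutil

# Docker's default --shm-size is 64 MiB.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024


def shm_size_bytes(path: str = "/dev/shm") -> int:
    """Return the total size of the filesystem at `path`, or 0 if it is absent."""
    if not os.path.isdir(path):
        return 0
    return shutil.disk_usage(path).total


if __name__ == "__main__":
    size = shm_size_bytes()
    print(f"/dev/shm total: {size / 2**20:.0f} MiB")
    if 0 < size <= DOCKER_DEFAULT_SHM_BYTES:
        # Assumption: this may be too small for NCCL's shared-memory
        # transport; the container could be restarted with a larger
        # --shm-size or with --ipc=host.
        print("/dev/shm is at Docker's default size or smaller.")
    # NCCL_DEBUG=INFO asks NCCL to log more detail about the failure.
    print("NCCL_DEBUG =", os.environ.get("NCCL_DEBUG", "<unset>"))
```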