zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
BSD 3-Clause "New" or "Revised" License

Distributed mode for single GPU #7

Closed TheodorPatrickZ closed 2 years ago

TheodorPatrickZ commented 2 years ago

Is it possible to run itr_flickr non-distributed on a single GPU?

When running:

```
python run.py --task "itr_flickr" --dist "gpu0" --output_dir "output/itr_flickr" --checkpoint "4m_base_finetune/itr_flickr/checkpoint_best.pth"
```

I get:

Training Retrieval Flickr

```
| distributed init (rank 0): env://
Traceback (most recent call last):
  File "Retrieval.py", line 381, in <module>
    main(args, config)
  File "Retrieval.py", line 215, in main
    utils.init_distributed_mode(args)
  File "C:\Users\..\X-VLM-master\utils\__init__.py", line 357, in init_distributed_mode
    world_size=args.world_size, rank=args.rank)
  File "C:\Users\..\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "C:\Users\..\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
```
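(Editor's note: this error usually means the `env://` rendezvous could not be used, e.g. on older Windows builds of PyTorch or when `MASTER_ADDR`/`MASTER_PORT` are unset. Below is a minimal sketch of a single-process fallback; the function name `pick_dist_settings` and the chosen port are illustrative assumptions, not part of X-VLM's actual code.)

```python
import os

def pick_dist_settings():
    """Sketch: decide between distributed and single-process mode.

    Mirrors the common pattern of checking the environment variables that
    torchrun / torch.distributed.launch export. Names are illustrative.
    """
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        # Launched by a distributed launcher: trust its settings.
        return {"distributed": True,
                "rank": int(os.environ["RANK"]),
                "world_size": int(os.environ["WORLD_SIZE"])}
    # Single-GPU fallback: provide the rendezvous defaults that
    # init_process_group(init_method="env://") expects, so a
    # world_size=1 "distributed" init can still succeed.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # any free port works
    return {"distributed": False, "rank": 0, "world_size": 1}
```

If the error persists on Windows, another common workaround is using the `gloo` backend in the `init_process_group` call, since NCCL is not available on Windows.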

zengyan-97 commented 2 years ago

Hi,

Our code can run on a single GPU by specifying --dist "gpu0". I didn't get this error myself, so I'm not sure what causes it. Sorry.

TheodorPatrickZ commented 2 years ago

Got it running after looking at it again the next day, thanks for the fast response!