yrcong / RelTR

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

Model not getting trained on single GPU #32

Open aryanmangal769 opened 10 months ago

aryanmangal769 commented 10 months ago

When I try to train on a single GPU, the error keeps increasing, and I cannot see any good results even by the 38th epoch.

train_class_error starts at 97.88, and from the 19th to the 37th epoch it is consistently 100. Can you debug this?

Please let me know if you need more information.

yrcong commented 8 months ago

We train the model for 150 epochs, so the 38th epoch might still be just warm-up. Maybe you can try loading some pretrained weights to accelerate the training?
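For readers who want to try the pretrained-weights suggestion, here is a minimal sketch of one common approach: keep only the checkpoint entries whose names and shapes match the model being trained, then load them non-strictly. The helper name `filter_compatible` and the parameter names in the demo are hypothetical and not taken from the RelTR code; simple stand-in objects replace tensors so the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def filter_compatible(pretrained, target):
    """Split pretrained entries into those that fit the target model and those that do not.

    An entry fits when its name exists in the target and its shape is identical.
    """
    kept, skipped = {}, []
    for name, value in pretrained.items():
        if name in target and tuple(value.shape) == tuple(target[name].shape):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


# Stand-ins for tensors (hypothetical names and shapes, for illustration only).
pretrained = {
    "backbone.weight": SimpleNamespace(shape=(64, 3)),
    "class_embed.weight": SimpleNamespace(shape=(92, 256)),
}
target = {
    "backbone.weight": SimpleNamespace(shape=(64, 3)),
    "class_embed.weight": SimpleNamespace(shape=(152, 256)),
}

kept, skipped = filter_compatible(pretrained, target)
print("loaded:", sorted(kept))
print("skipped:", skipped)
```

With PyTorch, the two dictionaries would typically come from `torch.load(path, map_location="cpu")` and `model.state_dict()`, and the result would be applied with `model.load_state_dict(kept, strict=False)`. Whether the weights sit under a key such as `"model"` inside the checkpoint is an assumption to verify against the file you download.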

qqxqqbot commented 2 months ago

@aryanmangal769 bro! How do you train the model on one GPU?

qqxqqbot commented 2 months ago

@me I added os.environ['MASTER_PORT'] = '8889' to main.py.

yrcong commented 2 months ago

It is not related to the port. Please set --nproc_per_node=1.
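As a rough illustration of that advice, the sketch below builds a single-process launch command. It assumes training is started through `torch.distributed.launch` and that `main.py` reads its rank from environment variables (hence `--use_env`); both are assumptions to check against the repository's README. The helper name `single_gpu_launch_cmd` is hypothetical.

```python
import sys


def single_gpu_launch_cmd(script="main.py", extra_args=()):
    """Return an argv list that starts exactly one training process."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        "--nproc_per_node=1",  # one process, i.e. one GPU
        "--use_env",           # assumption: the script reads rank info from env vars
        script,
        *extra_args,           # append the usual training arguments here
    ]


cmd = single_gpu_launch_cmd()
print(" ".join(cmd))
# To actually start training: subprocess.run(cmd, check=True)
```

MASTER_PORT only selects the port the processes use to find each other, so changing it does not change how many processes are started. On recent PyTorch versions `torch.distributed.launch` is deprecated in favor of `torchrun`, which accepts the same `--nproc_per_node` option.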