yrcong / RelTR

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

cannot start training #23

Closed: Turkishvein closed this issue 1 year ago

Turkishvein commented 1 year ago

I cannot import functions between .py files.

yrcong commented 1 year ago

Could you post the error report? I don't understand the problem yet.

Turkishvein commented 1 year ago

image

Turkishvein commented 1 year ago

Hello, I reduced the error message to this:

from datasets import build_dataset, get_coco_api_from_dataset

In the datasets folder there is neither a build_dataset nor a get_coco_api_from_dataset .py file. How can I get them?

Turkishvein commented 1 year ago

I started training successfully, but this time it gives the following error after 2-3 hours:

  File "main.py", line 191, in main
    train_stats = train_one_epoch(model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "C:\Users\Uğur\Desktop\reltr\engine.py", line 61, in train_one_epoch
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
  File "C:\anaconda\envs\reltr\lib\site-packages\torch\nn\utils\clip_grad.py", line 38, in clip_grad_norm_
    if clip_coef < 1:
RuntimeError: CUDA error: unknown error

Thank you for any solutions.

yrcong commented 1 year ago

> Hello, I reduced the error message to this: from datasets import build_dataset, get_coco_api_from_dataset. In the datasets folder there is neither a build_dataset nor a get_coco_api_from_dataset .py file. How can I get them?

Have you already installed pycocotools? I guess this is the reason.
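
One way to check this suggestion is to ask Python whether the package can be located in the active environment. The sketch below uses only the standard library; the helper name is_installed is made up for illustration and is not part of RelTR.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (hypothetical helper).

    find_spec only searches for the module; it does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Packages mentioned in this thread; extend the tuple as needed.
    for name in ("pycocotools", "torch", "torchvision"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If pycocotools shows up as MISSING, the import chain behind from datasets import ... would be expected to fail before training starts.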

yrcong commented 1 year ago

> I started training successfully, but this time it gives the following error after 2-3 hours:
>
>   File "main.py", line 191, in main
>     train_stats = train_one_epoch(model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
>   File "C:\Users\Uğur\Desktop\reltr\engine.py", line 61, in train_one_epoch
>     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
>   File "C:\anaconda\envs\reltr\lib\site-packages\torch\nn\utils\clip_grad.py", line 38, in clip_grad_norm_
>     if clip_coef < 1:
> RuntimeError: CUDA error: unknown error
>
> Thank you for any solutions.

It shows that something goes wrong during gradient clipping. I have never seen this, but it may be caused by a mismatched CUDA version. Make sure that you are using pytorch==1.6.0 with cudatoolkit=10.1. If you want to use an RTX 30-series GPU, please use CUDA 11.
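
For context, the failing line (if clip_coef < 1:) sits inside norm-based gradient clipping. The pure-Python sketch below illustrates the idea on plain lists: compute the total L2 norm of all gradients, then scale them down if that norm exceeds max_norm. It is an illustration only, not PyTorch's implementation, and the function name clip_grad_norm here is hypothetical. Because CUDA reports errors asynchronously, the comparison in the real function may simply be where a failure from an earlier GPU operation surfaces; running with the environment variable CUDA_LAUNCH_BLOCKING=1 can help localize it.

```python
import math
from typing import List


def clip_grad_norm(grads: List[List[float]], max_norm: float) -> float:
    """Scale gradients in place so their combined L2 norm is at most max_norm.

    Hypothetical list-based stand-in for illustration.
    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Small epsilon avoids division by zero when all gradients are zero.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


# Two "parameters" whose gradients have a combined norm of 5.
grads = [[3.0, 4.0], [0.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for grad in grads for g in grad))
print(norm_before, norm_after)
```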

Turkishvein commented 1 year ago

When I install with

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=11 -c pytorch

I get:

    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

and training does not start :) I tried several PyTorch versions and got training to start, but it is interrupted about halfway through the first epoch.
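
The message "Torch not compiled with CUDA enabled" indicates that the installed PyTorch package is a CPU-only build. A plausible cause here, stated as an assumption, is that no CUDA 11 build of PyTorch 1.6.0 was available, so the package solver fell back to a CPU-only one. The sketch below interprets two real PyTorch attributes, torch.version.cuda and torch.cuda.is_available(); the helper name diagnose and its messages are made up for illustration.

```python
from typing import Optional


def diagnose(cuda_build: Optional[str], cuda_available: bool) -> str:
    """Interpret torch.version.cuda and torch.cuda.is_available().

    Hypothetical helper; the returned messages are illustrative.
    """
    if cuda_build is None:
        # torch.version.cuda is None for CPU-only builds.
        return "CPU-only build: install a PyTorch package built with CUDA"
    if not cuda_available:
        return f"Built for CUDA {cuda_build}, but no usable GPU or driver was detected"
    return f"CUDA {cuda_build} build with a usable GPU"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "->",
              diagnose(torch.version.cuda, torch.cuda.is_available()))
```

Running this inside the conda environment used for training shows which of the three cases applies before starting a long run.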

yrcong commented 1 year ago

I have never tried the code on Windows. On Linux with an RTX 3090, CUDA 11.1, and PyTorch 1.10, it works well.

altansnl commented 1 year ago

The code works fine on Windows. You have a PyTorch error rather than an error related to this repository. Good luck.