deep-practice closed this issue 2 months ago
Hi, sorry for the confusion. In train_v2.py, line 106, please comment out:
backbone._set_static_graph()
Also, please make sure to use proper CUDA configurations.
After commenting out "backbone._set_static_graph()", it still failed:
Training: 2024-07-27 21:52:33,756-Speed 218.09 samples/sec Loss nan LearningRate 0.010000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 256 Required: 1658 hours
Training: 2024-07-27 21:52:45,407-Speed 219.74 samples/sec Loss 44.0782 LearningRate 0.010000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 128 Required: 1631 hours
Training: 2024-07-27 21:52:57,051-Speed 219.87 samples/sec Loss 44.1238 LearningRate 0.010000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 128 Required: 1601 hours
Traceback (most recent call last):
File "train_v2.py", line 267, in
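The log above shows the loss turning to nan at step 150, with the fp16 grad scale then dropping from 256 to 128. A small helper like the one below can scan a saved training log for such steps. It is only a sketch: the regular expression is inferred from the three log lines quoted in this thread, and the names `parse_line` and `find_unstable_steps` are made up for illustration, not part of the ARoFace code.

```python
import math
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not taken from the training script itself).
LOG_PATTERN = re.compile(
    r"Loss\s+(?P<loss>\S+)\s+.*?"
    r"Global Step:\s+(?P<step>\d+)\s+"
    r"Fp16 Grad Scale:\s+(?P<scale>\d+)"
)


def parse_line(line):
    """Return step, loss and grad scale for one log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "loss": float(match["loss"]),  # float("nan") parses fine
        "scale": int(match["scale"]),
    }


def find_unstable_steps(lines):
    """List the global steps whose logged loss is nan or inf."""
    records = (parse_line(line) for line in lines)
    return [r["step"] for r in records if r is not None and not math.isfinite(r["loss"])]


sample_log = [
    "Training: 2024-07-27 21:52:33,756-Speed 218.09 samples/sec Loss nan LearningRate 0.010000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 256 Required: 1658 hours",
    "Training: 2024-07-27 21:52:45,407-Speed 219.74 samples/sec Loss 44.0782 LearningRate 0.010000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 128 Required: 1631 hours",
]
unstable = find_unstable_steps(sample_log)
```

A nan loss alongside a halved grad scale is what mixed-precision training typically logs when a step overflows, so on its own it may be unrelated to the crash.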
What PyTorch and CUDA versions are you using?
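If it helps, the versions can be collected in one go with a short script along these lines. This is a sketch only: `environment_report` is a hypothetical helper name, and it simply returns whatever the installed PyTorch build reports (or notes that PyTorch is missing).

```python
import platform


def environment_report():
    """Collect the Python, PyTorch and CUDA versions visible to this interpreter."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Note that the "CUDA Version" printed by nvidia-smi is the highest version the driver supports, which can differ from the CUDA version PyTorch was built with.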
NVIDIA-SMI 510.108.03, Driver Version: 510.108.03, CUDA Version: 11.6
PyTorch: 1.13.1+cu116
Besides, I can train normally on this machine using InsightFace.
Please try the following settings: Python 3.7, PyTorch 1.8, CUDA 11.1.
Traceback (most recent call last):
File "train_v2.py", line 267, in
main(parser.parse_args())
File "train_v2.py", line 185, in main
img, local_labels = adversarial_img_warping(backbone=backbone,
File "/data/work/project/ARoFace/AdvWarp.py", line 86, in adversarial_img_warping
train_img = torch.cat((img[idx1], updated_img[idx2]), dim=0)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
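Because the error is reported asynchronously, the `torch.cat` line in the traceback may not be the real culprit. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable before CUDA is initialized makes kernel launches synchronous, so the traceback points at the failing call. The sketch below builds such an environment for a child process. The helper names `blocking_env` and `build_command` are hypothetical, and the arguments that train_v2.py expects are not shown in this thread, so they are left for the caller to supply.

```python
import os
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Must be set before the training process initializes CUDA.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(script="train_v2.py", extra_args=()):
    """Build the command line for the training script (arguments are caller-supplied)."""
    return [sys.executable, script, *extra_args]


# Usage (not executed here), with your own training arguments:
#   import subprocess
#   subprocess.run(build_command(extra_args=[...]), env=blocking_env())
```

Training runs noticeably slower in this mode, so it is best used only to locate the failing operation.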