When I train this model on the AADB dataset, I get the following error:
[i 0616 08:08:25.320000 52 cuda_flags.cc:32] CUDA enabled.
@ [Training Model] Arch = [resnet50_HLAGCN]; Dataset = [aadb]
@ LR = [0.01]; Total epoch = [20]; Batch size = [8]
@ weight_decay = [0.0001]; momentum = [0.9]; workers = [8]
@ model save dir: results/resnet50_HLAGCN_v0_aadb
@ model save period: 4
Preprocessing dataset...
AADB dataset info preloaded in ./preprocess/: #8435 train #498 val #996 test
Preloading file saved!
[w 0616 08:10:07.622000 52 __init__.py:1118] load parameter fc.weight failed ...
[w 0616 08:10:07.625000 52 __init__.py:1118] load parameter fc.bias failed ...
[w 0616 08:10:07.627000 52 __init__.py:1136] load total 267 params, 2 failed
=> Start training #Ep 1 /20
08:10:07->Ep:[1][ 0/8435] - Net:59.5 - Load:56.1 - loss_avg:nan
Compiling Operators(52/52) used: 2.02s eta: 0s
08:10:07->Ep:[1][ 500/8435] - Net:0.5 - Load:0.1 - loss_avg:nan
08:10:07->Ep:[1][1000/8435] - Net:0.5 - Load:0.1 - loss_avg:nan
---> Train: 8.26 min/epoch, train loss: nan - lr: 0.01000
Traceback (most recent call last):
File "D:\Python\SetUpPath\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Python\SetUpPath\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 237, in <module>
main()
File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 40, in main
main_worker(args)
File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 150, in main_worker
val_loss, val_acc_aes = val_test_process(val_loader, model, criterions, args)
File "D:\Python\SetUpPath\lib\site-packages\jittor\__init__.py", line 291, in inner
ret = func(*args, **kw)
File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 224, in val_test_process
metrics = cal_metrics(scores_hist, labels_hist, args.bins)
File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\util.py", line 143, in cal_metrics
plcc, _ = pearsonr(scores_mean, labels_mean)
File "D:\Python\SetUpPath\lib\site-packages\scipy\stats\_stats_py.py", line 4090, in pearsonr
normxm = linalg.norm(xm)
File "D:\Python\SetUpPath\lib\site-packages\scipy\linalg\_misc.py", line 145, in norm
a = np.asarray_chkfinite(a)
File "D:\Python\SetUpPath\lib\site-packages\numpy\lib\function_base.py", line 603, in asarray_chkfinite
raise ValueError(
ValueError: array must not contain infs or NaNs
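Note that in the log above, loss_avg is already nan at the first logged iteration (Ep:[1][ 0/8435]), long before validation fails inside pearsonr. A minimal sketch of a fail-fast check that would stop training at that point, assuming the loss can be converted to a Python float (the function name ensure_finite_loss is hypothetical and not part of the repository):

```python
import math


def ensure_finite_loss(loss_value, epoch, step):
    """Return the loss as a float, or raise as soon as it is NaN or inf."""
    value = float(loss_value)
    if not math.isfinite(value):
        # Stop here instead of training a whole epoch on a broken loss.
        raise FloatingPointError(
            f"non-finite loss {value!r} at epoch {epoch}, step {step}"
        )
    return value
```

Calling something like this once per training step would report the first bad step directly, rather than the later ValueError from the metrics code.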
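I then debugged step by step and found that during training, when a batch from the dataset is fed into the network, two elements of the tuple the network returns are NaN.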
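The kind of check involved can be sketched as follows. This is a stand-alone illustration, not the repository's code: the helper name nonfinite_positions is hypothetical, and each output is assumed to have been flattened to a plain sequence of floats first (for example via NumPy's ravel().tolist()).

```python
import math


def nonfinite_positions(outputs):
    """Return the indices of the tuple elements that contain NaN or inf.

    Each element of `outputs` is assumed to be a flat iterable of numbers.
    """
    bad = []
    for index, values in enumerate(outputs):
        if any(not math.isfinite(float(v)) for v in values):
            bad.append(index)
    return bad


# Toy stand-in for a network's output tuple: elements 1 and 2 are broken.
outputs = ([0.1, 0.9], [float("nan"), 0.3], [0.5, float("inf")])
print(nonfinite_positions(outputs))  # [1, 2]
```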
I then located the network's computation code:
What I don't understand is why o2 comes out as NaN.