shedy-pub / hlagcn-jittor

Jittor implementation of the paper "Hierarchical Layout-Aware Graph Convolutional Network for Unified Aesthetics Assessment"

Error when training the model on the AADB training set #4

Open islandLZ opened 2 years ago

islandLZ commented 2 years ago

I got the following error when training the model on the AADB dataset:

[i 0616 08:08:25.320000 52 cuda_flags.cc:32] CUDA enabled.
@ [Training Model] Arch = [resnet50_HLAGCN]; Dataset = [aadb]
@ LR = [0.01]; Total epoch = [20]; Batch size = [8]
@ weight_decay = [0.0001]; momentum = [0.9]; workers = [8]
@ model save dir: results/resnet50_HLAGCN_v0_aadb
@ model save period: 4
Preprocessing dataset...
AADB dataset info preloaded in ./preprocess/: #8435 train #498 val #996 test
Preloading file saved!
[w 0616 08:10:07.622000 52 __init__.py:1118] load parameter fc.weight failed ...
[w 0616 08:10:07.625000 52 __init__.py:1118] load parameter fc.bias failed ...
[w 0616 08:10:07.627000 52 __init__.py:1136] load total 267 params, 2 failed
=> Start training
#Ep 1 /20
08:10:07->Ep:[1][ 0/8435] - Net:59.5 - Load:56.1 - loss_avg:nan
Compiling Operators(52/52) used: 2.02s eta: 0s
08:10:07->Ep:[1][ 500/8435] - Net:0.5 - Load:0.1 - loss_avg:nan
08:10:07->Ep:[1][1000/8435] - Net:0.5 - Load:0.1 - loss_avg:nan
---> Train: 8.26 min/epoch, train loss: nan - lr: 0.01000
Traceback (most recent call last):
  File "D:\Python\SetUpPath\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Python\SetUpPath\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 237, in <module>
    main()
  File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 40, in main
    main_worker(args)
  File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 150, in main_worker
    val_loss, val_acc_aes = val_test_process(val_loader, model, criterions, args)
  File "D:\Python\SetUpPath\lib\site-packages\jittor\__init__.py", line 291, in inner
    ret = func(*args, **kw)
  File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\train_jittor.py", line 224, in val_test_process
    metrics = cal_metrics(scores_hist, labels_hist, args.bins)
  File "D:\Python\Python_Demo\图像评分模型\hlagcn-jittor-main\utils_jittor\util.py", line 143, in cal_metrics
    plcc, _ = pearsonr(scores_mean, labels_mean)
  File "D:\Python\SetUpPath\lib\site-packages\scipy\stats\_stats_py.py", line 4090, in pearsonr
    normxm = linalg.norm(xm)
  File "D:\Python\SetUpPath\lib\site-packages\scipy\linalg\_misc.py", line 145, in norm
    a = np.asarray_chkfinite(a)
  File "D:\Python\SetUpPath\lib\site-packages\numpy\lib\function_base.py", line 603, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
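The log shows `loss_avg:nan` from the very first iteration, so the `ValueError` raised inside `pearsonr` during validation looks like a downstream symptom and not the root cause. One way to surface the problem earlier is to stop at the first non-finite loss. The following is a minimal sketch under stated assumptions: `check_loss_finite` is a hypothetical helper that is not part of hlagcn-jittor, and it assumes the loss has already been converted to a plain Python float (for example with `.item()`).

```python
import math


def check_loss_finite(loss_value, epoch, step):
    """Raise as soon as the loss stops being a finite number.

    Hypothetical helper, not part of hlagcn-jittor. `loss_value` is expected
    to be a plain Python float.
    """
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value!r} at epoch {epoch}, step {step}"
        )
    return loss_value


# Stand-in loss values mimicking the log above, where the loss is NaN from step 0.
for step, loss_value in enumerate([float("nan"), float("nan")]):
    try:
        check_loss_finite(loss_value, epoch=1, step=step)
    except FloatingPointError as err:
        print(err)
        break
```

Failing at the first bad step keeps the offending batch and model state available for inspection, instead of running a full epoch and crashing later in the metric computation.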

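I then stepped through the code in a debugger and found that, during training, when a batch is fed into the network, two elements of the tuple the network returns are NaN. [screenshot]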
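The manual check described here, finding which elements of the network's output tuple contain NaN, can be automated. The following is a minimal sketch and not code from the repository: `find_nonfinite_outputs` is a hypothetical helper, and the sample tuple is made-up stand-in data. It assumes each output can be converted with `np.asarray`; a Jittor `Var` would first need `.numpy()`.

```python
import numpy as np


def find_nonfinite_outputs(outputs):
    """Return the indices of the elements of `outputs` that contain NaN or Inf.

    Hypothetical helper, not part of hlagcn-jittor.
    """
    bad = []
    for index, out in enumerate(outputs):
        arr = np.asarray(out, dtype=np.float64)
        if not np.isfinite(arr).all():
            bad.append(index)
    return bad


# Stand-in for the network's output tuple: two of the three elements contain NaN.
outputs = (
    np.array([0.1, 0.9]),
    np.array([np.nan, 0.5]),
    np.array([[np.nan, np.nan]]),
)
print(find_nonfinite_outputs(outputs))
```

Applying the same check to intermediate values inside the forward pass, one at a time, would narrow down the first operation that produces a non-finite value.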

I then located the network's computation code: [screenshot]

What puzzles me is why o2 evaluates to NaN.

Ruoxxi commented 1 year ago

Hello, I ran into the same problem. Have you managed to solve it?