I want to try training with 100 seen fonts and 5 unseen fonts, with 2500 characters per font. The only thing I changed in 02a_run_ddp.sh is --output_k:

mkdir output
mkdir output/models

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
    --nproc_per_node=1 --use_env main.py \
    --img_size 80 \
    --data_path data/imgs/Seen240_S80F50_TRAIN800 \
    --lr 1e-4 \
    --output_k 100 \
    --batch_size 16 \
    --iters 1000 \
    --epoch 200 \
    --val_num 10 \
    --baseline_idx 0 \
    --save_path output/models \
    --model_name B0_K240BS32I1000E200_LR1e-4-wdl0.01 \
    --ddp \
    --wdl --w_wdl 0.01 \
    --no_val

--load_model CF-Font/output/models/logs/B0_K240BS32I1000E200_LR1e-4-wdl0.01_20230426-233306

But running sh scripts/02a_run_ddp.sh fails with this error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)

I don't know what to change or how to fix this. (The GPU has 24 GB of memory and PyTorch is 1.8.0.)
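One thing that might be worth checking: the command above sets --output_k 100, but --data_path still points at the Seen240 directory. The sketch below counts the font folders under a data path and compares the count with --output_k. It assumes the dataset is laid out as one subdirectory per font, which I have not confirmed for CF-Font. The helper names count_class_dirs and check_output_k and the id_0-style folder names are hypothetical. Whether a mismatch here is actually related to the cuBLAS error is not established.

```python
import os
import tempfile


def count_class_dirs(data_path):
    """Count immediate subdirectories (assumption: one folder per font)."""
    return sum(
        1
        for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name))
    )


def check_output_k(data_path, output_k):
    """Return (ok, n_dirs); ok is True when the folder count equals output_k."""
    n_dirs = count_class_dirs(data_path)
    return n_dirs == output_k, n_dirs


if __name__ == "__main__":
    # Demo on a throwaway directory with three hypothetical font folders.
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            os.makedirs(os.path.join(root, f"id_{i}"))
        print(check_output_k(root, 3))    # folder count matches output_k
        print(check_output_k(root, 100))  # folder count does not match
```

To run it against the real data, call check_output_k("data/imgs/Seen240_S80F50_TRAIN800", 100) from the repository root.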

wangchi95 / CF-Font

Error when running sh scripts/02a_run_ddp.sh #35
