LT1st opened this issue 9 months ago
I have applied your network to the synthRad dataset, expanding the grayscale one-channel CT and MRI images to three channels by simply concatenating the single channel. Everything else I left unchanged and trained on my own dataset. But with the original training settings there were constant warnings of "Cholesky Decomposition fails. Gradient infinity. Skip current batch." Could you give me any advice? Thanks a lot for your excellent work :)

This may be caused by the gradient problem of the Cholesky decomposition. The code has been updated.
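For context on that gradient problem: the backward pass of a Cholesky factorization involves the inverse of the triangular factor, so a nearly singular feature covariance drives the gradient to infinity, which is exactly what the skipped-batch warning guards against. A minimal sketch of the usual safeguard (the helper name and eps value are illustrative, not the repo's exact code):

```python
import torch

def regularized_cholesky(cov, eps=1e-5, use_double=True):
    # Illustrative safeguard, not the repo's exact implementation.
    # The gradient of cholesky(A) involves L^{-1}, so a near-singular
    # covariance yields inf/NaN gradients; a small ridge on the diagonal
    # plus a float64 factorization makes the backward pass far more stable.
    orig_dtype = cov.dtype
    if use_double:
        cov = cov.double()
    eye = torch.eye(cov.size(-1), dtype=cov.dtype, device=cov.device)
    L = torch.linalg.cholesky(cov + eps * eye)
    return L.to(orig_dtype)
```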
Thank you very much for your advice.
Have a nice day!
Sorry, the parameter 'use_double' in cWCT.py should be set to True. The code has been updated.
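If it helps to see why double precision matters here: a covariance built from float32 features can be too ill-conditioned for a float32 factorization, while the same matrix decomposes cleanly in float64. A hedged sketch of what use_double=True presumably amounts to (the function name is hypothetical):

```python
import torch

def cwct_cholesky(cov, use_double=True):
    # Hypothetical sketch of the use_double path: factor the covariance
    # in float64 for numerical headroom, then cast back to float32.
    if use_double:
        return torch.linalg.cholesky(cov.double()).float()
    return torch.linalg.cholesky(cov)
```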
I am using the updated code, but the error occurs just as before:
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019320/00170000 content_loss:0.0000 lap_loss:159.2791 rec_loss:1.5052 style_loss:2.7486 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019330/00170000 content_loss:0.0000 lap_loss:157.6614 rec_loss:1.9370 style_loss:3.4438 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019340/00170000 content_loss:0.0000 lap_loss:275.8496 rec_loss:2.7183 style_loss:4.6778 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019350/00170000 content_loss:0.0000 lap_loss:354.6751 rec_loss:3.2100 style_loss:6.2185 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019360/00170000 content_loss:0.0000 lap_loss:559.2527 rec_loss:3.3058 style_loss:16.2796 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019370/00170000 content_loss:0.0000 lap_loss:662.8289 rec_loss:2.8588 style_loss:15.7534 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019380/00170000 content_loss:0.0000 lap_loss:1228.8931 rec_loss:6.4255 style_loss:24.9516 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019390/00170000 content_loss:0.0000 lap_loss:1417.4567 rec_loss:5.4163 style_loss:35.2438 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019400/00170000 content_loss:0.0000 lap_loss:1618.0776 rec_loss:8.5358 style_loss:32.9446 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019410/00170000 content_loss:0.0000 lap_loss:2457.8999 rec_loss:12.6839 style_loss:135.8649 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019420/00170000 content_loss:0.0000 lap_loss:2929.5208 rec_loss:11.8296 style_loss:45.2074 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019430/00170000 content_loss:0.0000 lap_loss:5270.0308 rec_loss:17.2902 style_loss:98.4670 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019440/00170000 content_loss:0.0000 lap_loss:12780.7393 rec_loss:53.6309 style_loss:517.0043 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019450/00170000 content_loss:0.0000 lap_loss:33461.5977 rec_loss:85.3787 style_loss:1157.2053 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019460/00170000 content_loss:0.0000 lap_loss:47801.2383 rec_loss:110.6021 style_loss:927.8384 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019470/00170000 content_loss:0.0000 lap_loss:105222.0625 rec_loss:200.1128 style_loss:3148.9697 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00019480/00170000 content_loss:0.0000 lap_loss:245005.7812 rec_loss:699.6542 style_loss:7568.2607 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019490/00170000 content_loss:0.0000 lap_loss:301240.6875 rec_loss:538.1981 style_loss:10290.3311 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019500/00170000 content_loss:0.0000 lap_loss:452129.2812 rec_loss:402.8548 style_loss:13304.7422 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019510/00170000 content_loss:0.0000 lap_loss:781383.0625 rec_loss:497.7421 style_loss:20818.8828 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019520/00170000 content_loss:0.0000 lap_loss:1462175.7500 rec_loss:561.4804 style_loss:24778.1133 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019530/00170000 content_loss:0.0000 lap_loss:2074892.6250 rec_loss:915.9771 style_loss:46083.4766 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019540/00170000 content_loss:0.0000 lap_loss:2088487.0000 rec_loss:2681.4905 style_loss:54167.8203 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019550/00170000 content_loss:0.0000 lap_loss:3387649.0000 rec_loss:1232.0409 style_loss:76106.6094 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019560/00170000 content_loss:0.0000 lap_loss:10170006.0000 rec_loss:3335.5005 style_loss:321179.4375 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019570/00170000 content_loss:0.0000 lap_loss:11045876.0000 rec_loss:1911.4858 style_loss:231068.3438 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019580/00170000 content_loss:0.0000 lap_loss:13109793.0000 rec_loss:4361.4756 style_loss:309041.6875 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019590/00170000 content_loss:0.0000 lap_loss:23149722.0000 rec_loss:6688.6255 style_loss:791635.1250 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019600/00170000 content_loss:0.0000 lap_loss:13212357.0000 rec_loss:4038.3118 style_loss:313771.0938 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019610/00170000 content_loss:0.0000 lap_loss:63433156.0000 rec_loss:5623.5425 style_loss:2108203.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019620/00170000 content_loss:0.0000 lap_loss:54538236.0000 rec_loss:7540.0439 style_loss:1154739.5000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019630/00170000 content_loss:0.0000 lap_loss:133052504.0000 rec_loss:7938.0957 style_loss:2468518.7500 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019640/00170000 content_loss:0.0000 lap_loss:219064544.0000 rec_loss:14273.8584 style_loss:4771924.5000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019650/00170000 content_loss:0.0000 lap_loss:1157315584.0000 rec_loss:49690.8906 style_loss:33978448.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019660/00170000 content_loss:0.0000 lap_loss:652230848.0000 rec_loss:33877.5156 style_loss:22673144.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019670/00170000 content_loss:0.0000 lap_loss:260332928.0000 rec_loss:14764.3955 style_loss:6373633.5000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019700/00170000 content_loss:0.0000 lap_loss:175779216.0000 rec_loss:12537.4365 style_loss:4020703.2500 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019710/00170000 content_loss:0.0000 lap_loss:244010336.0000 rec_loss:14608.0186 style_loss:4740527.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019720/00170000 content_loss:0.0000 lap_loss:507039232.0000 rec_loss:38317.7891 style_loss:9450994.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019730/00170000 content_loss:0.0000 lap_loss:186500992.0000 rec_loss:9350.5967 style_loss:3021213.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019740/00170000 content_loss:0.0000 lap_loss:175715888.0000 rec_loss:7774.5225 style_loss:3432000.2500 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019750/00170000 content_loss:0.0000 lap_loss:525769280.0000 rec_loss:17648.4688 style_loss:9259133.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019760/00170000 content_loss:0.0000 lap_loss:1385499648.0000 rec_loss:17735.6582 style_loss:21579124.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019770/00170000 content_loss:0.0000 lap_loss:2157881856.0000 rec_loss:31950.2969 style_loss:37357400.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019780/00170000 content_loss:0.0000 lap_loss:3972313088.0000 rec_loss:104818.9844 style_loss:87564992.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019790/00170000 content_loss:0.0000 lap_loss:8301088256.0000 rec_loss:57633.3555 style_loss:158462496.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019800/00170000 content_loss:0.0000 lap_loss:27188193280.0000 rec_loss:136121.8281 style_loss:430402016.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019810/00170000 content_loss:0.0000 lap_loss:109260144640.0000 rec_loss:434222.3750 style_loss:1828851456.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019820/00170000 content_loss:0.0000 lap_loss:314164772864.0000 rec_loss:1450301.0000 style_loss:7710819328.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019830/00170000 content_loss:0.0000 lap_loss:601583976448.0000 rec_loss:649785.0625 style_loss:10285472768.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019840/00170000 content_loss:0.0000 lap_loss:445602758656.0000 rec_loss:323235.3750 style_loss:7091161088.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019850/00170000 content_loss:0.0000 lap_loss:663548264448.0000 rec_loss:465235.5625 style_loss:11168151552.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019860/00170000 content_loss:0.0000 lap_loss:2332661645312.0000 rec_loss:2389557.0000 style_loss:48662765568.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019870/00170000 content_loss:0.0000 lap_loss:1990058049536.0000 rec_loss:1574681.0000 style_loss:36620349440.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019880/00170000 content_loss:0.0000 lap_loss:2676915961856.0000 rec_loss:494977.8438 style_loss:38082404352.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019890/00170000 content_loss:0.0000 lap_loss:3788393152512.0000 rec_loss:1284546.6250 style_loss:81677107200.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019900/00170000 content_loss:0.0000 lap_loss:3168595869696.0000 rec_loss:550553.0000 style_loss:47469285376.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019910/00170000 content_loss:0.0000 lap_loss:3099389591552.0000 rec_loss:2370434.0000 style_loss:72614346752.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019920/00170000 content_loss:0.0000 lap_loss:3503058845696.0000 rec_loss:2100743.2500 style_loss:65855000576.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019930/00170000 content_loss:0.0000 lap_loss:2580092813312.0000 rec_loss:1831351.8750 style_loss:43814285312.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019940/00170000 content_loss:0.0000 lap_loss:1809156145152.0000 rec_loss:566731.1250 style_loss:26138923008.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019950/00170000 content_loss:0.0000 lap_loss:3197836722176.0000 rec_loss:2696372.7500 style_loss:57295392768.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019960/00170000 content_loss:0.0000 lap_loss:11973642158080.0000 rec_loss:1680209.7500 style_loss:193298317312.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019970/00170000 content_loss:0.0000 lap_loss:28124095971328.0000 rec_loss:1428778.6250 style_loss:484218535936.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019980/00170000 content_loss:0.0000 lap_loss:27247220097024.0000 rec_loss:2255443.0000 style_loss:386921431040.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00019990/00170000 content_loss:0.0000 lap_loss:29163742298112.0000 rec_loss:2618583.2500 style_loss:418054569984.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020000/00170000 content_loss:0.0000 lap_loss:31372972392448.0000 rec_loss:2565462.7500 style_loss:496382509056.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020010/00170000 content_loss:0.0000 lap_loss:39647988154368.0000 rec_loss:5251524.0000 style_loss:583873921024.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020020/00170000 content_loss:0.0000 lap_loss:46498096087040.0000 rec_loss:2221302.7500 style_loss:604987326464.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020030/00170000 content_loss:0.0000 lap_loss:93096444428288.0000 rec_loss:5183118.0000 style_loss:1156396875776.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020040/00170000 content_loss:0.0000 lap_loss:573172378238976.0000 rec_loss:29055988.0000 style_loss:10539991302144.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020050/00170000 content_loss:0.0000 lap_loss:543102305566720.0000 rec_loss:8868641.0000 style_loss:7797688238080.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020060/00170000 content_loss:0.0000 lap_loss:477242605961216.0000 rec_loss:7353300.0000 style_loss:6330485047296.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020070/00170000 content_loss:0.0000 lap_loss:719586605400064.0000 rec_loss:9596196.0000 style_loss:10513586061312.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020080/00170000 content_loss:0.0000 lap_loss:2917163799150592.0000 rec_loss:16187840.0000 style_loss:39120764141568.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Cholesky Decomposition fails. Gradient infinity. Skip current batch.
Iteration: 00020090/00170000 content_loss:0.0000 lap_loss:6823342065582080.0000 rec_loss:26745362.0000 style_loss:92027928707072.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020100/00170000 content_loss:0.0000 lap_loss:14940274593628160.0000 rec_loss:117756656.0000 style_loss:230195063685120.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020110/00170000 content_loss:0.0000 lap_loss:45787877543510016.0000 rec_loss:377901056.0000 style_loss:712197583929344.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020120/00170000 content_loss:0.0000 lap_loss:47623589515493376.0000 rec_loss:91975888.0000 style_loss:718440654438400.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020130/00170000 content_loss:0.0000 lap_loss:65257887714246656.0000 rec_loss:67180312.0000 style_loss:992186569064448.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020140/00170000 content_loss:0.0000 lap_loss:41286017377894400.0000 rec_loss:327302144.0000 style_loss:644938261856256.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020150/00170000 content_loss:0.0000 lap_loss:37427715111911424.0000 rec_loss:110797008.0000 style_loss:580690785599488.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020160/00170000 content_loss:0.0000 lap_loss:123380915626835968.0000 rec_loss:105322232.0000 style_loss:1489347824058368.0000 loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020170/00170000 content_loss:0.0000 lap_loss:nan rec_loss:nan style_loss:nan loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020180/00170000 content_loss:0.0000 lap_loss:nan rec_loss:nan style_loss:nan loss_tmp:0.0000 loss_tmp_GT:0.0000
Iteration: 00020190/00170000 content_loss:0.0000 lap_loss:nan rec_loss:nan style_loss:nan loss_tmp:0.0000 loss_tmp_GT:0.0000
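For reference, the log above shows how one bad update snowballs: lap_loss compounds from about 160 at iteration 19320 to NaN by iteration 20170. A generic training-loop guard, my own sketch rather than code from this repo, would refuse to step on non-finite losses and clip gradients so a single bad batch cannot poison the weights:

```python
import torch

def guarded_step(loss, model, optimizer, max_norm=5.0):
    # Generic guard (not from this repo): skip non-finite losses and
    # clip gradients so one bad batch cannot derail the whole run.
    optimizer.zero_grad(set_to_none=True)
    if not torch.isfinite(loss):
        return False  # skip this batch entirely
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return True
```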
I train the model on the COCO dataset and training is stable. If the code is the same, please check your dataset.
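Acting on that suggestion is cheap: a one-pass scan for non-finite or degenerate samples often finds the culprit, and CT intensities in particular can be far outside the range a COCO-tuned pipeline expects. A hypothetical checker, assuming samples load as tensors:

```python
import torch

def scan_dataset(dataset):
    # Hypothetical sanity check: flag samples likely to produce a
    # singular or ill-conditioned feature covariance downstream.
    for i in range(len(dataset)):
        sample = dataset[i]
        img = sample[0] if isinstance(sample, (tuple, list)) else sample
        img = img.float()
        if not torch.isfinite(img).all():
            print(f"sample {i}: contains NaN/inf")
        elif img.std() < 1e-6:
            print(f"sample {i}: (near-)constant image, singular covariance")
        elif img.abs().max() > 10:
            print(f"sample {i}: unusual intensity range "
                  f"[{img.min().item():.1f}, {img.max().item():.1f}]")
```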
Iteration: 00169900/00170000 content_loss:0.0000 lap_loss:0.0377 rec_loss:0.0686 style_loss:0.8494 loss_tmp:0.4747 loss_tmp_GT:0.0703
Iteration: 00169910/00170000 content_loss:0.0000 lap_loss:0.0928 rec_loss:0.1001 style_loss:2.1163 loss_tmp:0.3042 loss_tmp_GT:0.0815
Iteration: 00169920/00170000 content_loss:0.0000 lap_loss:0.1460 rec_loss:0.0882 style_loss:1.8143 loss_tmp:0.4197 loss_tmp_GT:0.0778
Iteration: 00169930/00170000 content_loss:0.0000 lap_loss:0.0464 rec_loss:0.0757 style_loss:1.1384 loss_tmp:0.3250 loss_tmp_GT:0.0662
Iteration: 00169940/00170000 content_loss:0.0000 lap_loss:0.1805 rec_loss:0.1085 style_loss:2.7221 loss_tmp:0.4230 loss_tmp_GT:0.0891
Iteration: 00169950/00170000 content_loss:0.0000 lap_loss:0.0585 rec_loss:0.1119 style_loss:1.3657 loss_tmp:0.2979 loss_tmp_GT:0.0695
Iteration: 00169960/00170000 content_loss:0.0000 lap_loss:0.1552 rec_loss:0.0925 style_loss:1.8784 loss_tmp:0.3627 loss_tmp_GT:0.0542
Iteration: 00169970/00170000 content_loss:0.0000 lap_loss:0.0782 rec_loss:0.1287 style_loss:1.5004 loss_tmp:0.5541 loss_tmp_GT:0.0942
Iteration: 00169980/00170000 content_loss:0.0000 lap_loss:0.0465 rec_loss:0.0897 style_loss:1.0761 loss_tmp:0.3871 loss_tmp_GT:0.0488
Iteration: 00169990/00170000 content_loss:0.0000 lap_loss:0.0396 rec_loss:0.1487 style_loss:1.0013 loss_tmp:0.2502 loss_tmp_GT:0.0757
Iteration: 00170000/00170000 content_loss:0.0000 lap_loss:0.2025 rec_loss:0.0613 style_loss:1.9088 loss_tmp:0.3521 loss_tmp_GT:0.0695
These are the outputs of the console. The nvidia-smi command gives the info below.

Should I give up training?