BhargavDodla opened this issue 1 year ago.
Hi @BhargavDodla, thanks for raising the question.
Thanks.
I have a similar problem: GPU memory keeps increasing during evaluation until it runs out of memory.
Evaluation [ 0/5000] eta: 1:57:37 time: 1.4116 data: 0.3283 max mem: 9518
Evaluation [ 10/5000] eta: 0:14:17 time: 0.1719 data: 0.0302 max mem: 9555
Evaluation [ 20/5000] eta: 0:09:24 time: 0.0484 data: 0.0003 max mem: 9594
Evaluation [ 30/5000] eta: 0:07:41 time: 0.0492 data: 0.0004 max mem: 9633
Evaluation [ 40/5000] eta: 0:06:50 time: 0.0507 data: 0.0004 max mem: 9671
Evaluation [ 50/5000] eta: 0:06:15 time: 0.0499 data: 0.0004 max mem: 9709
Evaluation [ 60/5000] eta: 0:05:53 time: 0.0488 data: 0.0004 max mem: 9747
Evaluation [ 70/5000] eta: 0:05:38 time: 0.0498 data: 0.0004 max mem: 9785
Evaluation [ 80/5000] eta: 0:05:25 time: 0.0492 data: 0.0004 max mem: 9824
Evaluation [ 90/5000] eta: 0:05:14 time: 0.0482 data: 0.0003 max mem: 9862
Evaluation [ 100/5000] eta: 0:05:06 time: 0.0482 data: 0.0003 max mem: 9900
Evaluation [ 110/5000] eta: 0:04:59 time: 0.0483 data: 0.0003 max mem: 9938
Evaluation [ 120/5000] eta: 0:04:53 time: 0.0482 data: 0.0003 max mem: 9976
Evaluation [ 130/5000] eta: 0:04:48 time: 0.0484 data: 0.0003 max mem: 10015
Evaluation [ 140/5000] eta: 0:04:44 time: 0.0484 data: 0.0003 max mem: 10053
Evaluation [ 150/5000] eta: 0:04:40 time: 0.0484 data: 0.0003 max mem: 10091
Evaluation [ 160/5000] eta: 0:04:37 time: 0.0489 data: 0.0003 max mem: 10129
Evaluation [ 170/5000] eta: 0:04:34 time: 0.0492 data: 0.0004 max mem: 10166
Evaluation [ 180/5000] eta: 0:04:32 time: 0.0494 data: 0.0004 max mem: 10204
Evaluation [ 190/5000] eta: 0:04:29 time: 0.0493 data: 0.0004 max mem: 10242
Evaluation [ 200/5000] eta: 0:04:27 time: 0.0485 data: 0.0003 max mem: 10280
Evaluation [ 210/5000] eta: 0:04:24 time: 0.0484 data: 0.0003 max mem: 10315
Evaluation [ 220/5000] eta: 0:04:22 time: 0.0484 data: 0.0003 max mem: 10352
Evaluation [ 230/5000] eta: 0:04:21 time: 0.0497 data: 0.0004 max mem: 10388
Evaluation [ 240/5000] eta: 0:04:21 time: 0.0545 data: 0.0005 max mem: 10425
Evaluation [ 250/5000] eta: 0:04:20 time: 0.0553 data: 0.0006 max mem: 10463
Evaluation [ 260/5000] eta: 0:04:19 time: 0.0513 data: 0.0004 max mem: 10501
Evaluation [ 270/5000] eta: 0:04:17 time: 0.0494 data: 0.0004 max mem: 10537
Evaluation [ 280/5000] eta: 0:04:16 time: 0.0486 data: 0.0003 max mem: 10576
Evaluation [ 290/5000] eta: 0:04:14 time: 0.0486 data: 0.0004 max mem: 10613
Evaluation [ 300/5000] eta: 0:04:13 time: 0.0493 data: 0.0004 max mem: 10650
Evaluation [ 310/5000] eta: 0:04:12 time: 0.0499 data: 0.0004 max mem: 10687
Evaluation [ 320/5000] eta: 0:04:12 time: 0.0539 data: 0.0004 max mem: 10724
Evaluation [ 330/5000] eta: 0:04:10 time: 0.0517 data: 0.0005 max mem: 10761
Evaluation [ 340/5000] eta: 0:04:09 time: 0.0475 data: 0.0004 max mem: 10798
Evaluation [ 350/5000] eta: 0:04:08 time: 0.0515 data: 0.0004 max mem: 10835
Evaluation [ 360/5000] eta: 0:04:07 time: 0.0510 data: 0.0004 max mem: 10873
Evaluation [ 370/5000] eta: 0:04:06 time: 0.0487 data: 0.0003 max mem: 10910
Evaluation [ 380/5000] eta: 0:04:05 time: 0.0487 data: 0.0003 max mem: 10947
Evaluation [ 390/5000] eta: 0:04:05 time: 0.0518 data: 0.0004 max mem: 10985
Evaluation [ 400/5000] eta: 0:04:04 time: 0.0518 data: 0.0004 max mem: 11022
Evaluation [ 410/5000] eta: 0:04:03 time: 0.0487 data: 0.0003 max mem: 11060
Evaluation [ 420/5000] eta: 0:04:02 time: 0.0488 data: 0.0003 max mem: 11097
Evaluation [ 430/5000] eta: 0:04:01 time: 0.0488 data: 0.0003 max mem: 11134
Evaluation [ 440/5000] eta: 0:04:00 time: 0.0487 data: 0.0003 max mem: 11172
Evaluation [ 450/5000] eta: 0:03:59 time: 0.0507 data: 0.0004 max mem: 11209
Evaluation [ 460/5000] eta: 0:03:58 time: 0.0506 data: 0.0004 max mem: 11247
Evaluation [ 470/5000] eta: 0:03:58 time: 0.0495 data: 0.0004 max mem: 11284
Evaluation [ 480/5000] eta: 0:03:57 time: 0.0495 data: 0.0004 max mem: 11322
Evaluation [ 490/5000] eta: 0:03:56 time: 0.0486 data: 0.0003 max mem: 11359
Evaluation [ 500/5000] eta: 0:03:55 time: 0.0485 data: 0.0003 max mem: 11397
Evaluation [ 510/5000] eta: 0:03:54 time: 0.0486 data: 0.0003 max mem: 11434
Evaluation [ 520/5000] eta: 0:03:53 time: 0.0486 data: 0.0003 max mem: 11471
Evaluation [ 530/5000] eta: 0:03:52 time: 0.0486 data: 0.0003 max mem: 11509
Evaluation [ 540/5000] eta: 0:03:52 time: 0.0487 data: 0.0003 max mem: 11546
Evaluation [ 550/5000] eta: 0:03:51 time: 0.0485 data: 0.0003 max mem: 11584
Evaluation [ 560/5000] eta: 0:03:50 time: 0.0486 data: 0.0003 max mem: 11621
Evaluation [ 570/5000] eta: 0:03:50 time: 0.0526 data: 0.0004 max mem: 11659
Evaluation [ 580/5000] eta: 0:03:49 time: 0.0525 data: 0.0004 max mem: 11696
Evaluation [ 590/5000] eta: 0:03:48 time: 0.0488 data: 0.0003 max mem: 11734
Evaluation [ 600/5000] eta: 0:03:48 time: 0.0489 data: 0.0003 max mem: 11771
Evaluation [ 610/5000] eta: 0:03:47 time: 0.0487 data: 0.0003 max mem: 11809
Evaluation [ 620/5000] eta: 0:03:46 time: 0.0488 data: 0.0003 max mem: 11846
Evaluation [ 630/5000] eta: 0:03:45 time: 0.0487 data: 0.0003 max mem: 11883
Evaluation [ 640/5000] eta: 0:03:45 time: 0.0487 data: 0.0003 max mem: 11921
Evaluation [ 650/5000] eta: 0:03:44 time: 0.0489 data: 0.0004 max mem: 11958
Evaluation [ 660/5000] eta: 0:03:43 time: 0.0488 data: 0.0003 max mem: 11996
Evaluation [ 670/5000] eta: 0:03:43 time: 0.0487 data: 0.0003 max mem: 12033
Evaluation [ 680/5000] eta: 0:03:42 time: 0.0488 data: 0.0003 max mem: 12071
Evaluation [ 690/5000] eta: 0:03:41 time: 0.0488 data: 0.0003 max mem: 12108
Evaluation [ 700/5000] eta: 0:03:41 time: 0.0488 data: 0.0003 max mem: 12146
Evaluation [ 710/5000] eta: 0:03:40 time: 0.0490 data: 0.0003 max mem: 12183
Evaluation [ 720/5000] eta: 0:03:39 time: 0.0490 data: 0.0003 max mem: 12220
Evaluation [ 730/5000] eta: 0:03:39 time: 0.0488 data: 0.0003 max mem: 12258
Evaluation [ 740/5000] eta: 0:03:38 time: 0.0489 data: 0.0003 max mem: 12295
Evaluation [ 750/5000] eta: 0:03:37 time: 0.0491 data: 0.0003 max mem: 12333
Evaluation [ 760/5000] eta: 0:03:37 time: 0.0488 data: 0.0003 max mem: 12370
Evaluation [ 770/5000] eta: 0:03:36 time: 0.0488 data: 0.0003 max mem: 12408
Evaluation [ 780/5000] eta: 0:03:35 time: 0.0490 data: 0.0004 max mem: 12445
Evaluation [ 790/5000] eta: 0:03:35 time: 0.0489 data: 0.0003 max mem: 12483
Evaluation [ 800/5000] eta: 0:03:34 time: 0.0487 data: 0.0003 max mem: 12520
Evaluation [ 810/5000] eta: 0:03:33 time: 0.0488 data: 0.0003 max mem: 12558
Evaluation [ 820/5000] eta: 0:03:33 time: 0.0488 data: 0.0003 max mem: 12595
Evaluation [ 830/5000] eta: 0:03:32 time: 0.0488 data: 0.0003 max mem: 12632
Evaluation [ 840/5000] eta: 0:03:32 time: 0.0489 data: 0.0003 max mem: 12670
Evaluation [ 850/5000] eta: 0:03:31 time: 0.0490 data: 0.0004 max mem: 12707
Evaluation [ 860/5000] eta: 0:03:30 time: 0.0490 data: 0.0004 max mem: 12745
Evaluation [ 870/5000] eta: 0:03:30 time: 0.0489 data: 0.0004 max mem: 12782
Evaluation [ 880/5000] eta: 0:03:29 time: 0.0489 data: 0.0004 max mem: 12820
Evaluation [ 890/5000] eta: 0:03:29 time: 0.0488 data: 0.0003 max mem: 12857
Evaluation [ 900/5000] eta: 0:03:28 time: 0.0508 data: 0.0004 max mem: 12895
Evaluation [ 910/5000] eta: 0:03:28 time: 0.0508 data: 0.0004 max mem: 12932
Evaluation [ 920/5000] eta: 0:03:27 time: 0.0488 data: 0.0003 max mem: 12969
Evaluation [ 930/5000] eta: 0:03:26 time: 0.0489 data: 0.0003 max mem: 13007
Evaluation [ 940/5000] eta: 0:03:26 time: 0.0527 data: 0.0004 max mem: 13044
Evaluation [ 950/5000] eta: 0:03:26 time: 0.0527 data: 0.0004 max mem: 13082
Evaluation [ 960/5000] eta: 0:03:25 time: 0.0490 data: 0.0003 max mem: 13119
Evaluation [ 970/5000] eta: 0:03:24 time: 0.0489 data: 0.0003 max mem: 13157
Evaluation [ 980/5000] eta: 0:03:24 time: 0.0488 data: 0.0003 max mem: 13194
Evaluation [ 990/5000] eta: 0:03:23 time: 0.0489 data: 0.0003 max mem: 13232
Evaluation [1000/5000] eta: 0:03:23 time: 0.0489 data: 0.0003 max mem: 13269
Evaluation [1010/5000] eta: 0:03:22 time: 0.0490 data: 0.0003 max mem: 13307
Evaluation [1020/5000] eta: 0:03:21 time: 0.0490 data: 0.0003 max mem: 13344
Evaluation [1030/5000] eta: 0:03:21 time: 0.0492 data: 0.0003 max mem: 13381
Evaluation [1040/5000] eta: 0:03:20 time: 0.0492 data: 0.0003 max mem: 13419
Evaluation [1050/5000] eta: 0:03:20 time: 0.0489 data: 0.0003 max mem: 13456
Evaluation [1060/5000] eta: 0:03:19 time: 0.0488 data: 0.0003 max mem: 13494
Evaluation [1070/5000] eta: 0:03:19 time: 0.0489 data: 0.0003 max mem: 13531
Evaluation [1080/5000] eta: 0:03:18 time: 0.0509 data: 0.0004 max mem: 13569
Evaluation [1090/5000] eta: 0:03:18 time: 0.0508 data: 0.0004 max mem: 13606
Evaluation [1100/5000] eta: 0:03:17 time: 0.0489 data: 0.0003 max mem: 13644
Evaluation [1110/5000] eta: 0:03:16 time: 0.0489 data: 0.0003 max mem: 13681
Evaluation [1120/5000] eta: 0:03:16 time: 0.0489 data: 0.0003 max mem: 13718
Evaluation [1130/5000] eta: 0:03:15 time: 0.0489 data: 0.0003 max mem: 13756
Evaluation [1140/5000] eta: 0:03:15 time: 0.0489 data: 0.0003 max mem: 13793
Evaluation [1150/5000] eta: 0:03:14 time: 0.0493 data: 0.0003 max mem: 13831
Evaluation [1160/5000] eta: 0:03:14 time: 0.0488 data: 0.0004 max mem: 13868
Evaluation [1170/5000] eta: 0:03:13 time: 0.0515 data: 0.0005 max mem: 13906
Evaluation [1180/5000] eta: 0:03:13 time: 0.0511 data: 0.0005 max mem: 13943
Evaluation [1190/5000] eta: 0:03:12 time: 0.0483 data: 0.0003 max mem: 13981
Evaluation [1200/5000] eta: 0:03:12 time: 0.0494 data: 0.0003 max mem: 14018
Evaluation [1210/5000] eta: 0:03:11 time: 0.0538 data: 0.0004 max mem: 14056
Evaluation [1220/5000] eta: 0:03:11 time: 0.0538 data: 0.0004 max mem: 14093
Evaluation [1230/5000] eta: 0:03:10 time: 0.0495 data: 0.0004 max mem: 14130
Evaluation [1240/5000] eta: 0:03:10 time: 0.0497 data: 0.0004 max mem: 14168
Evaluation [1250/5000] eta: 0:03:09 time: 0.0497 data: 0.0004 max mem: 14205
Evaluation [1260/5000] eta: 0:03:09 time: 0.0496 data: 0.0004 max mem: 14243
Evaluation [1270/5000] eta: 0:03:08 time: 0.0499 data: 0.0004 max mem: 14280
Evaluation [1280/5000] eta: 0:03:08 time: 0.0505 data: 0.0004 max mem: 14318
Evaluation [1290/5000] eta: 0:03:07 time: 0.0506 data: 0.0004 max mem: 14355
Evaluation [1300/5000] eta: 0:03:07 time: 0.0538 data: 0.0004 max mem: 14393
Evaluation [1310/5000] eta: 0:03:06 time: 0.0531 data: 0.0004 max mem: 14430
Evaluation [1320/5000] eta: 0:03:06 time: 0.0493 data: 0.0004 max mem: 14467
Evaluation [1330/5000] eta: 0:03:05 time: 0.0495 data: 0.0004 max mem: 14505
Evaluation [1340/5000] eta: 0:03:05 time: 0.0495 data: 0.0003 max mem: 14542
Evaluation [1350/5000] eta: 0:03:04 time: 0.0495 data: 0.0003 max mem: 14580
Evaluation [1360/5000] eta: 0:03:04 time: 0.0495 data: 0.0004 max mem: 14617
Evaluation [1370/5000] eta: 0:03:03 time: 0.0495 data: 0.0004 max mem: 14655
Evaluation [1380/5000] eta: 0:03:02 time: 0.0495 data: 0.0003 max mem: 14692
Evaluation [1390/5000] eta: 0:03:02 time: 0.0495 data: 0.0003 max mem: 14730
Evaluation [1400/5000] eta: 0:03:01 time: 0.0494 data: 0.0003 max mem: 14767
Evaluation [1410/5000] eta: 0:03:01 time: 0.0494 data: 0.0003 max mem: 14805
Evaluation [1420/5000] eta: 0:03:00 time: 0.0494 data: 0.0003 max mem: 14842
Evaluation [1430/5000] eta: 0:03:00 time: 0.0494 data: 0.0003 max mem: 14879
Evaluation [1440/5000] eta: 0:02:59 time: 0.0494 data: 0.0003 max mem: 14917
Evaluation [1450/5000] eta: 0:02:59 time: 0.0506 data: 0.0004 max mem: 14954
Evaluation [1460/5000] eta: 0:02:58 time: 0.0506 data: 0.0004 max mem: 14992
Evaluation [1470/5000] eta: 0:02:58 time: 0.0496 data: 0.0004 max mem: 15029
Evaluation [1480/5000] eta: 0:02:57 time: 0.0495 data: 0.0004 max mem: 15067
Evaluation [1490/5000] eta: 0:02:57 time: 0.0521 data: 0.0004 max mem: 15104
Evaluation [1500/5000] eta: 0:02:56 time: 0.0521 data: 0.0004 max mem: 15142
Evaluation [1510/5000] eta: 0:02:56 time: 0.0496 data: 0.0004 max mem: 15179
Evaluation [1520/5000] eta: 0:02:55 time: 0.0495 data: 0.0004 max mem: 15216
Evaluation [1530/5000] eta: 0:02:55 time: 0.0495 data: 0.0004 max mem: 15254
Evaluation [1540/5000] eta: 0:02:54 time: 0.0495 data: 0.0004 max mem: 15291
Evaluation [1550/5000] eta: 0:02:54 time: 0.0495 data: 0.0004 max mem: 15329
Evaluation [1560/5000] eta: 0:02:53 time: 0.0495 data: 0.0003 max mem: 15366
Evaluation [1570/5000] eta: 0:02:53 time: 0.0495 data: 0.0003 max mem: 15404
Evaluation [1580/5000] eta: 0:02:52 time: 0.0495 data: 0.0004 max mem: 15441
Evaluation [1590/5000] eta: 0:02:52 time: 0.0495 data: 0.0004 max mem: 15479
Evaluation [1600/5000] eta: 0:02:51 time: 0.0495 data: 0.0003 max mem: 15516
Evaluation [1610/5000] eta: 0:02:50 time: 0.0495 data: 0.0003 max mem: 15554
Evaluation [1620/5000] eta: 0:02:50 time: 0.0494 data: 0.0003 max mem: 15591
Evaluation [1630/5000] eta: 0:02:49 time: 0.0494 data: 0.0003 max mem: 15628
Evaluation [1640/5000] eta: 0:02:49 time: 0.0494 data: 0.0003 max mem: 15666
Evaluation [1650/5000] eta: 0:02:48 time: 0.0494 data: 0.0003 max mem: 15703
Evaluation [1660/5000] eta: 0:02:48 time: 0.0497 data: 0.0004 max mem: 15741
Evaluation [1670/5000] eta: 0:02:47 time: 0.0526 data: 0.0004 max mem: 15778
Evaluation [1680/5000] eta: 0:02:47 time: 0.0546 data: 0.0005 max mem: 15816
Evaluation [1690/5000] eta: 0:02:47 time: 0.0517 data: 0.0004 max mem: 15853
Evaluation [1700/5000] eta: 0:02:46 time: 0.0496 data: 0.0003 max mem: 15891
Evaluation [1710/5000] eta: 0:02:45 time: 0.0495 data: 0.0003 max mem: 15928
Evaluation [1720/5000] eta: 0:02:45 time: 0.0513 data: 0.0004 max mem: 15965
Evaluation [1730/5000] eta: 0:02:44 time: 0.0511 data: 0.0004 max mem: 16003
Evaluation [1740/5000] eta: 0:02:44 time: 0.0496 data: 0.0004 max mem: 16040
Evaluation [1750/5000] eta: 0:02:43 time: 0.0498 data: 0.0004 max mem: 16078
Evaluation [1760/5000] eta: 0:02:43 time: 0.0499 data: 0.0004 max mem: 16115
Evaluation [1770/5000] eta: 0:02:42 time: 0.0498 data: 0.0004 max mem: 16153
Evaluation [1780/5000] eta: 0:02:42 time: 0.0498 data: 0.0003 max mem: 16190
Evaluation [1790/5000] eta: 0:02:41 time: 0.0497 data: 0.0003 max mem: 16228
Evaluation [1800/5000] eta: 0:02:41 time: 0.0497 data: 0.0003 max mem: 16265
Evaluation [1810/5000] eta: 0:02:40 time: 0.0497 data: 0.0004 max mem: 16303
Evaluation [1820/5000] eta: 0:02:40 time: 0.0496 data: 0.0004 max mem: 16340
Evaluation [1830/5000] eta: 0:02:39 time: 0.0497 data: 0.0004 max mem: 16377
Evaluation [1840/5000] eta: 0:02:39 time: 0.0496 data: 0.0003 max mem: 16415
Evaluation [1850/5000] eta: 0:02:38 time: 0.0497 data: 0.0004 max mem: 16452
Evaluation [1860/5000] eta: 0:02:38 time: 0.0497 data: 0.0004 max mem: 16490
Evaluation [1870/5000] eta: 0:02:37 time: 0.0497 data: 0.0003 max mem: 16527
Evaluation [1880/5000] eta: 0:02:37 time: 0.0497 data: 0.0003 max mem: 16565
Evaluation [1890/5000] eta: 0:02:36 time: 0.0497 data: 0.0004 max mem: 16602
Evaluation [1900/5000] eta: 0:02:36 time: 0.0497 data: 0.0004 max mem: 16640
Evaluation [1910/5000] eta: 0:02:35 time: 0.0497 data: 0.0003 max mem: 16677
Evaluation [1920/5000] eta: 0:02:35 time: 0.0496 data: 0.0003 max mem: 16714
Evaluation [1930/5000] eta: 0:02:34 time: 0.0497 data: 0.0003 max mem: 16752
Evaluation [1940/5000] eta: 0:02:34 time: 0.0497 data: 0.0003 max mem: 16789
Evaluation [1950/5000] eta: 0:02:33 time: 0.0498 data: 0.0004 max mem: 16827
Evaluation [1960/5000] eta: 0:02:33 time: 0.0497 data: 0.0004 max mem: 16864
Evaluation [1970/5000] eta: 0:02:32 time: 0.0497 data: 0.0003 max mem: 16902
Evaluation [1980/5000] eta: 0:02:32 time: 0.0497 data: 0.0003 max mem: 16939
Evaluation [1990/5000] eta: 0:02:31 time: 0.0497 data: 0.0004 max mem: 16977
Evaluation [2000/5000] eta: 0:02:31 time: 0.0497 data: 0.0003 max mem: 17014
Evaluation [2010/5000] eta: 0:02:30 time: 0.0518 data: 0.0003 max mem: 17052
Evaluation [2020/5000] eta: 0:02:30 time: 0.0517 data: 0.0003 max mem: 17089
Evaluation [2030/5000] eta: 0:02:29 time: 0.0497 data: 0.0003 max mem: 17126
Evaluation [2040/5000] eta: 0:02:29 time: 0.0498 data: 0.0003 max mem: 17164
Evaluation [2050/5000] eta: 0:02:28 time: 0.0498 data: 0.0003 max mem: 17201
Evaluation [2060/5000] eta: 0:02:28 time: 0.0498 data: 0.0003 max mem: 17239
Evaluation [2070/5000] eta: 0:02:27 time: 0.0497 data: 0.0003 max mem: 17276
Evaluation [2080/5000] eta: 0:02:27 time: 0.0497 data: 0.0003 max mem: 17314
Evaluation [2090/5000] eta: 0:02:26 time: 0.0497 data: 0.0003 max mem: 17351
Evaluation [2100/5000] eta: 0:02:25 time: 0.0497 data: 0.0004 max mem: 17389
Evaluation [2110/5000] eta: 0:02:25 time: 0.0497 data: 0.0004 max mem: 17426
Evaluation [2120/5000] eta: 0:02:25 time: 0.0540 data: 0.0004 max mem: 17463
Evaluation [2130/5000] eta: 0:02:24 time: 0.0539 data: 0.0004 max mem: 17501
Evaluation [2140/5000] eta: 0:02:24 time: 0.0497 data: 0.0004 max mem: 17538
Evaluation [2150/5000] eta: 0:02:23 time: 0.0496 data: 0.0004 max mem: 17576
Evaluation [2160/5000] eta: 0:02:23 time: 0.0509 data: 0.0004 max mem: 17613
Evaluation [2170/5000] eta: 0:02:22 time: 0.0511 data: 0.0004 max mem: 17651
Evaluation [2180/5000] eta: 0:02:22 time: 0.0498 data: 0.0004 max mem: 17688
Evaluation [2190/5000] eta: 0:02:21 time: 0.0519 data: 0.0004 max mem: 17726
Evaluation [2200/5000] eta: 0:02:21 time: 0.0526 data: 0.0004 max mem: 17763
Evaluation [2210/5000] eta: 0:02:20 time: 0.0526 data: 0.0004 max mem: 17801
Evaluation [2220/5000] eta: 0:02:20 time: 0.0517 data: 0.0004 max mem: 17838
Evaluation [2230/5000] eta: 0:02:19 time: 0.0497 data: 0.0003 max mem: 17875
Evaluation [2240/5000] eta: 0:02:19 time: 0.0514 data: 0.0004 max mem: 17913
Evaluation [2250/5000] eta: 0:02:18 time: 0.0511 data: 0.0004 max mem: 17950
Evaluation [2260/5000] eta: 0:02:18 time: 0.0498 data: 0.0003 max mem: 17988
Evaluation [2270/5000] eta: 0:02:17 time: 0.0554 data: 0.0005 max mem: 18025
Evaluation [2280/5000] eta: 0:02:17 time: 0.0551 data: 0.0005 max mem: 18063
Evaluation [2290/5000] eta: 0:02:16 time: 0.0526 data: 0.0003 max mem: 18100
Evaluation [2300/5000] eta: 0:02:16 time: 0.0512 data: 0.0003 max mem: 18138
Evaluation [2310/5000] eta: 0:02:15 time: 0.0481 data: 0.0004 max mem: 18175
Evaluation [2320/5000] eta: 0:02:15 time: 0.0498 data: 0.0004 max mem: 18212
Evaluation [2330/5000] eta: 0:02:14 time: 0.0499 data: 0.0003 max mem: 18250
Evaluation [2340/5000] eta: 0:02:14 time: 0.0498 data: 0.0004 max mem: 18287
Evaluation [2350/5000] eta: 0:02:13 time: 0.0512 data: 0.0004 max mem: 18325
Evaluation [2360/5000] eta: 0:02:13 time: 0.0511 data: 0.0004 max mem: 18362
Evaluation [2370/5000] eta: 0:02:12 time: 0.0499 data: 0.0004 max mem: 18400
Evaluation [2380/5000] eta: 0:02:12 time: 0.0499 data: 0.0004 max mem: 18437
Evaluation [2390/5000] eta: 0:02:11 time: 0.0533 data: 0.0004 max mem: 18475
Evaluation [2400/5000] eta: 0:02:11 time: 0.0517 data: 0.0004 max mem: 18512
Evaluation [2410/5000] eta: 0:02:10 time: 0.0489 data: 0.0003 max mem: 18550
Evaluation [2420/5000] eta: 0:02:10 time: 0.0505 data: 0.0003 max mem: 18587
Evaluation [2430/5000] eta: 0:02:09 time: 0.0499 data: 0.0003 max mem: 18624
Evaluation [2440/5000] eta: 0:02:09 time: 0.0499 data: 0.0004 max mem: 18662
Evaluation [2450/5000] eta: 0:02:08 time: 0.0498 data: 0.0004 max mem: 18699
Evaluation [2460/5000] eta: 0:02:08 time: 0.0499 data: 0.0004 max mem: 18737
Evaluation [2470/5000] eta: 0:02:07 time: 0.0534 data: 0.0004 max mem: 18774
Evaluation [2480/5000] eta: 0:02:07 time: 0.0517 data: 0.0004 max mem: 18812
Evaluation [2490/5000] eta: 0:02:06 time: 0.0482 data: 0.0004 max mem: 18849
Evaluation [2500/5000] eta: 0:02:06 time: 0.0499 data: 0.0004 max mem: 18887
Evaluation [2510/5000] eta: 0:02:05 time: 0.0498 data: 0.0004 max mem: 18924
Evaluation [2520/5000] eta: 0:02:05 time: 0.0510 data: 0.0004 max mem: 18961
Evaluation [2530/5000] eta: 0:02:04 time: 0.0509 data: 0.0004 max mem: 18999
Evaluation [2540/5000] eta: 0:02:04 time: 0.0499 data: 0.0004 max mem: 19036
Evaluation [2550/5000] eta: 0:02:03 time: 0.0499 data: 0.0004 max mem: 19074
Evaluation [2560/5000] eta: 0:02:03 time: 0.0498 data: 0.0004 max mem: 19111
Evaluation [2570/5000] eta: 0:02:02 time: 0.0499 data: 0.0004 max mem: 19149
Evaluation [2580/5000] eta: 0:02:02 time: 0.0512 data: 0.0004 max mem: 19186
Evaluation [2590/5000] eta: 0:02:01 time: 0.0512 data: 0.0004 max mem: 19224
Evaluation [2600/5000] eta: 0:02:01 time: 0.0498 data: 0.0004 max mem: 19261
Evaluation [2610/5000] eta: 0:02:00 time: 0.0498 data: 0.0004 max mem: 19299
Evaluation [2620/5000] eta: 0:02:00 time: 0.0499 data: 0.0004 max mem: 19336
Evaluation [2630/5000] eta: 0:01:59 time: 0.0527 data: 0.0004 max mem: 19373
Evaluation [2640/5000] eta: 0:01:59 time: 0.0504 data: 0.0004 max mem: 19411
Evaluation [2650/5000] eta: 0:01:58 time: 0.0480 data: 0.0004 max mem: 19448
Evaluation [2660/5000] eta: 0:01:58 time: 0.0501 data: 0.0003 max mem: 19486
Evaluation [2670/5000] eta: 0:01:57 time: 0.0512 data: 0.0004 max mem: 19523
Evaluation [2680/5000] eta: 0:01:57 time: 0.0511 data: 0.0004 max mem: 19561
Evaluation [2690/5000] eta: 0:01:56 time: 0.0539 data: 0.0004 max mem: 19598
Evaluation [2700/5000] eta: 0:01:56 time: 0.0539 data: 0.0005 max mem: 19636
Evaluation [2710/5000] eta: 0:01:55 time: 0.0497 data: 0.0004 max mem: 19673
Evaluation [2720/5000] eta: 0:01:55 time: 0.0498 data: 0.0003 max mem: 19710
Evaluation [2730/5000] eta: 0:01:54 time: 0.0500 data: 0.0004 max mem: 19748
Evaluation [2740/5000] eta: 0:01:54 time: 0.0499 data: 0.0004 max mem: 19785
Evaluation [2750/5000] eta: 0:01:53 time: 0.0506 data: 0.0004 max mem: 19823
Evaluation [2760/5000] eta: 0:01:53 time: 0.0505 data: 0.0003 max mem: 19860
Evaluation [2770/5000] eta: 0:01:52 time: 0.0498 data: 0.0004 max mem: 19898
Evaluation [2780/5000] eta: 0:01:52 time: 0.0499 data: 0.0004 max mem: 19935
Evaluation [2790/5000] eta: 0:01:51 time: 0.0499 data: 0.0004 max mem: 19973
Evaluation [2800/5000] eta: 0:01:51 time: 0.0500 data: 0.0004 max mem: 20010
Evaluation [2810/5000] eta: 0:01:50 time: 0.0560 data: 0.0004 max mem: 20048
Evaluation [2820/5000] eta: 0:01:50 time: 0.0577 data: 0.0005 max mem: 20085
Evaluation [2830/5000] eta: 0:01:49 time: 0.0517 data: 0.0004 max mem: 20122
Evaluation [2840/5000] eta: 0:01:49 time: 0.0499 data: 0.0004 max mem: 20160
Evaluation [2850/5000] eta: 0:01:48 time: 0.0498 data: 0.0004 max mem: 20197
Evaluation [2860/5000] eta: 0:01:48 time: 0.0498 data: 0.0004 max mem: 20235
Evaluation [2870/5000] eta: 0:01:47 time: 0.0510 data: 0.0003 max mem: 20272
Evaluation [2880/5000] eta: 0:01:47 time: 0.0523 data: 0.0004 max mem: 20310
Evaluation [2890/5000] eta: 0:01:46 time: 0.0505 data: 0.0004 max mem: 20347
Evaluation [2900/5000] eta: 0:01:46 time: 0.0492 data: 0.0004 max mem: 20384
Evaluation [2910/5000] eta: 0:01:45 time: 0.0499 data: 0.0004 max mem: 20422
Evaluation [2920/5000] eta: 0:01:45 time: 0.0499 data: 0.0004 max mem: 20459
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 23.67 GiB total capacity; 19.97 GiB already allocated; 2.75 MiB free; 22.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 465518) of binary: /home/verigle/miniconda3/envs/lavis/bin/python
Traceback (most recent call last):
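The log above is enough to estimate how fast memory grows. This sketch parses a few of its lines (copied verbatim), draws a straight line through the first and last samples, and compares the projection with the numbers in the OOM message. It assumes "max mem" is torch.cuda.max_memory_allocated() reported in MiB. That is consistent with the message, since 20459 MiB is about 19.98 GiB, close to the 19.97 GiB reported as already allocated. parse_log and project_peak_mib are hypothetical helpers written only for this example.

```python
import re

# A few samples copied from the evaluation log above.
# Assumption: "max mem" is torch.cuda.max_memory_allocated() in MiB.
LOG = """\
Evaluation [ 0/5000] eta: 1:57:37 time: 1.4116 data: 0.3283 max mem: 9518
Evaluation [1000/5000] eta: 0:03:23 time: 0.0489 data: 0.0003 max mem: 13269
Evaluation [2000/5000] eta: 0:02:31 time: 0.0497 data: 0.0003 max mem: 17014
Evaluation [2920/5000] eta: 0:01:45 time: 0.0499 data: 0.0004 max mem: 20459
"""

LINE_RE = re.compile(r"\[\s*(\d+)/(\d+)\].*max mem: (\d+)")


def parse_log(log: str) -> tuple[list[tuple[int, int]], int]:
    """Return [(step, max_mem_mib), ...] and the total number of steps."""
    points, total = [], 0
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            points.append((int(m.group(1)), int(m.group(3))))
            total = int(m.group(2))
    return points, total


def project_peak_mib(points: list[tuple[int, int]], total: int) -> tuple[float, float]:
    """Linear extrapolation through the first and last samples."""
    (s0, m0), (s1, m1) = points[0], points[-1]
    slope = (m1 - m0) / (s1 - s0)  # MiB added per evaluation step
    return slope, m0 + slope * (total - s0)


points, total = parse_log(LOG)
slope, peak = project_peak_mib(points, total)
capacity_mib = 23.67 * 1024  # "23.67 GiB total capacity" from the OOM message
gap_gib = 22.22 - 19.97      # reserved minus allocated, from the same message

print(f"growth: {slope:.2f} MiB/step")
print(f"projected peak at step {total}: {peak / 1024:.1f} GiB "
      f"(capacity {capacity_mib / 1024:.2f} GiB)")
print(f"reserved - allocated at the crash: {gap_gib:.2f} GiB")
```

With these samples the growth is about 3.75 MiB per step and very close to linear (the 1000- and 2000-step samples fall within a few MiB of the line), so finishing all 5000 steps would need roughly 27.6 GiB, more than the 23.67 GiB card. The reserved-minus-allocated gap is only about 2.25 GiB, so the max_split_size_mb setting suggested by the error message could at best make part of that gap usable; it is unlikely to absorb a steady per-step increase.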
Same situation here, and not only during evaluation but also during training. Did you solve the problem?
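If memory also climbs during training, two generic PyTorch patterns are worth ruling out: accumulating the loss tensor itself, which keeps each step's autograd history alive (the PyTorch FAQ warns about this), and storing per-batch outputs on the GPU or with gradients enabled during evaluation. This is a minimal sketch with a toy nn.Linear model and random data, not BLIP; whether either pattern actually occurs in the LAVIS runner is an assumption that has not been checked here.

```python
import torch
from torch import nn

# Toy model and random data: illustrative assumptions, not BLIP/LAVIS.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(5)]

# Training: accumulate a Python float via .item(), not the loss tensor.
# Writing `running_loss += loss` would keep every step's graph referenced.
model.train()
running_loss = 0.0
for x, y in batches:
    x, y = x.to(device), y.to(device)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()

# Evaluation: disable autograd and move each batch's output to the CPU
# before storing it, so GPU memory does not grow with the number of batches.
model.eval()
outputs = []
with torch.no_grad():
    for x, _ in batches:
        outputs.append(model(x.to(device)).cpu())
all_outputs = torch.cat(outputs)

print(f"mean train loss: {running_loss / len(batches):.4f}")
print(f"collected outputs: {tuple(all_outputs.shape)} on {all_outputs.device}")
```

Moving outputs to the CPU trades GPU memory for host memory and a copy per batch; if the full set of outputs is needed on the GPU later (for example, to build a similarity matrix), it can be moved back once, in a single transfer.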
Dear LAVIS team,
As part of a project, we are trying to fine-tune BLIP Retrieval on a custom dataset using two RTX 3090 (24 GB) GPUs.
1) During the evaluation stage of runner_base.py, we get the error below, even with validation batch sizes as small as 2.
2) When we run evaluation on a very small subset of the validation set to avoid the CUDA error, no error occurs, but the number of batches stays the same for every validation batch size.
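One plausible reading of point 2, assuming runner_base.py delegates to a BLIP-style retrieval evaluation (not confirmed here): features are extracted in batches, but the re-ranking loop that prints the "Evaluation [i/N]" progress walks the image-text similarity matrix one image (row) at a time, so N is the number of images rather than the number of batches. The toy count_steps function below is a hypothetical model of those two passes, using the 5000 from the log above.

```python
def batched(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def count_steps(num_images: int, batch_size: int) -> tuple[int, int]:
    """Count steps of a batched pass and of a per-row pass (hypothetical model)."""
    images = list(range(num_images))  # stand-ins for images

    # Pass 1: batched feature extraction; its length depends on batch_size.
    features, extraction_steps = [], 0
    for batch in batched(images, batch_size):
        features.extend(batch)
        extraction_steps += 1

    # Pass 2: one step per similarity-matrix row (one per image);
    # batch_size plays no part here.
    rerank_steps = sum(1 for _row in features)
    return extraction_steps, rerank_steps


for bs in (2, 16, 64):
    print(bs, count_steps(5000, bs))
```

If the progress counter in your log behaves like the second pass, shrinking the validation batch size changes neither its step count nor its per-step memory, which would also be consistent with point 1 (running out of memory even at batch size 2).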
Thank you