zqh0253 opened this issue 2 years ago (status: Open)
Hi, thanks for the great work!
I ran the code on 8 A100 GPUs and found that GPU memory consumption is extremely high during the first few ticks. Here is the output log:
tick 0   kimg 0.2    time 1m 31s    sec/tick 21.8  sec/kimg 113.36  maintenance 69.7   cpumem 4.70   gpumem 67.21  augment 0.000
Evaluating metrics for 3sky_timelapse_256_stylegan-v_random3_max32_3-4468dd1 ...
{"results": {"fvd2048_16f": 992.2131880075198}, "metric": "fvd2048_16f", "total_time": 80.59011363983154, "total_time_str": "1m 21s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261186.965401}
{"results": {"fvd2048_128f": 1764.0538105755193}, "metric": "fvd2048_128f", "total_time": 230.15506172180176, "total_time_str": "3m 50s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261417.1899228}
{"results": {"fvd2048_128f_subsample8f": 1241.4737946211158}, "metric": "fvd2048_128f_subsample8f", "total_time": 54.82384514808655, "total_time_str": "55s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261472.0662923}
{"results": {"fid50k_full": 381.6109859044359}, "metric": "fid50k_full", "total_time": 83.23335003852844, "total_time_str": "1m 23s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261555.3598747}
tick 1   kimg 5.4    time 11m 28s   sec/tick 23.0  sec/kimg 4.43    maintenance 573.9  cpumem 12.42  gpumem 69.14  augment 0.000
tick 2   kimg 10.6   time 11m 50s   sec/tick 21.4  sec/kimg 4.13    maintenance 0.0    cpumem 12.42  gpumem 10.26  augment 0.000
tick 3   kimg 15.7   time 12m 11s   sec/tick 21.6  sec/kimg 4.17    maintenance 0.0    cpumem 12.42  gpumem 10.26  augment 0.000
tick 4   kimg 20.9   time 12m 33s   sec/tick 21.4  sec/kimg 4.12    maintenance 0.0    cpumem 12.42  gpumem 10.26  augment 0.003
tick 5   kimg 26.1   time 12m 55s   sec/tick 21.7  sec/kimg 4.18    maintenance 0.0    cpumem 12.42  gpumem 10.29  augment 0.010
tick 6   kimg 31.3   time 13m 16s   sec/tick 21.9  sec/kimg 4.22    maintenance 0.0    cpumem 12.42  gpumem 10.29  augment 0.026
tick 7   kimg 36.5   time 13m 39s   sec/tick 22.4  sec/kimg 4.32    maintenance 0.0    cpumem 12.42  gpumem 10.33  augment 0.038
tick 8   kimg 41.7   time 14m 00s   sec/tick 21.6  sec/kimg 4.16    maintenance 0.1    cpumem 12.42  gpumem 10.33  augment 0.036
tick 9   kimg 46.8   time 14m 23s   sec/tick 22.3  sec/kimg 4.30    maintenance 0.1    cpumem 12.42  gpumem 10.32  augment 0.038
tick 10  kimg 52.0   time 14m 44s   sec/tick 21.4  sec/kimg 4.13    maintenance 0.0    cpumem 12.42  gpumem 10.33  augment 0.028
As you can see, gpumem in the first two ticks is abnormally high (67-69 GB versus ~10 GB afterwards). Do you have any idea what causes this?
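To confirm the spike really is confined to the first ticks, one way is to pull the gpumem field out of each tick line of the training log and compare values. A minimal sketch (not part of the StyleGAN-V codebase; the helper name and regex are my own):

```python
import re

# Match "tick N ... gpumem X", but not the "sec/tick" field,
# hence the negative lookbehind on "/".
TICK_RE = re.compile(r"(?<!/)tick (\d+).*?gpumem ([\d.]+)")

def gpumem_per_tick(log_text: str) -> dict:
    """Map tick number -> reported GPU memory (GB) from a training log."""
    return {int(tick): float(mem) for tick, mem in TICK_RE.findall(log_text)}

sample = (
    "tick 0 kimg 0.2 sec/tick 21.8 cpumem 4.70 gpumem 67.21 augment 0.000 "
    "tick 1 kimg 5.4 sec/tick 23.0 cpumem 12.42 gpumem 69.14 augment 0.000 "
    "tick 2 kimg 10.6 sec/tick 21.4 cpumem 12.42 gpumem 10.26 augment 0.000"
)
print(gpumem_per_tick(sample))  # {0: 67.21, 1: 69.14, 2: 10.26}
```

Running this over the full log above shows ticks 0-1 near 67-69 GB and every later tick near 10 GB, which matches the pattern described in the report.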
I am experiencing the same problem: a single training forward pass consumes 46 GB of VRAM, while inference takes less than 8 GB. Is there a known solution for this?