marcojira / fld

PyTorch code for FLD (Feature Likelihood Divergence), FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.

CUDA out of memory #8

Open mapengsen opened 3 months ago

mapengsen commented 3 months ago

With a larger dataset, the following error is raised:

Datasets files num is: 1344726
Datasets path is: /root/autodl-tmp/MDT/log_chekpoints/sampleImage/2024_05_10_9
Datasets files num is: 50000
Traceback (most recent call last):
  File "evaluations/fld/eval_image.py", line 81, in <module>
    main()
  File "evaluations/fld/eval_image.py", line 54, in main
    Precision_value = PrecisionRecall(mode="Precision").compute_metric(train_feat, None, gen_feat) # Default precision
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 57, in compute_metric
    return self.pct_in_manifold(gen_feat, train_feat).item()
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 33, in pct_in_manifold
    nn_dists = self.get_nn_dists(manifold_feat)
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 24, in get_nn_dists
    curr_dists = torch.cdist(feat[start:end], feat)
  File "/root/miniconda3/envs/MDT/lib/python3.8/site-packages/torch/functional.py", line 1315, in cdist
    return _VF.cdist(x1, x2, p, None) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.10 GiB. GPU 0 has a total capacty of 23.70 GiB of which 996.56 MiB is free. Process 148558 has 22.72 GiB memory in use. Of the allocated memory 21.07 GiB is allocated by PyTorch, and 216.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

marcojira commented 3 months ago

For precision, the distance computation is batched for the gen_feat but not for the train_feat. Does it work if you take a subset of your train_feat?
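A minimal sketch of two possible workarounds, not the repository's actual code. The 50.10 GiB request is consistent with a single 10,000 x 1,344,726 float32 distance block (10,000 x 1,344,726 x 4 bytes is about 50.1 GiB), though the batch size and dtype are assumptions here. The hypothetical `subsample` helper takes a random subset of the features, as suggested above, and the hypothetical `kth_nn_dists` helper chunks `torch.cdist` over both rows and columns so that only a `row_bs x col_bs` block is held in memory at once. The value `k=3` and the handling of the self-distance are also assumptions and may differ from what `PrecisionRecall.py` does.

```python
import torch


def subsample(feat: torch.Tensor, max_n: int, seed: int = 0) -> torch.Tensor:
    """Return a random subset of at most max_n rows (hypothetical helper)."""
    if feat.shape[0] <= max_n:
        return feat
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(feat.shape[0], generator=gen)[:max_n]
    return feat[idx.to(feat.device)]


def kth_nn_dists(feat: torch.Tensor, k: int = 3,
                 row_bs: int = 1024, col_bs: int = 8192) -> torch.Tensor:
    """Distance from each row to its k-th nearest neighbour.

    Chunked over rows and columns; each row's zero distance to itself
    is counted as the 0-th neighbour, hence the k + 1 below.
    """
    n = feat.shape[0]
    out = torch.empty(n, dtype=feat.dtype, device=feat.device)
    for r0 in range(0, n, row_bs):
        rows = feat[r0:r0 + row_bs]
        best = None  # running (k + 1) smallest distances per row
        for c0 in range(0, n, col_bs):
            d = torch.cdist(rows, feat[c0:c0 + col_bs])
            cand = d if best is None else torch.cat([best, d], dim=1)
            kk = min(k + 1, cand.shape[1])
            # topk with largest=False returns values in ascending order
            best = torch.topk(cand, kk, dim=1, largest=False).values
        out[r0:r0 + row_bs] = best[:, -1]
    return out
```

For example, `train_feat = subsample(train_feat, 50000)` before calling `compute_metric` would keep the reference set at the same size as the generated set, at the cost of a noisier manifold estimate.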

mapengsen commented 3 months ago

Now, why did I end up with a recall of 0? Is this normal?

marcojira commented 3 months ago

That would be unlikely unless your generated data has very low variance or is very out of distribution.
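One rough way to check for either condition is to compare simple statistics of the two feature sets. The sketch below is only an illustrative diagnostic with a hypothetical helper name (`spread_report`); it is not part of the fld package and does not compute recall itself. A `var_ratio` far below 1 would hint at low-variance generated features, and a large `mean_shift` would hint at generated features lying far from the training features.

```python
import torch


def spread_report(train_feat: torch.Tensor, gen_feat: torch.Tensor) -> dict:
    """Compare average per-dimension variance and the distance between means."""
    train_var = train_feat.var(dim=0).mean().item()
    gen_var = gen_feat.var(dim=0).mean().item()
    # Euclidean distance between the two feature means
    mean_shift = (train_feat.mean(dim=0) - gen_feat.mean(dim=0)).norm().item()
    return {
        "train_var": train_var,
        "gen_var": gen_var,
        "var_ratio": gen_var / train_var,
        "mean_shift": mean_shift,
    }
```

It is also worth confirming that both feature sets come from the same feature extractor and that the generated images were loaded correctly.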