pygod-team / pygod

A Python Library for Graph Outlier Detection (Anomaly Detection)
https://pygod.org
BSD 2-Clause "Simplified" License
1.31k stars, 127 forks

Out of memory #72

Closed by aNR0 1 year ago

aNR0 commented 1 year ago

Describe the bug

Hi, every model except GCNAE runs out of memory for me, even when I set the batch size to a very low value. The failed allocation is always around 600 GB for a batch with around 400k nodes.

RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17284\2902826942.py in <module>
      5 
      6 model = AnomalyDAE(gpu=0, batch_size=8, verbose=True, contamination=0.05)
----> 7 model.fit(batch)

~\anaconda3\lib\site-packages\pygod\models\anomalydae.py in fit(self, G, y_true)
    143         """
    144         G.node_idx = torch.arange(G.x.shape[0])
--> 145         G.s = to_dense_adj(G.edge_index)[0]
    146 
    147         # automated balancing by std

~\anaconda3\lib\site-packages\torch_geometric\utils\to_dense_adj.py in to_dense_adj(edge_index, batch, edge_attr, max_num_nodes)
     46     size = [batch_size, max_num_nodes, max_num_nodes]
     47     size += list(edge_attr.size())[1:]
---> 48     adj = torch.zeros(size, dtype=edge_attr.dtype, device=edge_index.device)
     49 
     50     flattened_size = batch_size * max_num_nodes * max_num_nodes

RuntimeError: CUDA out of memory. Tried to allocate 597.53 GiB (GPU 0; 16.00 GiB total capacity; 1.32 GiB already allocated; 12.78 GiB free; 1.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
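
For scale, the 597.53 GiB in the error is about what a dense N x N adjacency matrix needs when N is roughly 400k. The traceback shows `to_dense_adj(G.edge_index)` being called at the top of `fit` with `size = [batch_size, max_num_nodes, max_num_nodes]`, so the allocation would grow with the square of the node count of the graph passed in, which may be why lowering the model's `batch_size` did not help. The sketch below only does the arithmetic. It assumes 4-byte (float32) elements and a single graph, neither of which is stated in the thread, and the helper names are made up for illustration.

```python
import math

# Rough size check for a dense adjacency matrix.
# Assumptions (not stated in the thread): float32 elements (4 bytes each)
# and a single graph, so the tensor shape is [1, N, N].

def dense_adj_gib(num_nodes: int, bytes_per_element: int = 4) -> float:
    """Memory in GiB needed for a dense num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * bytes_per_element / 2**30

def nodes_for_gib(gib: float, bytes_per_element: int = 4) -> int:
    """Largest node count whose dense matrix fits in `gib` GiB."""
    return math.isqrt(int(gib * 2**30) // bytes_per_element)

# Dense matrix for the "around 400k nodes" mentioned in the report.
print(f"{dense_adj_gib(400_000):.2f} GiB")

# Node count implied by the 597.53 GiB figure in the error message
# (approximate, because the message rounds the size).
print(nodes_for_gib(597.53))

# Upper bound on nodes for a 16 GiB GPU, ignoring all other memory use.
print(nodes_for_gib(16.0))
```

Under these assumptions a 16 GiB card could hold a dense adjacency for at most about 65k nodes, before counting features, weights, or gradients.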

kayzliu commented 1 year ago

This may be due to neighbor explosion. You can try reducing the number of layers to shrink the computation subgraph for each batch.
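
To illustrate what neighbor explosion means here, the following toy sketch counts how many nodes fall within k hops of a small batch of seed nodes. It assumes a message-passing model in which each layer aggregates over all neighbors, so k layers touch the k-hop neighborhood. The random graph, its average degree, and the function names are hypothetical and are not taken from PyGOD or from the graph in this report. The 8 seed nodes only mirror the `batch_size=8` used above.

```python
import random

def build_random_graph(num_nodes, avg_degree, seed=0):
    """Undirected toy graph as an adjacency list (hypothetical data)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(num_nodes)]
    for _ in range(num_nodes * avg_degree // 2):
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v:  # skip self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj

def receptive_field_sizes(adj, seeds, max_hops):
    """Number of nodes within 0..max_hops hops of the seed nodes."""
    visited = set(seeds)
    frontier = set(seeds)
    sizes = [len(visited)]
    for _ in range(max_hops):
        next_frontier = set()
        for u in frontier:
            next_frontier.update(adj[u])
        # Keep only nodes that were not reached in an earlier hop.
        frontier = next_frontier - visited
        visited |= frontier
        sizes.append(len(visited))
    return sizes

adj = build_random_graph(num_nodes=100_000, avg_degree=10)
sizes = receptive_field_sizes(adj, seeds=range(8), max_hops=4)
for hops, size in enumerate(sizes):
    print(f"{hops} layers -> {size} nodes in the computation subgraph")
```

On a graph like this the count grows by roughly the average degree with every extra hop until it saturates, so dropping even one layer can shrink the per-batch subgraph considerably. How the layer count is set depends on the installed PyGOD version and is not shown here.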