Closed: JunyuanDeng closed this issue 6 months ago.
Thank you for your interest in TorchSparse. Can you provide more information about your workload? For example, what are the input resolution and batch size?
Is there an existing issue for this?
- [x] I have searched the existing issues
Current Behavior
I wrote the following encoder:
```python
class encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            spnn.Conv3d(3, 32, 5, stride=2, padding=2, bias=False),
            spnn.BatchNorm(32),
            spnn.ReLU(),
            ResidualBlock(32, 32),
            ResidualBlock(32, 32),
        )
        self.conv2 = nn.Sequential(
            spnn.Conv3d(32, 48, 3, stride=2, padding=1, bias=False),
            spnn.BatchNorm(48),
            spnn.ReLU(),
            ResidualBlock(48, 48),
            ResidualBlock(48, 48),
        )
        self.conv3 = nn.Sequential(
            spnn.Conv3d(48, 64, 3, stride=[1, 2, 2], padding=1, bias=False),
            spnn.BatchNorm(64),
            spnn.ReLU(),
            ResidualBlock(64, 64),
        )
```
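`ResidualBlock` is not shown above. For readers trying to reproduce this, here is a minimal sketch of a sparse residual block built from the same `spnn` layers; this is my assumption, not necessarily the definition used in this report:

```python
# Hypothetical ResidualBlock, assumed for illustration only; not necessarily
# the definition used in the report above.
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.main = nn.Sequential(
            spnn.Conv3d(in_channels, out_channels, 3, stride=1, padding=1, bias=False),
            spnn.BatchNorm(out_channels),
            spnn.ReLU(),
            spnn.Conv3d(out_channels, out_channels, 3, stride=1, padding=1, bias=False),
            spnn.BatchNorm(out_channels),
        )
        # Projection shortcut only when the channel count changes.
        self.shortcut = (
            nn.Identity()
            if in_channels == out_channels
            else nn.Sequential(
                spnn.Conv3d(in_channels, out_channels, 1, bias=False),
                spnn.BatchNorm(out_channels),
            )
        )
        self.relu = spnn.ReLU()

    def forward(self, x):
        # Element-wise add of two SparseTensors that share the same coordinates.
        return self.relu(self.main(x) + self.shortcut(x))
```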
For the input, I wrote a dense-to-sparse function:
```python
def densetosparse(mask, img, bounds):
    # mask : [B, T, W, H]
    # img  : [B, 3, T, W, H]
    # grid : [3, T*W*H]
    # grid_xx : [1, T*W*H]
    B = img.shape[0]
    coord = torch.argwhere(mask).type(torch.int32)
    features = img[coord[:, 0], :, coord[:, 1], coord[:, 2], coord[:, 3]]
    sp_inputs = torchsparse.SparseTensor(
        features, coord.contiguous(), spatial_range=(6, 16, 1024, 1024)
    )
    return sp_inputs
```
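For context, here is how I read the intended call; a sketch only, where the shapes, the random mask, and the CUDA placement are my assumptions taken from the comments and the hard-coded `spatial_range` above:

```python
# Hypothetical call, inferred from the shape comments and spatial_range=(6, 16, 1024, 1024).
B, T, W, H = 6, 16, 1024, 1024
img = torch.rand(B, 3, T, W, H, device="cuda")
mask = torch.rand(B, T, W, H, device="cuda") > 0.95   # ~5% occupancy, arbitrary choice
sp_inputs = densetosparse(mask, img, bounds=None)      # 'bounds' is unused in the function
print(sp_inputs.feats.shape, sp_inputs.coords.shape)   # [N, 3] features, [N, 4] coordinates
```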
The first iteration ran fine, but the second iteration failed with an out-of-memory error:
File "/mnt/local_disk/djy/Forward4D_query_gaussian_sicong_pure3d_0125/models/image_upsample/UpsampleImage.py", line 176, in forward x_conv1 = self.conv1(inputs) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward input = module(input) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torchsparse-2.1.0-py3.10-linux-x86_64.egg/torchsparse/nn/modules/conv.py", line 98, in forward return F.conv3d( File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torchsparse-2.1.0-py3.10-linux-x86_64.egg/torchsparse/nn/functional/conv/conv.py", line 92, in conv3d kmap = F.build_kernel_map( File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torchsparse-2.1.0-py3.10-linux-x86_64.egg/torchsparse/nn/functional/conv/kmap/build_kmap.py", line 85, in build_kernel_map kmap = build_kmap_implicit_GEMM_hashmap_on_the_fly( File "/home/shaper/miniconda3/envs/forward4d/lib/python3.10/site-packages/torchsparse-2.1.0-py3.10-linux-x86_64.egg/torchsparse/nn/functional/conv/kmap/func/hashmap_on_the_fly.py", line 72, in build_kmap_implicit_GEMM_hashmap_on_the_fly out = func( torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.79 GiB. GPU 0 has a total capacty of 47.54 GiB of which 6.17 GiB is free. Including non-PyTorch memory, this process has 41.36 GiB memory in use. Of the allocated memory 24.22 GiB is allocated by PyTorch, and 16.28 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Expected Behavior
No OOM error on the second and later iterations.
Environment
- GCC:
- NVCC:
- PyTorch:
- PyTorch CUDA:
- TorchSparse: 2.1.0
Anything else?
No response
If you suspect an issue with your dense-to-sparse function, consider trying mine. It has worked well in my program and doesn't consume additional memory. In your scenario, you can pre-multiply x by the mask before invoking the function. Alternatively, you could try freeing memory after each training round.
```python
import torch
from torchsparse import SparseTensor

def from_dense(x: torch.Tensor):
    """Create a sparse tensor from a channel-last dense tensor via to_sparse.
    x must be a BTHWC tensor (channel last).
    """
    sparse_data = x.to_sparse(x.ndim - 1)
    spatial_shape = sparse_data.shape[:-1]
    sparse_indices = sparse_data.indices().transpose(1, 0).contiguous().int()
    sparse_feature = sparse_data.values()
    return SparseTensor(feats=sparse_feature.cuda(),
                        coords=sparse_indices.cuda(),
                        spatial_range=spatial_shape)

# Optional cleanup to run after each training round, as suggested above:
torch.cuda.empty_cache()
```
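A sketch of how I read those two suggestions together; the permute to channel-last, the placeholder names (`loader`, `model`), and the loop structure are my assumptions rather than tested code:

```python
# Hypothetical training-loop usage: pre-multiply by the mask, convert, then clean up.
# Shapes follow the comments in densetosparse: mask [B,T,W,H], img [B,3,T,W,H].
for mask, img in loader:                                  # 'loader' and 'model' are placeholders
    x = img.permute(0, 2, 3, 4, 1) * mask.unsqueeze(-1)   # [B,T,W,H,3], zeros where mask == 0
    sp_inputs = from_dense(x)                              # intent: all-zero slices are dropped
    out = model(sp_inputs)
    # ... loss, backward, optimizer step ...
    del sp_inputs, out                                     # drop references before cleanup
    torch.cuda.empty_cache()                               # free cached blocks each iteration
```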