torch-ash is the missing piece of collision-free extendable parallel spatial hashing for torch modules. It includes the core implementations of two papers:
@article{dong2022ash,
title={ASH: A modern framework for parallel spatial hashing in 3D perception},
author={Dong, Wei and Lao, Yixing and Kaess, Michael and Koltun, Vladlen},
journal={PAMI},
year={2022},
}
@inproceedings{dong2023ash-mono,
title={Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids},
author={Dong, Wei and Choy, Chris and Loop, Charles and Zhu, Yuke and Litany, Or and Anandkumar, Anima},
booktitle={CVPR},
year={2023},
}
Note: for a more user-friendly interface and to allow further extensions, I have rewritten everything from scratch in this repo. Discrepancies from the results reported in the aforementioned papers are expected. Updates and more examples will come.
First, install PyTorch. Optionally install nerfacc for volume rendering.
cmake is required in the conda environment for compiling the source code.
git clone --recursive git@github.com:theNded/torch-ash.git
pip install . --verbose
ASHEngine is a PyTorch module implementing a parallel, collision-free, dynamic hash map from coordinates (torch.IntTensor) to indices (torch.LongTensor). It depends on stdgpu.
On top of ASHEngine, there are HashSet and HashMap, which are wrappers around ASHEngine. A HashSet maps a coordinate to a boolean value and is usually used for the unique operation. A HashMap maps a coordinate to a dictionary of values and allows fast insertion of and access to coordinate-value pairs.
Like HashMap, HashEmbedding maps coordinates to embeddings and is akin to torch.nn.Embedding.
hashmap = HashMap(key_dim=3, value_dims={"color": 3, "depth": 1}, capacity=100, device=torch.device("cuda:0"))
# To insert
keys = (torch.rand(10, 3) * 100).int().cuda()
values = {"color": torch.rand(10, 3).float().cuda(), "depth": torch.rand(10, 1).float().cuda()}
hashmap.insert(keys, values)
# To query
query_keys = (torch.rand(10, 3) * 100).int().cuda()
indices, masks = hashmap.find(query_keys)
# To enumerate
all_indices, all_values = hashmap.items(return_indices=True, return_values=True)
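The insert/find contract described above can be illustrated without a GPU. The following is a minimal, dependency-free sketch built on a Python dict; `ToyHashMap` and its methods are hypothetical names used only for illustration and do not reflect torch-ash's implementation or exact API.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


class ToyHashMap:
    """Conceptual stand-in: maps integer coordinates to dense slot indices."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.index_of: Dict[Coord, int] = {}  # coordinate -> slot index

    def insert(self, keys: List[Coord]) -> List[int]:
        """Give each new key the next free slot; duplicate keys reuse their slot."""
        indices = []
        for key in keys:
            if key not in self.index_of:
                if len(self.index_of) >= self.capacity:
                    # An extendable map would grow here; this toy one just fails.
                    raise RuntimeError("capacity exceeded")
                self.index_of[key] = len(self.index_of)
            indices.append(self.index_of[key])
        return indices

    def find(self, keys: List[Coord]) -> Tuple[List[int], List[bool]]:
        """Return (indices, masks); the index is -1 where the key is absent."""
        indices = [self.index_of.get(key, -1) for key in keys]
        masks = [index >= 0 for index in indices]
        return indices, masks


toy = ToyHashMap(capacity=100)
inserted = toy.insert([(1, 2, 3), (4, 5, 6), (1, 2, 3)])
found, mask = toy.find([(4, 5, 6), (7, 8, 9)])
```

Because every distinct coordinate owns exactly one slot, the returned indices can address per-slot value arrays directly, which is the property the coordinate-to-index design relies on.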
SparseDenseGrid is the engine for direct/neural scene representation. It consists of sparse arrays of grids and dense arrays of cells. The idea is similar to Instant-NGP and Plenoxels, but precise sparsity is achieved through spatial initialization and collision-free hashing. Essentially it is a modern version of VoxelHashing.
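To make the sparse-grid/dense-cell split concrete, here is a small sketch of how a metric point could be decomposed into a sparse grid coordinate (the hash key) and a local cell coordinate inside that grid. The function name `split_coords` and the floor-based convention are assumptions for illustration, not torch-ash's actual code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[int, int, int]


def split_coords(point: Sequence[float], cell_size: float,
                 grid_dim: int) -> Tuple[Coord, Coord]:
    """Return (grid_coord, cell_coord) for a 3D point given in metres."""
    # Global integer cell coordinate of the point.
    cell = [math.floor(p / cell_size) for p in point]
    # Sparse part: which grid block the cell falls in (this would be hashed).
    grid_coord = tuple(c // grid_dim for c in cell)
    # Dense part: position of the cell inside its block, always in [0, grid_dim).
    cell_coord = tuple(c % grid_dim for c in cell)
    return grid_coord, cell_coord


grid_coord, cell_coord = split_coords((0.125, -0.005, 0.0),
                                      cell_size=0.01, grid_dim=8)
```

Python's floor division and modulo keep the local coordinate non-negative for points with negative coordinates, so each block covers a contiguous, non-overlapping region of space.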
It has two wrappers for coordinate transforms: UnboundedSparseDenseGrid for metric scenes that may grow dynamically, and BoundedSparseDenseGrid for scenes bounded in unit cubes. Trilinear interpolation and double backward are implemented to support differentiable gradient computation. All these modules can be converted to and from state dicts by serializing the underlying hash map.
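As a reminder of what trilinear interpolation computes, the sketch below blends the 8 cell corners surrounding a query point given in cell units. It is a plain-Python illustration with hypothetical helper names (`trilinear_weights`, `interpolate`); the library's own kernels are not reproduced here.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Corner = Tuple[int, int, int]


def trilinear_weights(x: float, y: float, z: float) -> List[Tuple[Corner, float]]:
    """Return (corner, weight) pairs for the 8 corners around (x, y, z)."""
    base = (math.floor(x), math.floor(y), math.floor(z))
    frac = (x - base[0], y - base[1], z - base[2])
    result = []
    for offset in product((0, 1), repeat=3):
        weight = 1.0
        for f, o in zip(frac, offset):
            # Along each axis the far corner gets f, the near corner gets 1 - f.
            weight *= f if o == 1 else 1.0 - f
        corner = tuple(b + o for b, o in zip(base, offset))
        result.append((corner, weight))
    return result


def interpolate(values: Dict[Corner, float], x: float, y: float, z: float) -> float:
    """Weighted sum of corner values."""
    return sum(w * values[c] for c, w in trilinear_weights(x, y, z))


# A linear field sampled at the corners is reproduced exactly by the blend.
corner_values = {c: c[0] + 2 * c[1] + 3 * c[2] for c in product((0, 1), repeat=3)}
value = interpolate(corner_values, 0.25, 0.5, 0.75)
weight_sum = sum(w for _, w in trilinear_weights(0.25, 0.5, 0.75))
```

The weights are products of per-axis linear terms, so the interpolated value is differentiable with respect to both the stored corner values and the query position inside a cell.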
SparseDenseGrid does a good job without an MLP in fast reconstruction tasks (e.g. RGB-D fusion, differentiable volume rendering with a decent initialization), but with an MLP, there seems to be no advantage over Instant-NGP as of now. Potential extensions in this line are still in progress.
RGB-D fusion takes in posed RGB-D images and creates a colorized mesh, both raw and filtered. Here, depth can be either sensor depth or depth generated by a monocular depth prediction model (e.g. omnidata), with scales calibrated via COLMAP. Example datasets can be downloaded at Google Drive. Instructions for custom datasets will be available soon.
These datasets are organized as follows:
- image/ # for RGB images [jpg|png]
- depth/ # for sensor depth [optional, png]
- omni_depth/ # for learned depth generated from RGB [npy]
- depth_scales.txt # calculated between learned depth and SfM
- omni_normal/ # for learned normals generated from RGB [optional, npy]
- poses.txt
- intrinsic.txt
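Before running the demos it can help to check that a dataset folder matches this layout. The helper below is a hypothetical sketch, not part of the repository; in particular, which entries each depth type needs is an assumption inferred from the comments in the listing above.

```python
import tempfile
from pathlib import Path
from typing import List

# Entry names come from the layout above; the grouping below is an assumption.
ALWAYS = ["image", "poses.txt", "intrinsic.txt"]
PER_DEPTH_TYPE = {
    "sensor": ["depth"],
    "learned": ["omni_depth", "depth_scales.txt"],
}


def missing_entries(root: Path, depth_type: str) -> List[str]:
    """List expected entries that are absent under root for the given depth type."""
    expected = ALWAYS + PER_DEPTH_TYPE[depth_type]
    return [name for name in expected if not (root / name).exists()]


# Usage on a throwaway directory that only provides sensor depth.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("image", "depth"):
        (root / folder).mkdir()
    for file_name in ("poses.txt", "intrinsic.txt"):
        (root / file_name).touch()
    missing_sensor = missing_entries(root, "sensor")
    missing_learned = missing_entries(root, "learned")
```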
To run the demo,
# Unbounded scenes, sensor depth
python demo/rgbd_fusion.py --path /path/to/dataset/samples --voxel_size 0.015 --depth_type sensor
# Bounded scenes, learned depth
python demo/rgbd_fusion.py --path /path/to/dataset/samples --resolution 512 --depth_type learned
With learned depth, the fusion result is usually noisy. We can apply volume rendering to further optimize the shape:
python demo/train_scene_recon.py --path /path/to/dataset/samples --voxel_size 0.015 --depth_type learned
We start with a local 7x7x7 Gaussian filter to smooth the initialization.
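For reference, a 7x7x7 Gaussian kernel is separable and can be built from a single 7-tap 1D kernel. The sketch below is plain Python; the standard deviation `sigma=1.0` is an assumed value, since the text does not state the one used by the demo.

```python
import math
from typing import List


def gaussian_kernel_1d(size: int = 7, sigma: float = 1.0) -> List[float]:
    """Normalized 1D Gaussian taps centred on the middle element."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_kernel_3d(size: int = 7, sigma: float = 1.0) -> List[List[List[float]]]:
    """3D kernel as the outer product of three identical 1D kernels."""
    k = gaussian_kernel_1d(size, sigma)
    return [[[kx * ky * kz for kz in k] for ky in k] for kx in k]


taps = gaussian_kernel_1d()
kernel = gaussian_kernel_3d()
kernel_sum = sum(v for plane in kernel for row in plane for v in row)
```

Because the 1D taps sum to one, the 3D kernel does too, so smoothing preserves the overall scale of the filtered values.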
Volume rendering follows the initialization. The results will be written to logs/datetime. Every 500 iterations, a mesh will be extracted and stored. The optimization starts with ripples on the surfaces but eventually converges to smooth reconstructions as shown above.
Here is a brief summary of basic usage; documentation will be online soon.
We first initialize a 3D sparse-dense grid with 10000 sparse grid blocks. Each sparse grid contains a dense 8^3=512 array of cells, whose size is 0.01m.
grid = UnboundedSparseDenseGrid(in_dim=3,
                                num_embeddings=10000,
                                grid_dim=8,  # 8^3 = 512 cells per sparse grid
                                embedding_dims=8,
                                cell_size=0.01)
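The numbers quoted in the text translate into a rough size budget. The following back-of-the-envelope sketch uses a hypothetical helper, `grid_footprint`, and assumes float32 embeddings and 8 cells per side as stated in the prose; it ignores hash map and bookkeeping overhead.

```python
from typing import Dict, Union


def grid_footprint(num_embeddings: int, grid_dim: int, embedding_dims: int,
                   cell_size: float, bytes_per_value: int = 4
                   ) -> Dict[str, Union[int, float]]:
    """Estimate cell counts, block size and embedding storage of a sparse-dense grid."""
    cells_per_grid = grid_dim ** 3
    total_cells = num_embeddings * cells_per_grid
    return {
        "cells_per_grid": cells_per_grid,
        "block_edge_m": grid_dim * cell_size,  # metric edge length of one sparse grid
        "total_cells": total_cells,
        "embedding_bytes": total_cells * embedding_dims * bytes_per_value,
    }


stats = grid_footprint(num_embeddings=10000, grid_dim=8,
                       embedding_dims=8, cell_size=0.01)
```

With these settings each sparse grid spans 8 cm per side, and the embedding storage comes to roughly 164 MB when all 10000 grids are allocated.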
We then spatially initialize the grid at input points (e.g. obtained point cloud, RGB-D scans). This results in coordinates and indices that support index-based access.
with torch.no_grad():
    grid_coords, cell_coords, grid_indices, cell_indices = grid.spatial_init_(points)

    # [Optional] direct assignment
    grid.embeddings[grid_indices, cell_indices] = attributes
As a PyTorch extension, first and second-order autodiff are enabled by a differentiable query.
optim = torch.optim.SGD(grid.parameters(), lr=1e-3)
for x, gt in batch:
    optim.zero_grad()
    x.requires_grad_(True)

    embedding, mask = grid(x, interpolation="linear")
    output = forward_fn(embedding, mask)

    doutput_dx = torch.autograd.grad(
        outputs=output,
        inputs=x,
        grad_outputs=torch.ones_like(output, requires_grad=False),
        create_graph=True,
        retain_graph=True)[0]

    (loss_fn(output) + grad_loss_fn(doutput_dx)).backward()
    optim.step()
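The double-backward pattern in the loop above can be tried in isolation with plain PyTorch. In this sketch a toy analytic field with one learnable scale stands in for the grid query and `forward_fn`, and an Eikonal-style penalty is used as one possible choice of `grad_loss_fn`; both are assumptions for illustration and are not taken from the repository.

```python
import torch

# Learnable scale standing in for the grid parameters (hypothetical toy field).
scale = torch.nn.Parameter(torch.tensor(2.0))


def field(x: torch.Tensor) -> torch.Tensor:
    """Toy scalar field f(x) = ||scale * x|| - 1."""
    return (scale * x).norm(dim=-1, keepdim=True) - 1.0


x = torch.tensor([[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]], requires_grad=True)
output = field(x)

# First-order gradient w.r.t. the inputs; create_graph=True keeps it differentiable.
(doutput_dx,) = torch.autograd.grad(
    outputs=output,
    inputs=x,
    grad_outputs=torch.ones_like(output),
    create_graph=True,
)

# Penalize gradient norms that deviate from 1. Calling backward() on this loss
# differentiates through the first gradient, i.e. the second-order pass.
grad_norm = doutput_dx.norm(dim=-1)
eikonal_loss = ((grad_norm - 1.0) ** 2).mean()
eikonal_loss.backward()
```

For this field the input-gradient norm equals the scale, so the loss is (scale - 1)^2 and the gradient that reaches `scale` exists only because the first gradient was built with `create_graph=True`.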