hobu opened 3 weeks ago
Storing Gaussian splats in tabular format should work, but is there any fair benchmark for compression of Gaussian splats?
> is there any fair benchmark for compression of Gaussian splats?
With the 3DGS compression survey, we aim to establish a standardized benchmark for evaluating the compression of 3D Gaussian splatting. In the survey, we currently compare more than a dozen 3DGS compression methods from the scientific literature. The question "is there any fair benchmark" is nuanced, as you might expect, so allow me to elaborate for a bit.
3DGS compression
The definition of the problem to solve - which is usually not explicitly stated - is to create the smallest representation from your training views that yields the highest quality of novel renders for the testing views. I.e., maximize quality metrics like PSNR and minimize metrics like LPIPS for the test views, while keeping the representation you need to render as small as possible, in bytes on disk. The quantization and encoding of splats (as proposed here with .spz) cover only a small part of this definition of 3DGS compression.

You can, for example, throw away about a third of the splats of a finished original-3DGS scene (the ones with the lowest opacity) without a reduction in PSNR. Boom, 33% size reduction. You can do better sampling and densification to reduce the number of splats considerably during training. We list methods that work on this problem as "Compaction" methods in the survey. Methods like Mini-Splatting reduce the number of Gaussians to ~1/10th of the ones needed in 3DGS. What isn't there, you don't have to encode, making for a very compact representation off the bat, no matter your file format.

So it's possible to compare how you would be able to compress original 3DGS ply files. But it's not very useful: no file format on its own will achieve those high rates of compression (like using 90% fewer splats) on the non-optimal original 3DGS configuration, and you'd still have to pin down the rest of the evaluation setup, like which scenes to use, which views to use for testing, and which resolution to use for evaluation.
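To make the compaction point concrete, here is a minimal sketch of the opacity-pruning idea, assuming a trained scene in the standard 3DGS PLY attribute layout and the plyfile package; the 30% drop fraction is illustrative, and you'd want to re-check PSNR on the test views afterwards.

```python
# Minimal sketch: drop the lowest-opacity splats from a trained 3DGS PLY.
# Assumes the standard 3DGS attribute layout; the drop fraction is illustrative.
import numpy as np
from plyfile import PlyData, PlyElement

def prune_low_opacity(in_path: str, out_path: str, drop_fraction: float = 0.30) -> None:
    ply = PlyData.read(in_path)
    vertices = ply["vertex"].data                      # structured array, one row per splat

    # 3DGS stores opacity as a logit; a sigmoid maps it back to [0, 1].
    opacity = 1.0 / (1.0 + np.exp(-vertices["opacity"]))

    # Keep everything above the drop_fraction quantile of opacity.
    threshold = np.quantile(opacity, drop_fraction)
    kept = vertices[opacity > threshold]

    PlyData([PlyElement.describe(kept, "vertex")]).write(out_path)

# prune_low_opacity("scene.ply", "scene_pruned.ply")
```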
Datasets
The scenes used in the original 3DGS paper have become the de-facto standard: train & truck from Tanks & Temples, Deep Blending, Mip-NeRF 360 (+ extra scenes), and Synthetic NeRF (Blender). The original authors of 3DGS unintentionally created a benchmark set for comparison, as subsequent papers compare their numbers with the 3DGS table. You could introduce additional datasets, but you'd lose the ability to compare with most methods now out there (as running them is quite involved, and other datasets require different data loading, parameter optimization, ...). The scenes range from tiny synthetic environments with heavy view-dependent effects to small and mid-sized natural ones. They cover a range of natural scenes (rooms and outside views), but notably lack examples of larger-scale scenes. There are a bunch of implicit assumptions in the original 3DGS paper about evaluation that we are making explicit with the 3DGS compression testing conventions.
Structured representation
Additionally, all the papers working on 3DGS compression (or "Compact 3DGS") usually mix compaction (reducing the number of splats) with the file format or representation they're introducing. You really have to do both to do well in the benchmark. OK, you may say, then let's build a comparison on the same sampling/densification/compaction strategy, and only compare the file formats, to have a fair comparison. Which brings us to the next point: the representation of splats is a crucial part, and is usually tied to the file format. You can have codebooks, octrees, hash-grids (from Instant-NGP), MLPs, anchor points, self-organizing grids, and others. The best-performing methods have a custom structured representation that requires a custom encoding, not just storing the splats as a list. So just comparing the compression of ply to something else excludes many effective strategies for coding this high-dimensional data.
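As one concrete example of why a flat list of splats leaves compression on the table, here is a hedged sketch of codebook (vector) quantization of the spherical-harmonics color coefficients: store one small index per splat plus a shared codebook. The codebook size and the use of scikit-learn's MiniBatchKMeans are illustrative choices, not what any particular survey entry does.

```python
# Sketch of codebook (vector) quantization for splat attributes, here the
# spherical-harmonics color coefficients. Sizes are illustrative only.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(sh_coeffs: np.ndarray, codebook_size: int = 4096):
    """sh_coeffs: (num_splats, sh_dim) float array."""
    kmeans = MiniBatchKMeans(n_clusters=codebook_size, random_state=0)
    indices = kmeans.fit_predict(sh_coeffs)        # one small integer per splat
    codebook = kmeans.cluster_centers_             # (codebook_size, sh_dim) floats
    return codebook, indices.astype(np.uint16)     # 4096 entries fit in 16 bits

def decode(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    return codebook[indices]                       # lossy reconstruction
```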
Optimization during training
Another effective strategy is to optimize your representation during the training of the scene. Some methods can do both - apply compression completely post-training, or optimize during training. But I believe all methods gain something from optimizing the representation during training. As stated, we care about the ability to render novel views; usually you don't really care about a particular 3DGS configuration. If we can move splats around and change them to get a more compact representation, we should do that. This makes the problem different from e.g. image coding. In image coding, we want the stored pixels to be perceptually as close to the original as possible. In 3DGS, we want the rendered novel views to be perceptually as close as possible to our test views. We don't care at all how you place your splats.
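A hedged sketch of what optimizing the representation during training can look like: add a small sparsity penalty on opacities so that low-contribution splats can be pruned as training proceeds. The `render`/`params` names and the loss weight are placeholders, not any specific method's API.

```python
# Sketch of a training step that also shapes the representation: a sparsity
# term on opacities encourages splats that can later be pruned. Placeholders:
# `render`, `params`, and lambda_opacity are not any specific method's API.
import torch

def training_step(params, optimizer, render, gt_image, lambda_opacity=1e-3):
    pred = render(params)                                       # rendered training view
    photometric = torch.nn.functional.l1_loss(pred, gt_image)
    sparsity = torch.sigmoid(params["opacity_logits"]).mean()   # push opacities down
    loss = photometric + lambda_opacity * sparsity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```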
Fair benchmark
So, you certainly could look at some ply-to-X compression as a benchmark. E.g., using the original 3DGS scenes that are available pre-trained for download, and looking at how small you can get them with your favorite tabular format. But looking at ply-to-X in isolation - ignoring better compaction strategies, better structuring and organization of Gaussians, and optimizations during training - leaves out the most effective strategies for compact representations out there.
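For reference, such a ply-to-X comparison could look roughly like this, assuming a pre-trained 3DGS PLY and the plyfile and pyarrow packages; the file names and the zstd setting are illustrative.

```python
# Sketch of a ply-to-X size comparison: rewrite a pre-trained 3DGS PLY as
# Parquet and compare bytes on disk. File names and settings are illustrative.
import os
import pyarrow as pa
import pyarrow.parquet as pq
from plyfile import PlyData

vertices = PlyData.read("scene.ply")["vertex"].data     # structured numpy array
table = pa.table({name: vertices[name] for name in vertices.dtype.names})
pq.write_table(table, "scene.parquet", compression="zstd")

print("ply:    ", os.path.getsize("scene.ply"), "bytes")
print("parquet:", os.path.getsize("scene.parquet"), "bytes")
```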
The 3DGS compression survey is open; it's hosted as a GitHub project. We encourage anybody to submit their results; the expected structure is documented. File formats like .spz or Apache Feather or Parquet might make for a fine representation of splats. But to make a competitive entry, you will need to combine the file format with better representations/strategies.
@w-m I came across your paper last month, but only recently discovered your notable website.
With the increasing number of 3DGS papers, many only compare their algorithms to a limited set of methods. Here’s what I’d like to know:
This is why I would like to see a fair and comprehensive benchmark.
Your 3DGS.zip benchmark is well-designed, covering quality, total size, point size, and concise summaries of each method. Once I'm ready to release my model, I'd like to add my results to assess its advantages. Your setup is well-suited for conducting ablation studies.
> It's possible to compare how you would be able to compress original 3DGS ply files. But it's not very useful.
This represents the post-processing approach.
> The representation of splats is a crucial part, and is usually tied to the file format.
This highlights representation improvements, and yes, file formats are often closely linked to their code.
> Another effective strategy is to optimize your representation during the training of the scene.
This focuses on the training-based approach.
From what I’ve learned so far, achieving optimal scene compression in both quality and size depends on advances in representation (grids), training (encoding, pruning), and post-processing (quantization).
@AsherJingkongChen we put an update of the paper version of the survey on arXiv today: https://arxiv.org/pdf/2407.09510
I think you will like this version, as it adds some of the points you mentioned. It now includes a section on the key methods for compression, before discussing the individual papers. We also added discussion on the correlation between attributes (coordinates, colors, scales, ...), and show the typical distribution of values for the attributes.
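For anyone who wants to run that kind of analysis on their own scenes, a small sketch along these lines might do; the column names assume the standard 3DGS PLY layout.

```python
# Sketch: inspect attribute correlations and value distributions of a trained
# scene. Column names assume the standard 3DGS PLY layout.
import pandas as pd
from plyfile import PlyData

vertices = PlyData.read("scene.ply")["vertex"].data
df = pd.DataFrame({name: vertices[name] for name in vertices.dtype.names})

subset = ["x", "y", "z", "opacity", "scale_0", "scale_1", "scale_2",
          "f_dc_0", "f_dc_1", "f_dc_2"]
print(df[subset].corr())        # pairwise correlation between attributes
print(df[subset].describe())    # per-attribute value distributions
```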
The side effects of each method/tool are very hard to discuss in isolation. As all the 3DGS compression papers present methods that do several things at once (structured representation, training-based approaches, post-processing, coding & quantization), we can't say what each of these individual tools does or what its direct effects are. That would require having the code of all methods in a shared code base, where you can plug in different tools and measure the results. That is currently just not available, so we try to explain what toolset exists and show the papers' results. With the survey, we hope that people will find interesting compression methods from the table & plots, and will be able to adapt and combine their ideas and tools into improved methods.
Getting too deep into the survey is off-topic here. But we are very happy to receive feedback. Please reach out via email to one of the authors, or on the issue tracker in https://github.com/w-m/3dgs-compression-survey with critique and ideas for improvement.
As for a fair comparison of individual tools (e.g. coding formats) in isolation, I think the best bet we have is to gather the different implementations in gsplat. Then we could try different combinations under different parameters.
To compare .spz with other 3DGS formats, a Python interface or Python bindings would certainly help.
@w-m Your survey surely helps, I will look into the paper this month. I may address some problems you mentioned as well.
For anyone interested: cross-linking this standardization discussion from the mkkellog viewer repo, regarding the need for a common format standard.
Looking forward to the developments of this discussion as well. The compression mechanisms used and described in the nianticlabs/spz repo and blog look a lot like some of the ideas and hacks mentioned in the discussion above, which themselves resemble Parquet a bit when seen from afar: column-based formats, careful selection of bit-packing and quantization techniques adapted to the underlying property being encoded, standard compression schemes, etc.
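As an illustration of that flavor of per-attribute bit-packing and quantization (explicitly not the actual .spz encoding, just the general idea), a sketch might look like this:

```python
# Sketch of per-attribute quantization followed by a standard compressor.
# Bit widths are illustrative and are NOT the actual .spz encoding.
import zlib
import numpy as np

def quantize_positions(xyz: np.ndarray, bits: int = 18) -> np.ndarray:
    """Fixed-point quantization of positions into integers."""
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    scale = (2**bits - 1) / np.maximum(hi - lo, 1e-9)
    return np.round((xyz - lo) * scale).astype(np.int32)

def quantize_unit(values: np.ndarray, bits: int = 8) -> np.ndarray:
    """Uniform quantization of values already normalized to [0, 1]."""
    return np.round(np.clip(values, 0.0, 1.0) * (2**bits - 1)).astype(np.uint8)

def compress_column(column: np.ndarray) -> bytes:
    # Column-wise compression: similar values sit next to each other,
    # which helps a general-purpose compressor like zlib.
    return zlib.compress(np.ascontiguousarray(column).tobytes(), level=9)
```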
Note that another discussion could emerge around tiling/streaming mechanisms rather than compression, akin to OGC 3D Tiles for point clouds and massive meshes - and at some point for Gaussian splats/Gaussian clouds? The column packing could be done per "tile", to allow streaming portions of the whole splat scene while still getting compression gains from keeping similar data memory-contiguous. The main hurdle for simple tree-based (octree & co) tiling is the volumetric extent of each splat versus a point - but this is already what happens when tiling meshes, so it could be replicated.
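A minimal sketch of that per-tile column packing, assuming splat positions in a numpy array and the attributes already in a pyarrow Table; the grid-based tiling and file naming are illustrative:

```python
# Sketch of spatial tiling for streaming: bucket splats into a coarse grid and
# write one Parquet file per tile, so a viewer can fetch only tiles in view.
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

def write_tiles(xyz: np.ndarray, table: pa.Table, tile_size: float, out_dir: str) -> None:
    tile_ids = np.floor(xyz / tile_size).astype(np.int64)     # (N, 3) grid coordinates
    keys = {tuple(t) for t in tile_ids}
    for key in keys:
        mask = np.all(tile_ids == np.array(key), axis=1)
        tile = table.filter(pa.array(mask))                   # rows for this tile
        pq.write_table(tile, f"{out_dir}/tile_{key[0]}_{key[1]}_{key[2]}.parquet",
                       compression="zstd")
```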
Totally fine if the answer to this question is "Because it is what we use", but I wonder if the capabilities of SPZ could be covered by standing upon something like Parquet or Apache Feather.
My case for leveraging either of those would be:
Thanks for publishing the code, and I appreciate the desire to bootstrap some interoperability for splat data, but I wonder if it is possible to encode the same schema of information in Feather/Parquet and get ecosystem interoperability without requiring custom software implementations.
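As a sketch of what that could look like, here is an assumed Arrow schema for SPZ-like splat attributes, written out as both Feather and Parquet. The field names and types are my guess at such a schema, not SPZ's actual layout.

```python
# Sketch of an Arrow schema for splat data, written as Feather and Parquet.
# Field names/types are an assumption, not SPZ's actual layout.
import pyarrow as pa
import pyarrow.feather as feather
import pyarrow.parquet as pq

splat_schema = pa.schema([
    ("x", pa.float32()), ("y", pa.float32()), ("z", pa.float32()),
    ("scale_0", pa.float32()), ("scale_1", pa.float32()), ("scale_2", pa.float32()),
    ("rot_0", pa.float32()), ("rot_1", pa.float32()),
    ("rot_2", pa.float32()), ("rot_3", pa.float32()),
    ("opacity", pa.float32()),
    ("f_dc_0", pa.float32()), ("f_dc_1", pa.float32()), ("f_dc_2", pa.float32()),
    # higher-order SH coefficients as one fixed-size list column
    ("f_rest", pa.list_(pa.float32(), 45)),
])

def write_both(table: pa.Table) -> None:
    assert table.schema.equals(splat_schema)
    feather.write_feather(table, "scene.feather", compression="zstd")
    pq.write_table(table, "scene.parquet", compression="zstd")
```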