opengeospatial / CDBV2-2023-Summer-Workshop


CDB 2 determinism concerns #3

Open ccbrianf opened 10 months ago

ccbrianf commented 10 months ago
  1. As I understand it, vector feature attributes are stored in one table per GeoPackage without any other spatial sorting organization. This could lead to random table I/O performance concerns when processing large numbers of features, where the only mitigation would be to structure the number of LoDs per GeoPackage differently to limit the attribution table size. For certain use case profiles it may be desirable to at least impose a spatially coherent sorting criterion on the attribution table, to best trade off the efficiency of normalization against locality (see the sketch after this list). It would be helpful to know that this mitigation is feasible.
  2. The approach to "batch optimization" of 3D cultural content should also consider that certain profile use cases need to bound the amount of data that must be processed, for deterministic memory usage and latency response. (I support the approach in concept, but disagree with the title: batch optimization is an engine problem, and while spatially grouping content into a tile might force a simplistic engine to batch as a side effect, that shouldn't be the reason for doing so. The reason should rather be efficient I/O and processing of spatially coherent content.) The current CDB has analogous file size limitations that push some LoD data finer in support of this; those should also apply to the glTF tile blob approach.
  3. The glTF tile blob approach should be able to handle existing cultural model content that uses additive LoDs, rather than supporting tile exchange exclusively. Is such support in the current design?
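
To make the "spatially coherent sorting" idea in point 1 concrete, here is a minimal sketch assuming a hypothetical attribute table with per-feature centroid coordinates (all names here are illustrative, not from any CDB draft): ordering rows by a Morton (Z-order) key keeps spatially nearby features in nearby database pages.

```python
# Hypothetical sketch: order attribute rows by a Morton (Z-order) key so
# that spatially nearby features land in nearby on-disk pages.

def interleave16(v: int) -> int:
    """Spread the low 16 bits of v so there is a zero bit between each."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton_key(lon: float, lat: float) -> int:
    """Quantize lon/lat to 16 bits each and interleave into a 32-bit key."""
    x = int((lon + 180.0) / 360.0 * 65535)
    y = int((lat + 90.0) / 180.0 * 65535)
    return (interleave16(y) << 1) | interleave16(x)

# Illustrative rows: (feature_id, centroid_lon, centroid_lat).
features = [
    (1, -117.16, 32.71),
    (2,    2.35, 48.86),
    (3, -117.15, 32.72),
]
# Sorting by the key before insertion gives the attribution table a
# spatially coherent order; features 1 and 3 end up adjacent.
features.sort(key=lambda f: morton_key(f[1], f[2]))
print(features)
```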
jerstlouis commented 10 months ago

Thanks @ccbrianf.

  1. Sorry, just recalled what was escaping me earlier -- we could require or recommend an RTree extension on the Features tables containing the attributes, if that would help (see the sketch after this list). The other aspect is that the LOD grouping should keep the GeoPackages below a size where the size of the table / GeoPackage becomes an issue. The ideal levelGroupingCount for a particular type of vector features (e.g., man-made 3D models) should generally be the same and could be specified in a profile. I don't expect these attribute tables to be large enough to really start causing problems. All of the 3D Buildings as glTF 3D models, their points vector tiles, and their attributes for the San Diego dataset in a single GeoPackage (no LOD grouping at all) come to only 321 megabytes 7z'ed up (3.1 GB unzipped).
  2. I fully agree that batch optimization is an engine problem. I used the terminology that 3D Tiles has used for its B3DM, which is closer to this approach than CDB is. In my opinion the file size limitations should not be required (but should still be in effect) if the tile size and the significant sizes are chosen appropriately (e.g., the 256x256 tiles of the GNOSISGlobalGrid vs. CDB 1.x's 1024x1024 should help in this regard).
  3. No. I imagine that should be possible with some flag on the data layer and/or per tile, if extensions are allowed to add columns to either gpkg_contents or the tiles table.
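
For reference, a minimal sketch of what requiring the RTree extension from point 1 could look like, assuming an existing GeoPackage with a hypothetical features table named `features` and geometry column `geom` (both illustrative). The virtual table name and registration row follow the GeoPackage spec's gpkg_rtree_index extension:

```python
import sqlite3

# Assumes an existing, valid GeoPackage (so gpkg_extensions already exists).
conn = sqlite3.connect("attributes.gpkg")  # hypothetical file name

# Per gpkg_rtree_index, the spatial index is an SQLite R*Tree virtual table
# named rtree_<table>_<geometry_column> holding feature bounding boxes.
conn.execute("""
    CREATE VIRTUAL TABLE IF NOT EXISTS rtree_features_geom
    USING rtree(id, minx, maxx, miny, maxy)
""")

# Register the extension so readers know the index is present.
conn.execute("""
    INSERT OR IGNORE INTO gpkg_extensions
        (table_name, column_name, extension_name, definition, scope)
    VALUES
        ('features', 'geom', 'gpkg_rtree_index',
         'http://www.geopackage.org/spec/#extension_rtree', 'write-only')
""")
conn.commit()
```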
ccbrianf commented 10 months ago
  1. Yes, an RTree extension might help, assuming attribute normalization isn't so heavy that multiple usage locations make it ineffective. Grouping multiply-used attribution together could help mitigate that as well. While LoD grouping can help, I feel it's not really the best solution. There are likely many use cases that desire such performance but also don't want separate GeoPackage files per feature type, for instance. Thus, the grouping changes would be somewhat artificial, shaped by performance/determinism rather than by the desired content structure. Testing should help define the size of the concern, if any, but we want to make sure to keep CDB near infinitely scalable under at least a profile. Keep in mind the use case of a 700 knot fast jet flying low to the ground needing to load content on demand with minimal latency.
  2. I concur that the goal in choosing LoD versus tile size should be to minimize chunk size overflow; that was unfortunately off by a level or two in the original CDB. My concern is more with cultural models than imagery. In fact, I like 1k imagery slightly better than 256 for optimal I/O caching (but 256 for runtime paging from that cache).
  3. I was thinking we need an LoD extension inside the glTF blob for additive LoDs.
jerstlouis commented 10 months ago

> but we want to make sure to keep CDB near infinitely scalable under at least a profile. Keep in mind the use case of a 700 knot fast jet flying low to the ground needing to load content on demand with minimal latency.

Of course!

> I like 1k imagery slightly better than 256 for optimal I/O caching (but 256 for runtime paging from that cache)

Because displayed tiles are never exactly on a level, we will always load the next level up (512x512) and blend between levels to get a smooth transition between the two (including for vector features draped on the terrain, which should also transition smoothly). So when doing that, 256x256 might be better (roughly as sketched below).
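
As a rough illustration of that blending (my own sketch, not from any CDB draft), a continuous level-of-detail value selects the two adjacent tile levels and a blend weight between them:

```python
import math

def lod_blend(ground_px_per_texel: float):
    """Pick the two adjacent tile levels to load and the weight to blend
    between them, from a continuous level-of-detail value.
    ground_px_per_texel: screen pixels covered by one texel at the base
    level (an illustrative metric; > 1 means finer detail is needed)."""
    continuous_lod = math.log2(ground_px_per_texel)  # fractional level
    coarse = math.floor(continuous_lod)              # coarser tile level
    fine = coarse + 1                                # next level up in detail
    weight = continuous_lod - coarse                 # 0 = all coarse, 1 = all fine
    return coarse, fine, weight

# A view sitting 30% of the way between two levels blends them 70/30:
print(lod_blend(2 ** 3.3))  # -> (3, 4, ~0.3)
```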

> I was thinking we need an LoD extension inside the glTF blob for additive LoDs.

I am really hoping everything can be achieved using plain glTF 2.0 binary, without any glTF extensions. My own opinion is that glTF and other 3D model formats should define geometry, bones & skins, and animations, and that functionality such as geo-referencing, LODs, etc. belongs outside the 3D model format (one way additive LoDs could live outside the blob is sketched below).
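
One hypothetical way to keep additive LoDs outside the glTF binary itself, in the spirit of the gpkg_contents / tiles-table columns mentioned above. All table and column names here are illustrative assumptions of mine, not part of any draft:

```python
import sqlite3

conn = sqlite3.connect("models.gpkg")  # hypothetical GeoPackage

# Illustrative layout: one plain glTF 2.0 binary per (tile, lod) row, with a
# flag telling clients whether finer LoDs add to or replace coarser ones.
conn.execute("""
    CREATE TABLE IF NOT EXISTS model_tiles (
        zoom_level   INTEGER NOT NULL,
        tile_column  INTEGER NOT NULL,
        tile_row     INTEGER NOT NULL,
        lod          INTEGER NOT NULL,  -- refinement level within the tile
        additive     INTEGER NOT NULL,  -- 1 = add to coarser LoDs, 0 = replace
        gltf_blob    BLOB NOT NULL,     -- plain glTF 2.0 binary, no extensions
        PRIMARY KEY (zoom_level, tile_column, tile_row, lod)
    )
""")
conn.commit()

# A client rendering a tile would fetch all additive LoD rows up to the
# desired level and render them together, without any glTF extension:
rows = conn.execute("""
    SELECT gltf_blob FROM model_tiles
    WHERE zoom_level = ? AND tile_column = ? AND tile_row = ? AND lod <= ?
    ORDER BY lod
""", (12, 345, 678, 2)).fetchall()
```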