sofwerx / cdb2-concept

CDB modernization

LoDs, rules, and performance #3

Open cnreediii opened 4 years ago

cnreediii commented 4 years ago

Vaughn Whisker

I'm trying to catch up on some of the things I missed last week bouncing in and out of the meeting. I'm looking at the LOD table (Slide 55) in the Presagis Brief "OGC CDB Primer" that Hermann sent to me. Looks like there are limitations on the number of vertices in LODs specified in CDB. We didn't really cover this in the 3D model group, but photogrammetry models typically have WAY more vertices than the allotted amounts. What should we do there?

Tracey Birch

I think this is one of those historical requirements from "old" CDB that may no longer be relevant as technology evolves.

Ryan Franz

The idea is still relevant. If you are rendering for real-time graphics, it is good to know how much "stuff" you are rendering. And it is a good idea for a model to have multiple LODs with roughly a 4:1 ratio of additional content per new LOD (more than 4:1 implies to me that the model needs additional LODs to keep the geometry change from being distracting). I would like to keep some proxy for the complexity of a model, whether that is vertices or something else, and to put that information in an attribute or somewhere else that doesn't require loading a potentially large file before determining that we have the latency and capacity to display the model in our 60 to 120 Hz environment (16 ms to 8 ms per frame).

Does it need to be restricted, or just captured as informative metadata somewhere? Or is there some way to additionally segment 3D content to meet this need without it being tied to LODs?
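
A minimal sketch of the kind of budget check being described, assuming per-LOD vertex counts are available as lightweight attribution (the attribute layout and numbers here are hypothetical, not CDB-defined):

```python
# Hypothetical per-model metadata, e.g. read from an attribute table
# rather than from the (potentially large) geometry file itself.
model_lods = [
    {"lod": 0, "vertices": 128},   # coarsest version, cheap fallback
    {"lod": 1, "vertices": 512},   # ~4:1 growth per refinement
    {"lod": 2, "vertices": 2048},
    {"lod": 3, "vertices": 8192},
]

def pick_lod(remaining_vertex_budget):
    """Return the finest LOD that still fits the frame's remaining budget."""
    best = None
    for lod in model_lods:
        if lod["vertices"] <= remaining_vertex_budget:
            best = lod
    return best

# e.g. a 60 Hz client with ~20k vertices left in this frame's budget
print(pick_lod(20_000))   # -> {'lod': 3, 'vertices': 8192}
print(pick_lod(300))      # -> {'lod': 0, 'vertices': 128}
```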

Greg Peele 2:37 PM

For weapon fire modeling I typically use models that may have tens of thousands of vertices or more, but that's for an entire building, whereas a single tree model might only have a couple hundred. But there's also the separate problem of how many models are in the scene. The concept of triangle limits (more so than vertex limits, although the two are related) does have merit for entire terrain meshes, even more so than for individual models.

When building an integrated mesh from raster and vector inputs, two of the constraint parameters are typically maximum geometric error and maximum triangle count.

Thinking about it, I'm not sure if individual model limits make sense, since models can vary substantially in surface area / volume and model instance count. It's really more of a geographic density issue?

i.e. knowing that a particular view frustum has a particular upper bound on content size?

Brian Ford FlightSafety 2:46 PM

If we choose to ignore rather than revise (if necessary) these determinism requirements, then we are choosing to make CDB no longer suitable for real-time, guaranteed-frame-rate simulation, or at least we are choosing that for any models that greatly exceed these limits, assuming there is still attribution available, as Ryan says, to detect it. Forgoing this might be suitable for interactive variable-frame-rate simulation, but even then handheld and other constrained edge devices may suffer, depending on their application domain.

Yes, this information is used not only when building a 2.5 D terrain mesh to a particular geographic content density, but also beforehand to assure you have the processing power necessary to accomplish that in the time available. Having every model provide an LOD with <= 128 vertices assures that there is some version of that model that almost all devices can display at the resolution where that model can actually be detected by a particular sensor.

Greg Peele 2:54 PM

I guess the question I have is whether the current approach guarantees determinism in frame rate if it is on a per-model basis rather than a per-tile basis? A per-model basis would guarantee determinism in load time, which I can see the merit of in a streaming real-time environment.

This may just be my ignorance of practical use of CDB, but is there a maximum limit on how many model instances may be in a particular tile at a particular LOD too?

Ryan Franz 3:04 PM

Point vector files are limited to 16k features, so that is the practical limit of model instances. Of course, there is more than one point vector file that can be used in CDB 1.x.

Kevin Bentley 3:07 PM

I think this is why we need a main vector file where vectors can be stored in their 'natural' state. And then the tiled version can be split up/decimated/etc.

I think that the tiled vector data would have to be a derived layer in most use cases.

Unless the CDB wasn't ever going to be used for any use case other than visual.

Brian Ford FlightSafety 3:17 PM

Frame rate determinism is up to the client, but it needs to have enough metadata and versions of the model available to achieve it (most likely on a per-tile basis, as you suggest). Something similar but different will be necessary for 3D terrain models having integrated culture as a single surface.

Yes, there is a points-per-Shapefile limit per LOD. The client chooses which LOD to use for each feature layer when creating its own (often tiled) paging mechanism. There is no assumption that every layer will use the same LOD.

We're mostly talking about point features here, so you are just saying you don't want any tile divisions (and no determinism bounds on processing that data, even if spatial indices are available to help with part of the problem). We're also not talking just visual, but any sensor such as radar, thermal IR, EO, and in some cases SAF, etc.

Ryan Franz 3:24 PM

I still remain in the "vectors need some minimal amount of tiling" camp, but that said, it would not be hard to take a huge amount of repository vector data and tile it for a simulation profile. Just takes time (maybe a lot of it). I guess it depends on if there is still a rapid mission rehearsal use case for SOCOM and how long that process might take. If the continental US has 1 million roads and 12 million buildings (numbers I have seen recently) and untold amounts of vegetation, how long would the tiling process take before a mission rehearsal could start? If that is fast enough, then great!
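
A rough back-of-envelope for that question; everything beyond the quoted road and building counts (the vegetation count and the tiling throughput) is an assumed placeholder to be replaced with measured numbers:

```python
# Rough estimate only: the vegetation count and features/second rate are
# assumptions, not measurements of any real tiling pipeline.
roads = 1_000_000
buildings = 12_000_000
vegetation = 50_000_000          # assumed; "untold amounts" in the discussion

assumed_rate = 5_000             # features tiled per second (assumption)
total = roads + buildings + vegetation
hours = total / assumed_rate / 3600
print(f"{total:,} features -> ~{hours:.1f} hours at {assumed_rate:,} features/s")
```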

Brian Ford FlightSafety 3:58 PM

Don't vectors have a natural state that includes generalized versions for map-scale? Tiling aside for the moment (but I expect we won't be able to fully get rid of it), isn't there a way to at least align CDB's vector LOD scheme with that? @Kevin Bentley, you said you once did a crosswalk between the concepts of vector map-scale and modsim vector LOD needs. Is there any way you could make that available to educate those of us without that cartographic understanding?

Holly Black 4:14 PM

Vector map-scale in relation to CDB LODs is something that has been discussed/desired many times in the past. This relationship is more logical and understandable from an end-user perspective.

Brian Ford FlightSafety 4:21 PM

3.2 Update 1 / OGC 1.2 tried to clearly define vector network priority and LOD spatial resolution, but I don't think there are many tools or datasets conforming, and there was still a vertex limit, so we realize it's not a 1-for-1 match with map-scale, but it is similar.

Jay Freeman 7:26 AM

I have been out of the loop for a few days on vacation ... my general thought reading the discussions is:

From a pure technical point of view, the biggest friction point of CDB has always been "I like my data like this and CDB does it like that". A lot of our conversation last week was debating the merits of how data should be structured to meet our individual perceptions of the use cases. When I think about data, there are really two states: raw and derivative. I am sure there are better words than raw and derivative, but for this conversation these words are clear enough. These derivative states of data are strongly correlated to use cases.

IMO, the best answer is to agree on a raw state of data as an interchange concept and a self-describing ruleset of derivative states targeting use cases. I propose 3 preliminary self-describing rulesets (i.e. use cases) targeting: ATAK (constrained resources), Mission Command (analytics), and flight simulation (unconstrained resources).

For example, in the constrained resources/ATAK use case, I want a "prebaked cake" with as small a disk space utilization as possible. In the Mission Command/analytics use case, I want data in a state where it has maximum topology/relationships between data objects. In an unconstrained resources/flight simulation world, I would want something like the CDB of today -- deterministically tiled with convenient LODs.

These self-describing rulesets must be an additive and vendor-agnostic concept. As in, any vendor could read the ruleset and "add" to the raw data the derived structures, mappings, and concepts defined in the ruleset. Consumers could always exchange the raw + derived data; however, the raw data is defined as the "standard" and the interchange. The derivative data is always something that can be computed if needed from the ruleset and raw data.

These rulesets would contain things like CRS, tiling concepts, LODs, taxonomy mapping and, dare I say, which standards are used (e.g. 3D Tiles, GeoPackage, etc.). If I let my mind run wild, these rulesets would also contain the "rules" for procedural generation. Obviously others could create a ruleset to define other use cases, such as for Unity or other systems.

The ruleset concept aligns to a "profile" in OGC language. An OGC profile is "A collection of standards, with parameters, options, classes, or subsets, necessary for building a complete computer system, application, or function. An implementation case of a more general standard or set of standards."

The upside is this would avoid a tight coupling of "CDB 2.0" to technology constraints that might change over time (i.e. verts per LOD) and lets multiple use cases be expressed. The downside is this is a more theoretical approach requiring a very flexible rule-driven implementation to create the derivative data.
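
One way to picture the "self-describing ruleset" idea above, sketched as plain data; every field name and value is illustrative, not a proposed schema:

```python
# Illustrative only: field names and values are hypothetical, meant to show
# the shape of a "derivative state" ruleset/profile, not an actual schema.
flight_sim_profile = {
    "name": "flight-simulation",
    "resources": "unconstrained",
    "crs": "EPSG:4326",
    "tiling": {"scheme": "quadtree", "tile_size_px": 1024},
    "lods": {"max_vertices_per_model_lod": [128, 512, 2048, 8192]},
    "encodings": {"models": "glTF", "vectors": "GeoPackage"},
}

atak_profile = {
    "name": "atak-edge",
    "resources": "constrained",
    "crs": "EPSG:4326",
    "tiling": {"scheme": "quadtree", "tile_size_px": 256},
    "lods": {"max_vertices_per_model_lod": [128, 512]},
    "encodings": {"models": "glTF", "vectors": "MVT"},
}

# Any vendor-agnostic tool would read a profile like these and derive the
# tiled/LOD'd/re-encoded form of the raw data from it.
for profile in (flight_sim_profile, atak_profile):
    print(profile["name"], profile["tiling"], profile["encodings"])
```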

Brian Ford FlightSafety 9:22 AM

I'm fine with Jay's approach, but I would say the downside is losing plug-and-play interoperability without reformatting a CDB dataset, at least between profiles. The number of profiles we allow determines the magnitude of that problem. We don't have that problem today, but we may also have a standard not well suited for particular use cases that we now want to include. If we can't come to consensus about a best common compromise, then forking into a (hopefully small) set of profiles with a large shared base seems like the next best option.

I personally hold that things like LOD options with low and progressive vertex limits, as well as bounded, deterministic file sizes, benefit every use case except, in some cases, editing, and I think they can be moderately timeless if well chosen, but I know that opinion is not necessarily shared.

Carl Reed 3:37 PM

A note on LoDs etc. I will be moving that discussion to a Git issue. Not quite sure how to label the issue, but it will have LoD in the title. Also, FYI, I3S supports LoDs as well as node splitting (a node is sort of like a tile, but is defined by a minimum bounding sphere). From the standard:

"The concept of Level of Detail (LoD) is intrinsic to the I3S standard. Scene Layers may include levels of detail that apply to the layer as whole and serve to generalize or summarize information for the layer, similar to image pyramids and also similar to raster and vector tiling schemes. A node in the I3S scene layer tree could be considered the analog of a tile in a raster or vector tiling scheme. Scene layers support levels of detail in a manner that preserves the identity of the individual features that are retained within any level of detail. The I3S Level of Detail model covers several use cases, including, splitting up very heavy features such as detailed building or very large features (coastlines, rivers, infrastructure), thinning/clustering for optimized visualization as well as support for representing externally authored multiple LoDs."

And on node splitting (subdividing a "tile"):

"The FeatureData, Geometry, Texture and Attribute resources can be split into bundles for optimal network transfer and client-side reactivity. This allows balancing between index size, feature splitting (with a relatively large node capacity between 1MB and 10MB) and optimal network usage (with a smaller bundle size, usually in the range of 64kB to 512kB)."

And let's not forget 3D Tiles, which has similar concepts for the same reason: "The foundation of 3D Tiles is a spatial data structure that enables Hierarchical Level of Detail (HLOD) so only visible tiles are streamed - and only those tiles which are most important for a given 3D view."

Other geo-related standards specify the use of LoDs as well. So, this is my long-winded way of saying I support LoDs AND that prescriptive rules - perhaps as a profile - should be allowed to enable higher performance based on specific use cases.

jerstlouis commented 4 years ago

I strongly support the LOD use case for everything, including vector data, and I think even editing can benefit from it, e.g. to do partial updates when updating a highly detailed continuous coastline vector feature of a whole continent.

But the customizable layout proposed on the Miro board using GeoPackages would easily support both tiled and non-tiled use cases / preferences.

ryanfranz commented 4 years ago

I think we are somewhat conflating several uses of the term Level of Detail. It might be easier to break out different use cases and consider the rules for each case. These are the cases that I can see in the current CDB:

  1. LODs on tiled raster data (pyramids or mipmaps)
  2. LODs on tiled vector data (similar to pyramids for rasters, maybe only present in the CDB 2 performance use case)
  3. LODs on individual models (3D content load management, based on the size of the model when the LOD initially turns on the coarsest version, or the geometric change/error when an LOD refines an existing model. Significant Size in CDB 1.x)
  4. LODs on tiles of models (given a GeoSpecific (or non-generic) model with a set of LODs, if we place that model within the tiled structure, which tile LODs do each of the model LODs belong within)
  5. LODs on model textures (CDB 1.x also places textures only used on a single GeoSpecific (non-generic) building within the tiled structure as well. There are probably better ways to handle this in CDB 2)

Most of the thread above is focused on case 3, but some comments get into cases 2 and 4. We need to make sure the rules and discussion are focused on the correct use case of LODs.

jerstlouis commented 4 years ago

@ryanfranz Thank you for this very useful categorization of the use cases.

In CDB 1.x, aren't most if not all of these closely related?

For 4., is that the point tiles that reference the models (which is kind of a special case of 2, with point vectors)? For 3., am I correct in understanding that those model LODs are stored in separate OpenFlight files?

I am thinking that if CDB 2.x supports e.g. a simple glTF geospecific model for a given tile, batching all buildings of that tile (with either a potential vertex attribute or a node hierarchy to support clamping to terrain, or hardcoded terrain elevation), but supporting individual feature attribution (mapping triangular faces to feature IDs), then use cases 3 & 4 (and perhaps also 5) would blend together. Those LODs might also match the raster & vector LODs of use cases 1 & 2.
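
A toy illustration of what "mapping triangular faces to feature IDs" in a single batched tile mesh could look like (per-vertex feature IDs, similar in spirit to the _BATCHID attribute used by batched 3D Tiles; all data below is made up):

```python
# Made-up batched tile: two buildings merged into one vertex/index buffer,
# with a parallel per-vertex feature-id array preserving attribution.
positions = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0),      # triangle of building A
    (5, 5, 0), (6, 5, 0), (5, 6, 0),      # triangle of building B
]
feature_ids = [0, 0, 0, 1, 1, 1]          # one id per vertex
indices = [0, 1, 2, 3, 4, 5]
features = {0: {"name": "building A"}, 1: {"name": "building B"}}

def feature_of_triangle(tri_index):
    """Recover the feature a rendered/picked triangle belongs to."""
    first_vertex = indices[tri_index * 3]
    return features[feature_ids[first_vertex]]

print(feature_of_triangle(0))   # -> {'name': 'building A'}
print(feature_of_triangle(1))   # -> {'name': 'building B'}
```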

ryanfranz commented 4 years ago

@jerstlouis They are related, mostly because the CDB tiling scheme (which is based on case 1) is used to store all the others. So there is some type of explicit mapping from an LOD in the other cases (except 3) into a case 1 tile LOD. For example, case 2 with vectors is based on the significant size of the feature, and then one would look up the table and determine which raster LOD it belonged within. Cases 4 and 5 are similar, with tables to help match up their measurements.

The point vectors that reference the models fall under case 2. Case 4 is how the model's LODs are stored in the tiles.

You are correct about these model LODs stored in separate OpenFlight files. Also, all of the model LODs that fall within a tile are zipped together. I believe that the original mapping that placed the OpenFlight model LODs in the tile structure was off by two levels, which has caused problems with these zip files being much larger than originally anticipated.

For completeness, here are the mapping tables in Volume 1 of CDB 1.1:

  * Raster resolution (case 1) into CDB tiles: Table 2-4, based on approximate texel size at the equator
  * Model LOD (case 4) into CDB tiles: Table 3-1, second column, based on significant size of the model's individual LODs
  * Texture LOD/mip (case 5) into CDB tiles: Table 3-1, third column, based on the real-world size of the texel after it is mapped onto model polygons
  * Model Signature LOD (case 6 that I missed) into CDB tiles: Table 3-1, fourth column, based on a bounding sphere diameter. This might be nice to move to an extension
  * Vector LOD (case 2) into CDB tiles: Table 3-27, based on the average point density or vertex count

jerstlouis commented 4 years ago

@ryanfranz Thanks for the clarifications.

Those zip files of geospecific OpenFlight models are what I suggest could each be replaced by a single batched 3D model (e.g. a binary glTF file), ready to upload to the GPU and render (rather than having to decompress and load tons of tiny models), while preserving the attribution of individual features (as if they were individual models, and potentially even finer attribution) and the ability to clamp the models to the terrain.

ryanfranz commented 4 years ago

There are always tradeoffs in creating data geared toward visualization alone. Creating a single batched 3D model likely gives better performance, but at a loss of flexibility.

So I think we would need to address the assumptions up front on how this is done. Here are the assumptions that I think would have to be made (for the CDB performance case):

So, for the CDB repository case, this doesn't seem interesting, as they will want to edit/refine these models. For a CDB edge case, absolutely this would be helpful. Especially when the assumptions about how the client best handles the data are known in advance. For the CDB performance case, I have a hard time with losing the flexibility.

Note: I am not really a fan of the current tiled zip file structure, so don't read this as a defense of how CDB 1.x stores GeoSpecific models. But we do make use of this flexibility in our use of CDB.

kevinbentley commented 4 years ago

@ryanfranz I have the same concerns about glTF having enough attribution to replace OpenFlight. OpenFlight can store so much more than glTF currently supports. I do support the idea of creating extensions to glTF to make it more M&S ready. What I'm not sure about is how common it is to use some of the flt record types like sounds, heat maps, etc. In other words, how complex would extensions need to be to support most (90%?) of the CDB users? I don't personally know the answer.

jerstlouis commented 4 years ago

@ryanfranz I fully agree with not losing the flexibility, and what I was considering is a solution that brings the benefits without having to make most of these tradeoffs.

ccbrianf commented 4 years ago

@jerstlouis "hardcoded terrain elevation" "replaced by a single batched 3D model": Please do not hard code elevation into cultural models in a 2.5 D representation unless there is a special case reason. That is obviously required for a 3D cultural models integrated terrain skin use case though ala OWT. In the 2.5 D case, it is possible via attribution to force absolute model elevation using AHGT today, but we don't want all models rigidly fused with fixed relative elevation for most LODs, unless possibly they are fused in some real world way like sharing a common wall in a city block. I think that's taking cake baking too far for the 2.5 D use cases. I'm fine with a zip file of individual glTF models, or some efficient way of re-projecting individual models in a composite model to terrain conform as you describe later (BTW: smart/efficient data compression is almost always a win versus more IO, but deflate compression is not mandated in CDB today. The zip is just a contiguous container with optional compression). CDB today supports numerous conformal projection attribute possibilities inside of OpenFlight models (although many application use cases don't require that level of sophistication today to produce a reasonable visualization), so extending that to glTF is somewhat reasonable (but would now make handling that absolutely required). Let's also remember that not every type of sensor simulation always uses a GPU, and that batch handling/breaking is really somewhat application specific even on the GPU today, and sometimes also in a constrained edge cases based on the technology in use. What's good batching in FSI VITAL may be moderately different than a CAE Medallion, web browser, android device, or other visualization application.

@kevinbentley, CDB OpenFlight today doesn't support sounds, and I don't think we have per-texel heat maps today either, but I could have forgotten a late-added feature. Heating was based on OpenFlight XML zone comments covering a node hierarchy with a certain temperature, the last I looked closely. That can affect the optional texel-based material codes differently depending on the client's sophistication. I think per-pixel heat maps would be good to add though, if they are indeed missing. The necessary extensions would be much greater for the moving model case than the static model cases, but adding vector attribution in as well also greatly increases the extension need, ala the OWT attempts to resolve that.

jerstlouis commented 4 years ago

@ccbrianf I was not really talking about a 2.5 D representation (like point coordinates in 2D tiles referencing models would be), but about sometimes (as an option for edge visualization use cases) including the elevation in the vertex data inside a potential geospecific glTF model for the whole tile.

This is getting somewhat closer to the integrated terrain/models skin ala OWT (and might facilitate going back and forth), but preserving attribution and segmentation. My preference would be for buildings to be completely separate data layers from terrain, but if they are sometimes integrated, they would be attributed in such a way that it is easier to split the two or render only one or the other, or extract a DTM heightmap without the buildings from faces identified as belonging only to the terrain.

In the alternative approach I was suggesting (and both could be supported), the batched 3D model is all relative to the terrain, but provides a way to clamp to the terrain (vertex attributes or nodes).

ccbrianf commented 4 years ago

Yes, I think most of what you propose is workable for a certain set of use cases at least. I don't think we can completely do away with the 2.5 D case for data repository or editing, but I understand the value of derivative forms for performance use cases, especially those that attempt to preserve flexibility. The key question is still how many of those we need and how common we can make them, so as to preserve CDB interoperability without reformatting time. I agree with needing a 2.5 D case (similar to today) and a 3D case (similar to OWT), but I'm just not sure how many in-between variants of the two are really necessary if we have a good design for those two cases.

jerstlouis commented 4 years ago

@ccbrianf Well I think what I am proposing for geo-specific models is somewhat halfway between CDB 1.x (point tiles & zipped files with many OpenFlight models) and OWT / 3D Tiles (an opaque mesh, as could be directly derived from a LiDAR scan or photogrammetry point cloud), but preserving segmentation and structure (e.g. using nodes). So in effect the glTF serves both the performance use case and the repository/editing use case via the organization of the features, which can achieve the same goal as the point tiles / separate models, e.g. through nodes (I think glTF even supports instantiation of the same mesh at different nodes, for use of semi-typical models inside geospecific content).

I think this really comes down to the data production (both manual and automated) and editing tools. It seems that it would make it easier both to bring in data from point clouds and to export to constrained 3D visualization systems. Whole tiles and their nodes might also be editable out of the box in modeling tools like Blender, while CDB 1.x tools could be adjusted to move node transforms instead of points, and converters between the two could be implemented.

And of course the point tiles and referenced models could also still be an option for what was traditionally the 1.x geospecific data layers, but maybe in 2.x this could actually work exactly the same as the geotypical use case, where all models of a 'package' encoded in a GeoPackage are in the same table of glTF models, since all of that is already in a single GeoPackage anyway, rather than introducing an extra level of 'models for this tile' or a zip file container.

BTW I would really orient the glTF axes to be tangent to the Earth's surface at the center of the tile (ENU coordinates inside the glTF).
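
For reference, a minimal sketch of the ENU basis at a tile-center latitude/longitude (standard east/north/up unit vectors expressed in ECEF; nothing CDB-specific):

```python
import math

def enu_axes(lat_deg, lon_deg):
    """East/North/Up unit vectors (in ECEF coordinates) at a given point,
    i.e. the axes a tile-local glTF could be oriented to."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    east  = (-math.sin(lon),                math.cos(lon),                0.0)
    north = (-math.sin(lat)*math.cos(lon), -math.sin(lat)*math.sin(lon),  math.cos(lat))
    up    = ( math.cos(lat)*math.cos(lon),  math.cos(lat)*math.sin(lon),  math.sin(lat))
    return east, north, up

# Example: a tile centered at 45N, 10E
for name, axis in zip(("east", "north", "up"), enu_axes(45.0, 10.0)):
    print(name, tuple(round(c, 3) for c in axis))
```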

ccbrianf commented 4 years ago

@jerstlouis Bundling of models per tile and bounding the bundle size is very important for IO performance and determinism in our use case. I do not think standard GeoPackage sqlite random-IO cache paging would generally suffice for all models, unless maybe the entire package file size was limited to something fairly small (16-64 GB ish, and probably closer to 16) so that just slurping the whole file into memory becomes a workaround. I'm conceptually OK with most everything else you suggest, but I would want the source repository archive/editing camp (like @holly-black and NGA) to weigh in strongly on any composite-glTF-only direction with respect to production/editing and other use cases.

jerstlouis commented 4 years ago

@ccbrianf as described in the Coverage tiling section (day 3 I think) of the Miro board, the LOD grouping would allow adjusting the size based on how many LODs of a tile pyramid get grouped together in a single GeoPackage, and also allows customizing which data layers get grouped together. So this gives a lot of flexibility in terms of the size of each of those GeoPackages, and I would myself picture a size for those tile pyramid packages (including all their model data) somewhere between 500 MB and 4 GB.

As mentioned in my last post, I don't think the composite glTF needs to be the exclusive way to do production/editing, but I would advocate it for the performance use case. And as I was saying, I think the more traditional approach to geospecific could become more or less the same as the geotypical approach (a table of small glTF models, and point features referencing them, tiled or not), so that it wouldn't really be an extra burden for developers to support both. Although perhaps the truly GeoTypical models library could be a package on its own.

ryanfranz commented 4 years ago

@kevinbentley @ccbrianf Here is the list of OpenFlight primary nodes that CDB used.

All other primary nodes are not used, and I think that none of the newer OpenFlight nodes were explicitly added to CDB as used. There are also Ancillary and Palette records not listed here. Also, I was always fuzzy on whether the extended material palette was explicitly used or not, but we support it.

ccbrianf commented 4 years ago

@jerstlouis I don't disagree with that convention and GeoPackage file size range if there is also a tile size limit and some assurance of optimal IO to retrieve each tile bundle with low latency for the modsim use case, rather than random-access sqlite paging. It is this latter tile size bound and contiguous IO bundle that I'm trying to make sure we understand is important for that use case. Individual models in a GeoPackage table do not satisfy that determinism requirement well, which is why a limited-file-size zip with optional compression is being used today (by CAE design, just to be clear). I'm sure there are other alternatives, such as the vector-attributed glTF approach, possibly even further compressed in some way, that still achieve these goals.

I would actually prefer for the geo-typical vector point features, at least, to more closely resemble the geo-specific ones, where the vector LODs are significant-size based and only call out the appropriate model LOD, possibly as individual entries in a table or table(s), rather than go the opposite way and make geo-specific look like geo-typical does today. In industry CDB 3.1, this approach was called geo-specified (combining the typical and specific point feature referencing in a unified way). I'm not sure why it was discarded in industry CDB 3.2. I do however agree that there must be some way all the LODs of a model are logically tied together as a single entity for editing and exchange purposes. I think it was also industry CDB 3.1 that had the concept of a master OpenFlight file with references to the individual model LODs for geo-specific features, similar to what was already available for geo-typical ones, again unifying the concepts in a much more usable way. Likewise, I'm not sure why that was also withdrawn (although I don't think it was fully designed in 3.1).

It may make sense to have coarse geo-typical model package spatial tiling along regional/continental boundaries as well, but I think you can probably easily fit that into your currently proposed scheme. I think I also saw a half-formed proposal for regionally defined metadata on the Miro board that the team discarded as too complex.

jerstlouis commented 4 years ago

@ccbrianf I agree with the per-tile limit, and I believe that the LOD rules themselves (i.e. the content of this tile is to be displayed in a screen area taking up x number of pixels) should result in tiled content following those rules, if the generalization is done properly. That could be validated by compliance tests as well (even if those tests need to add up the sizes of the models from the models table referenced by each tile). And I think the tile pyramids grouping X number of LODs together (and corresponding to a limited, balanced area / resolution) give you the contiguous IO bundle aspect.
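
Such a compliance test could be as simple as the following sketch (the tile and model structures here are hypothetical):

```python
# Hypothetical validation pass: sum the sizes of all models a tile references
# and flag tiles that exceed the per-tile budget for their LOD.
def validate_tiles(tiles, model_sizes, budget_bytes_per_lod):
    violations = []
    for tile in tiles:
        total = sum(model_sizes[m] for m in tile["model_refs"])
        if total > budget_bytes_per_lod[tile["lod"]]:
            violations.append((tile["id"], total))
    return violations

model_sizes = {"m1": 200_000, "m2": 350_000, "m3": 1_500_000}
tiles = [
    {"id": "L3/2/5", "lod": 3, "model_refs": ["m1", "m2"]},
    {"id": "L3/2/6", "lod": 3, "model_refs": ["m3"]},
]
print(validate_tiles(tiles, model_sizes, {3: 1_000_000}))
# -> [('L3/2/6', 1500000)]
```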

The draft examples we put up on the Miro board with a grouping of 6 LODs (but for a TMS targeting 256x256 pixels tiles) would have 1365 tiles per GeoPackage. If the tiles were targeting 1024x1024 pixels instead (16x bigger), then a grouping of 4 LODs might be more appropriate and you would have 85 tiles per GeoPackage instead. And one can still decide whether that's all data layers, or only geospecific models, etc.
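
Those tile counts are just the quadtree geometric series, so the effect of the LOD grouping on GeoPackage size can be computed directly:

```python
def tiles_in_pyramid(lod_levels):
    """Number of tiles in a perfect quadtree pyramid grouping N LOD levels:
    sum of 4**i for i in 0..N-1, i.e. (4**N - 1) / 3."""
    return (4**lod_levels - 1) // 3

print(tiles_in_pyramid(6))   # 1365 tiles (256x256-pixel tile example)
print(tiles_in_pyramid(4))   # 85 tiles (1024x1024-pixel tile example)
```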

ccbrianf commented 4 years ago

@jerstlouis I see we have a lot of common understanding and agreement. There is one thing I still don't seem to be getting across though.

Main point: While limiting the sum of model file sizes referenced by a particular tile LOD does address some aspects of determinism, the modsim use case needs the geo-specific models per tile to all be in one container/blob table entry for fast IO, for one of the same reasons Mapbox vector tiles use Google protocol buffers to encode the entire vector tile as binary so that it is one blob entry in a GeoPackage table. Individual table entries are scattered throughout the file and, even with table indexing to speed up finding them, require an average of one IO per model rather than one per tile for all models bundled together. That is a very significant difference for our fast-flight use cases.

Maybe I don't understand how tile pyramids are laid out in a GeoPackage on disk in the way you propose? I thought that was just how you separated LODs and layers into GeoPackage files, and thus coarsely bound the file size, spatial locality, and LOD scale consistency. I don't see that this has too much of an effect on how data is organized within the GeoPackage file. That's more about table structure and what the row/column atomic blob unit is.

Small points:

jerstlouis commented 4 years ago

@ccbrianf Well, that performance case is partly why I suggest a single glTF for all your geospecific models within a given tile, which in addition to being a single IO, is also ready to load into a GPU VBO.

For models that are referenced more than once (geotypical models chief among them), my understanding is that you would be re-using the models in multiple tiles of that 85- or 1365-tile GeoPackage, so many of the models might already be hot in the GPU when you load a new tile. For truly geotypical models, you would likely be re-using the models in tiles spanning multiple GeoPackages (hence why a stand-alone geotypical models package might be a good idea).

Also, perhaps a single SELECT returning all the models you're going to need for your new tile(s) might achieve more efficient IO... You may also want to preload the models of a tile pyramid in advance of using the individual tiles.

As for the layout within the GeoPackage, it is a simple tiles table (identified by level / row / column) with a data blob. This can work for imagery, coverage, models (e.g. a single glTF), or vector data (which could use an MVT encoding, and could also be used for referencing GeoTypical models). For 'shared' models, I optionally see a table with an ID and a glTF binary blob, and a stand-alone package for all geotypical models could also exist outside of all the packages organized spatially by grouped LODs.
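
A minimal sqlite sketch of that layout; the tiles table follows the usual GeoPackage tile-pyramid column names, while the shared-models and link tables are purely the hypothetical extension being discussed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Tile pyramid table (zoom_level/tile_column/tile_row + blob), as in a
# GeoPackage tile-pyramid user table; the blob could be imagery, coverage,
# a batched glTF, or MVT-encoded vectors.
con.execute("""CREATE TABLE tiles (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER,
    tile_data BLOB,
    PRIMARY KEY (zoom_level, tile_column, tile_row))""")
# Hypothetical shared-models table (e.g. geotypical glTF models).
con.execute("CREATE TABLE models (model_id TEXT PRIMARY KEY, gltf BLOB)")
# Hypothetical link table: which shared models a tile references.
con.execute("""CREATE TABLE tile_models (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER,
    model_id TEXT)""")

# A single SELECT fetching every shared model a given tile needs:
rows = con.execute("""
    SELECT m.model_id, m.gltf FROM models m
    JOIN tile_models t ON t.model_id = m.model_id
    WHERE t.zoom_level = ? AND t.tile_column = ? AND t.tile_row = ?""",
    (5, 12, 7)).fetchall()
print(rows)   # empty here; the tables were never populated
```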

I will need to read more about the concept of significant size :)

Re: 1024x1024, I fear this makes it more difficult to reach the streaming / edge cases if the repository is organized this way. Perhaps we could review the IO performance impact of considering 256x256 or 512x512?

ccbrianf commented 4 years ago

@jerstlouis Yes, a glTF composite of the tile's models could work as a container, as I conceded earlier in this discussion (hopefully with optional compression as well), as long as your attribution scheme (which might almost be the same as OWT's) allowing re-projection and individual model attribution/detection works out. That is certainly one valid option to replace zip files as a model tile container. It could also be a zip file of individual glTF models, or many other blob formats.

You are correct that we are not as concerned about individual table entries for geo-typical models, for some of the reasons you point out, and I think a single separate spatial GeoPackage library, or a few of them, to contain them is probably a good idea. I would still like to consider separating LODs such that they are individually referenced, so the appropriate ones can be directly called out by the composite geo-specific/typical "vector" point feature tile (if we decide there is such a thing), but also retain a way to preserve the entire model's integrity by having a way to directly find the master model file that calls out all the LODs.

Are you planning for the streaming edge case to be direct network GeoPackage paging via the native sqlite mechanisms (direct file sharing)? Otherwise, if there is some streaming server handling the paging, I expect that using a larger tile for caching on that side than you actually stream out to the edge device would be beneficial. In fact, that's the exact use case which leads to my request for modsim. For an "unconstrained" device, the 1024x1024-sized disk IO is much more efficient. Maybe this too could be up to the usage profile?

jerstlouis commented 4 years ago

@ccbrianf Yes, I think glTF supports compression extensions? Or it could be a single compressed glTF. The main issue I see with a zip file of tons of small glTFs is the inefficiency of repacking all that tiny separate data to be efficiently rendered. If we can achieve all the attribution/detection goals with a single GPU-ready glTF for the tile, why not?

I fully agree that separate LODs for those geo-typical models make sense (e.g. models table in a geopackage could have an LOD column).

For streaming to the edge case, I was picturing streaming individual tiles (e.g. glTF), as opposed to those packages packing tile pyramids of a number of LODs (e.g. GeoPackage). Bulk downloads could directly retrieve those tile pyramid GeoPackages based on desired LODs & AOI though.

Re: 1024x1024, that would indeed make sense to be configurable in the usage profile, and it is technically supported by the proposed data layers tiling model. However, it requires using a modified TileMatrixSet, as that is where the size is defined. E.g. the CDBGlobalGrid defines 1024x1024, except for negative levels, which gradually go down to 1x1, while the GNOSISGlobalGrid which we are considering specifies 256x256. But we could define variants such as GNOSISGlobalGrid512 and GNOSISGlobalGrid1024 with adjusted target sizes to also be valid TMS selections. The LOD numbers would remain the same, but they would now map to a different resolution / scale since they are targeting a different tile size.

ccbrianf commented 4 years ago

I continue to agree that if you can achieve all the attribution/detection goals within a single glTF for the tile (talking about the terrain-not-included case for the moment), I have no currently known ;-) issues with that approach. But I continue to point out that while glTF is fairly efficient for the GPU, it may not be completely optimal for an individual application's usage (even on the GPU), so some repacking may still be necessary anyway.

Streaming: Yes, I understand, but my question was about direct access to those tiles in the GeoPackage file, versus going through OGC Web Service APIs or an equivalent where there is a (lightweight) server-side data transformation involved. I was just pointing out that in the latter case, for raster tiles at least, where clipping is a near-trivial operation, it often makes sense for the server to retrieve larger tiles than it sends to the client, priming the tile cache with fewer, larger IOs, which is usually beneficial.

I'm no expert in the TMS standard or the GNOSISGlobalGrid, but I agree that my naive expectation would be for the resolution at a particular LOD number to change based on the tile size rather than on any spatial coverage changes. I'm slightly surprised that such a definition wouldn't be conceptually separate from the grid definition, but I guess the CDB negative LOD concept is one of those cases where they are tied together.

jerstlouis commented 4 years ago

Re: streaming, having GeoPackages directly matching what is being distributed by the web API allows passing things straight through. If you are doing any transformation at all (even simple raster clipping), I am not sure the big IO would really outperform the simplicity of streaming the blobs directly as requested. E.g. serving many 256x256 tiles from GeoPackages storing tiles as 1024x1024 blobs vs. as 256x256 blobs, I would still put my bet on the 256x256 GeoPackage. I might be wrong. It might be interesting to test, and having both options might be useful as well for different use cases. With the variable-width tile matrices, it becomes slightly more complex than clipping, as tiles can coalesce getting closer to the poles, and you're effectively retrieving data from a lower LOD (with a different partitioning) of the 1024x1024 TMS version of the grid to fulfill requests via the 256x256 TMS.

@ccbrianf Conceptually, it is separate from the grid definition, but not from the TileMatrixSet definition, which is why you could have multiple TileMatrixSets with different target sizes for the same grid. The definition of a TileMatrix (i.e. one zoom level of a TileMatrixSet) specifies a TileWidth and TileHeight in target rendering pixels, which together with the grid definition (the extent of the tiles in CRS units) results in the TileMatrix corresponding to a scale & resolution.
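
Concretely, the relationship looks roughly like this, using the OGC TMS convention of a 0.28 mm standardized rendering pixel (a metre-based CRS is assumed for the scale denominator):

```python
STANDARD_PIXEL_M = 0.00028   # OGC TMS standardized rendering pixel size (0.28 mm)

def tile_matrix_resolution(tile_span_crs_units, tile_width_px):
    """Cell size implied by a TileMatrix: tile extent in CRS units / TileWidth."""
    return tile_span_crs_units / tile_width_px

# Same grid (same tile extents in CRS units), two target tile sizes:
span_m = 40_000.0                                  # e.g. a 40 km wide tile
for width in (256, 1024):
    res = tile_matrix_resolution(span_m, width)
    scale_denominator = res / STANDARD_PIXEL_M     # metre-based CRS assumed
    print(width, f"{res:.2f} m/px", f"1:{scale_denominator:,.0f}")
```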

ccbrianf commented 4 years ago

Tile size is certainly worth performance testing, especially since I don't have hard data to back up my preferences at present, rather just gut experience and CDB precedent (which I believe was tested and published by CAE at one point). I would never recommend the kind of reformatting you describe as necessary to convert between a 1k tile and a 256. If it can't be as simple as just serving 1/4 of the existing data, I'm pretty sure it's not a win. I don't personally understand why it couldn't be set up that way given your high-level TMS description, but I'll take your word for it.

Thanks for trying to help educate me on TMS. I need to do my homework now :-).

jerstlouis commented 4 years ago

@ccbrianf In the CDB LOD tiling scheme, serving 1/4th of the data worked because everything at level 0 and higher zoom levels was a perfect quadtree throughout. On the negative side, at the poles you had millions of tiles for a tiny area, and the negative LODs are very difficult to use because they do not get re-combined into larger tiles.

The GNOSIS Global Grid achieves re-balancing similar to what the CDB grid does at the GeoCell (LOD 0) level with the CDB zones, but re-adjusts it at each level (the rule is simple: if a tile touches the pole, the half touching the pole is not split into 2 tiles longitude-wise, so that you always maintain only 4 tiles at the poles). Technically it is not super complex, as the coalescing is a simple compression by 2 of the tiles, so one could either average or skip every second pixel of the part of the tile which undergoes a coalescing factor change.
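
The coalescing step described here amounts to a 2:1 horizontal decimation of the affected part of a tile; a toy sketch of both options (averaging or skipping every second sample):

```python
def coalesce_skip(row):
    """Keep every second sample (cheap 2:1 decimation)."""
    return row[::2]

def coalesce_average(row):
    """Average adjacent pairs of samples (smoother 2:1 decimation)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]

row = [10, 12, 20, 22, 30, 32, 40, 42]   # one raster row of the affected region
print(coalesce_skip(row))      # [10, 20, 30, 40]
print(coalesce_average(row))   # [11.0, 21.0, 31.0, 41.0]
```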