sofwerx / cdb2-concept

CDB modernization

3D models (GS) - splitting in tiles #34

Open PresagisHermann opened 3 years ago

PresagisHermann commented 3 years ago

The 3D group is looking into repackaging the 3D models (GS models) so that they are easier to access. This led to a number of exchanges over e-mail that will now continue in GitHub issues. This discussion relates to tiling.

As we aim to package (GS) 3D models together in an easier-to-access package (mostly by grouping the LODs of a model together), we face the challenge of splitting the 3D model content into manageable chunks. CDB 1.X does this today via tiles and LODs. By grouping all LODs of a model together, we break this scheme. If we do not tile at all, we have the equivalent of today's GT model storage, which would grow exponentially and would not be geospatially divided.

So, following an exchange with the tiling group, the 3D model group proposes to pick one tile level as defined by the tiling group (ref), roughly CDB 1.x LOD 5 or CDB X LOD 12 (but selectable at publication), in order to get a grid that groups the models for a region. In other words, all 3D models, with their textures and all their LODs, would be tiled at a given LOD. Point features at any LOD would point to the models in the tile where the model is stored and select the LOD via the MLOD attribute. Tiling the CDB storage this way has several benefits in my view:

  1. Each tile stores models from a given geolocation - easy to find, easy to replace
  2. Model names (to avoid collision) have to be unique inside the tile only - limits the need to inventory a large CDB when creating a new model name.
  3. Keeps 3D model "chunks" at a manageable size

I will open other issues to discuss:

One piece of feedback from @ryanfranz and @jerstlouis is that storing the 3D models in a GeoPackage along with the point features would be desirable. If we do this, then models have to be in the same GeoPackage as the points, which may limit the packaging flexibility. But @jerstlouis mentioned the LOD grouping solution, which might apply here.

jerstlouis commented 3 years ago

@PresagisHermann Picking a specific LOD, rather than a balanced number of LODs to group, in my opinion doesn't scale to address the same problems at different scales.

While that particular LOD may work for one particular scale, you would run into problems again at a coarser or finer scale.

Picking an appropriate number of LODs to group (starting the grouping from the most detailed) as laid out in the Tiling approach balances things at any scale.

PresagisHermann commented 3 years ago

Interesting. So, does this mean that during production of CDB, you create all LODs independently and then perform an analysis on the density/size and then re-group the LODs? You then "merge" files (GeoPackage or others) based on that grouping? This happens at the file level (a file can contain multiple datasets or only 1) meaning possibly different grouping per dataset?

ryanfranz commented 3 years ago

The proposed grouping scheme was based on how many levels of detail we want to merge (6 was thought to be optimal at first glance) and on the maximum level of detail of the data. So to tile 1 meter imagery at LOD 15, and selecting a grouping of 6 LODs in a GeoPackage, you get the following groups of LODs within a single GeoPackage (starting at the highest resolution):
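As a rough sketch only (not necessarily the exact list from the original comment), the grouping could be computed as below. It assumes LOD 0 is the coarsest level and that the last group is simply cut off at LOD 0; the function name `lod_groups` is hypothetical.

```python
def lod_groups(max_lod: int, group_size: int) -> list[tuple[int, int]]:
    """Return (coarsest, finest) LOD ranges, grouping from the most detailed LOD down.

    Assumption: LOD 0 is the coarsest level and the final group may hold
    fewer than `group_size` levels.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    finest = max_lod
    while finest >= 0:
        coarsest = max(0, finest - group_size + 1)
        groups.append((coarsest, finest))
        finest = coarsest - 1
    return groups


# 1 meter imagery at LOD 15, grouped by 6 LODs
print(lod_groups(15, 6))  # [(10, 15), (4, 9), (0, 3)]
```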

This scheme works well to limit the number of GeoPackage files created, if the data is relatively uniform. The drawbacks are that it makes it harder to add high resolution insets later on and it is likely less efficient for datasets that are less uniform, like 3D models, which can be very dense or very sparse. I'm starting to wonder if the groupings should be arbitrary, and the top-level "metadata" or JSON file should describe all the LOD groupings rather than relying on a simple function.

We need to keep some real-world numbers in mind if we choose a single LOD to place models at in the CDB. For example, in the dataset of New York City building footprints that we have, there is a single CDB 1.x tile at LOD 3 with more than 240k buildings within the 1/8 x 1/8 degree area that stretches the current CDB limits. And that is probably not the worst case, since cities like Rio de Janeiro or Mexico City are likely denser and less organized.
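To put that number in perspective, here is a back-of-the-envelope density estimate. It assumes a latitude of about 40.7 degrees north for New York City and a spherical approximation of roughly 111.32 km per degree of latitude; the helper name `tile_area_km2` is hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def tile_area_km2(lat_deg: float, tile_deg: float) -> float:
    """Approximate ground area of a square lat/lon tile centered near lat_deg."""
    height_km = tile_deg * KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    width_km = tile_deg * KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    return height_km * width_km


area = tile_area_km2(40.7, 1 / 8)   # one CDB 1.x LOD 3 tile
density = 240_000 / area            # buildings per square kilometer
print(f"area ~ {area:.0f} km2, density ~ {density:.0f} buildings/km2")
```

Under these assumptions the tile covers on the order of 150 km2, which puts the density well above a thousand buildings per km2.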

jerstlouis commented 3 years ago

@ryanfranz I initially suggested a grouping of 6 LODs because that's what we use in our GNOSIS data store, and it generally produces a reasonable file size vs. file count (assuming tiles targeting 256 x 256), but the intent was to have that user-configurable, especially configurable based on the data type (e.g. imagery, elevation, 3D models, vector data, or mixed data types).

I suspect a different number of LODs for grouping might work better for each of these. Raster data files would generally be much larger than vector GeoPackages (attributes aside). But if all features and their attributes are preserved all the way down to level 0, then grouping more levels saves space, since otherwise you're just duplicating that attribute data for all groups. That is also why I suggested yesterday to have the attributes only at the level 0 grouping (since they are needed there anyway).

I'm starting to wonder if the groupings should be arbitrary, and the top-level "metadata" or JSON file should describe all the LOD groupings rather than relying on a simple function.

What exactly do you mean by that? The simple function was intended to take the per-package groupLOD from the cdb.json as a parameter, not to assume a fixed grouping of e.g. 6. Is this what you mean, or do you mean having a variable groupLOD even within a single package (of 1 or more data layers)?

We need to keep some real-world numbers in mind if we choose a single LOD to place models at in the CDB.

Again, I think a single LOD only works if you expect to show models within a specific scale range (e.g. of ~6 LODs). If there is a use case for getting into super-detailed models for which you have very highly detailed schematics, and also for seeing models as detailed as you can from the air (e.g. from 2:1 to 1:100,000), you will get either too many or too large model packs.

@PresagisHermann

So, does this mean that during production of CDB, you create all LODs independently and then perform an analysis on the density/size and then re-group the LODs?

If you manually create separate LODs, you would create your models and their LODs for each model. The density/size of models I think is something that would be constant based on the amount of detail normally found in CDB models. A recommendation could be made for the number of LODs by which to group models, which could be overridden if a particular CDB does something very differently. If the lower LODs are automatically generated, then you only need to create the most detailed LOD of each model and lower LODs are automatically generated.

You then "merge" files (GeoPackage or others) based on that grouping?

Regardless of how each LOD of a model is created, all models (and their LODs if they have more than 1) are found in the specific GeoPackage where the grouping function says they should be, based on the LOD grouping parameter. The grouping function takes as parameters the package name, the tile level, row and column, and returns the full path to the GeoPackage where that particular tile is found.

This happens at the file level (a file can contain multiple datasets or only 1) meaning possibly different grouping per dataset?

The cdb.json file sets up packages. A package with a zero groupLOD will be a single GeoPackage, while a package with a non-zero groupLOD will be split into multiple GeoPackages combining multiple tiles for groupLOD LODs (all tiles within the extent of the tile of the lowest LOD of the grouping). A package (whether it is a single GeoPackage or a directory structure of tile groupings) can contain either a single data layer (e.g. models), or combine multiple data layers (e.g. models, imagery, and vector data).
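A minimal sketch of such a grouping function follows. The directory layout (`L<lod>/R<row>/C<col>.gpkg`), the `max_lod` parameter, and the quadtree assumption (each LOD doubles the number of rows and columns) are illustrative assumptions, not something defined by cdb.json or the tiling proposal; only the `groupLOD` concept comes from the discussion above.

```python
from pathlib import PurePosixPath


def package_path(package: str, level: int, row: int, col: int,
                 group_lod: int, max_lod: int) -> PurePosixPath:
    """Return the (hypothetical) GeoPackage path holding a given tile.

    group_lod == 0 means the whole package is a single GeoPackage.
    Otherwise LODs are grouped starting from the most detailed LOD (max_lod),
    and each GeoPackage covers the extent of one tile at the coarsest LOD
    of its group.
    """
    if group_lod == 0:
        return PurePosixPath(f"{package}.gpkg")
    if not 0 <= level <= max_lod:
        raise ValueError("level must be between 0 and max_lod")
    # Which group (counted from the most detailed one) contains this level?
    groups_below = (max_lod - level) // group_lod
    finest = max_lod - groups_below * group_lod
    coarsest = max(0, finest - group_lod + 1)
    # Ancestor tile at the coarsest LOD of the group (quadtree assumption)
    shift = level - coarsest
    root_row, root_col = row >> shift, col >> shift
    return PurePosixPath(package) / f"L{coarsest}" / f"R{root_row}" / f"C{root_col}.gpkg"


print(package_path("models", 15, 1000, 2000, group_lod=6, max_lod=15))
print(package_path("models", 12, 125, 250, group_lod=6, max_lod=15))
print(package_path("models", 12, 125, 250, group_lod=0, max_lod=15))
```

With these assumptions, the first two calls resolve to the same file because the LOD 12 tile is an ancestor of the LOD 15 tile and both fall in the 10..15 group.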

PresagisHermann commented 3 years ago

@jerstlouis The typical model publication process to CDB is that you have existing 3D models with all LOD representations (geometry and texture). The production technique for the 3D models (from point cloud reconstruction to manual to procedural) has no impact as long as multiple LODs of geometry and textures are produced. The publication to CDB is all about placing the point feature (vector) and the geometry, texture, and material data files in the correct LOD. Many criteria apply in this logic in order to preserve the density distribution and allow fluid consumption of the data: the model's geometric size, the significant size of the LOD details, as well as the density (models per km2). You fill the CDB LODs in each dataset (point, geometry, texture) as you navigate the source models. Believe me, it is not uniform! Each dataset ends up with a different density in different regions based on the features present: urban vs. airport vs. suburbs vs. countryside, etc. Look at existing CDBs to confirm.

The challenge is that if we have grouping per dataset, it means we have to use a different GeoPackage per dataset. Second, you cannot predict GeoPackage size (for 3D models at least) unless you do a complete inventory of the 3D models and test the tiling impact. This is not practical on large CDBs and does not allow parallel processing during publication.

My point is that we are adding complexity to the existing CDB publication of 3D models, which is already complex. Today, to publish a CDB-compliant 3D model dataset, you have to "migrate" model LODs to a finer LOD when a LOD is full. With the LOD grouping, you are adding an extra step of grouping based on file size.

@ryanfranz based on your experience publishing 3D models in CDB, do you agree that LOD grouping brings this challenge, or do you believe it is not a significant issue? CDB 1.X had a similar challenge with ZIP size, but it was limited per dataset, not grouped datasets.

jerstlouis commented 3 years ago

@PresagisHermann What I am suggesting is that although the LOD grouping is configurable, there should be a recommended grouping for each data type, based on an initial analysis of file sizes for models and limits specified by the standard for determinism.

The standard could go further and require a particular grouping based on data type if wanted, and it could also pick specific LODs at which these LOD groupings begin if we want to facilitate merging distinct datasets.

The publishing tool would not have an extra step after the fact to figure out how to group things. The grouping recommendation would be based on typical model file sizes in experiments within this project and/or its continuity in the SWG.

Again the need for the grouping is because A) We cannot put all models for the whole world in a single zip file or GeoPackage B) Having one zip file or GeoPackage for all the models of one tile at a specific LOD cannot handle properly:

  1. Model LODs much more detailed than that chosen LOD (that model package file is too large and contains too many models)
  2. Displaying models much coarser than that chosen LOD (too many files to access)

The purpose of the grouping is to establish a reasonable balance between A & B, so that e.g. you could have LODs 10..15 in one set of GeoPackages and LODs 16..21 in another set of GeoPackages.

ryanfranz commented 3 years ago

@PresagisHermann I'm trying to wrap my head around the different proposals here. One thing that CDB 1.x does well is to place models of a similar size at the same LOD, assuming that a client needs to use them roughly at the same time. Putting all the models at a single LOD invites some complications in client behavior. A client either has to search through lots of models (and thus lots of tables or zip files) in the distance to try and find the ones large enough to "see" from the eye point (visual or sensor), or has to cope with a huge number of models if the area of the chosen model LOD contains too many features. Depending on the GeoPackage LOD grouping, there could be one table of models in a GeoPackage, or 4k tables of models.

For example, if models are placed at CDB X LOD 12, and the coverage grouping uses LODs 12-17 then there is a single LOD 12 tile in this GeoPackage, but if the grouping was 7-12 then there are 4k of LOD 12 tiles in a single GeoPackage.
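As a quick way to check such counts, the sketch below assumes a strict quadtree, where every tile splits into 4 children at the next LOD. The exact numbers depend on the tiling scheme actually chosen, and the function name is hypothetical.

```python
def tiles_at_lod_in_group(group_coarsest_lod: int, target_lod: int) -> int:
    """Number of target_lod tiles under one tile at group_coarsest_lod.

    Assumption: strict quadtree, 4 children per tile per level.
    """
    if target_lod < group_coarsest_lod:
        raise ValueError("target_lod must not be coarser than the group root")
    return 4 ** (target_lod - group_coarsest_lod)


print(tiles_at_lod_in_group(12, 12))  # grouping 12-17: 1 tile
print(tiles_at_lod_in_group(7, 12))   # grouping 7-12: 4**5 = 1024 tiles
print(tiles_at_lod_in_group(6, 12))   # six levels of subdivision: 4096 tiles
```

Under this assumption, a count of about 4k LOD 12 tiles corresponds to a group root six levels coarser than LOD 12.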

The other option here is that we have discussed making different data layers use different LOD groupings and use a different directory structure. So have models use their own LOD grouping and data layer. But that makes it more complicated to find equivalent areas of the world to pull out for a user on the edge, since the packaging/grouping is different for each data layer in the CDB. I don't have a good feel for the tradeoffs in this choice.

I wonder how bad it would be to have a table with all the models for a large area, sorted by GUIDs, maybe outside the tiling structure completely, and have the tiled point vectors refer back to the models in this table? That might be a huge table, but a client could keep it open and find the models being referred to in the point vectors. I honestly don't know how that would perform, and it would likely make CDB versions much harder to handle.

jerstlouis commented 3 years ago

@ryanfranz

The other option here is that we have discussed making different data layers use different LOD groupings and use a different directory structure. So have models use their own LOD grouping and data layer. But that makes it more complicated to find equivalent areas of the world to pull out for a user on the edge, since the packaging/grouping is different for each data layer in the CDB. I don't have a good feel for the tradeoffs in this choice.

My idea behind the LOD grouping was always to make this adjustable. First to handle the "no grouping" case where you want a simple single GeoPackage with everything inside, then second precisely because different data layer types take up a very different amount of space per tile (e.g. vector vs. coverages vs. models).

In terms of complexity of implementation, it really does not add any complexity because that LOD grouping variable (e.g. 6 or 7) is used in the simple function that determines the package filenames, so it's just a matter of this being a constant or a variable.

The single table with ALL models for NGA's entire world use case is not really feasible, is it? I am guessing this exceeds even the extremely large 281 terabytes max size for SQLite databases.
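For reference, the 281 terabyte figure follows from SQLite's documented limits: a maximum of 4294967294 pages at a maximum page size of 65536 bytes. A quick check of the arithmetic:

```python
MAX_PAGE_COUNT = 4_294_967_294  # SQLite maximum number of pages
MAX_PAGE_SIZE = 65_536          # SQLite maximum page size in bytes

max_bytes = MAX_PAGE_COUNT * MAX_PAGE_SIZE
max_terabytes = max_bytes / 1e12  # decimal terabytes

print(max_bytes, f"{max_terabytes:.1f} TB")
```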

ryanfranz commented 3 years ago

@jerstlouis This is the complication I am referring to:

If your imagery uses a grouping of 6 LODs, and your 3D models use a grouping of 7 LODs, and elevation uses a grouping of 8 LODs, we end up with three layers (sets of directories) using the same tiling scheme, but each has a different set of folders and file names. Exporting an area of the world is more complicated than grabbing a single directory, and takes a more complicated process to create the export.
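To illustrate, the sketch below computes which LOD range would hold a LOD 15 tile in each layer. It assumes every layer has a most detailed LOD of 15 and that grouping starts from the most detailed LOD; the layer names and the `group_range` helper are hypothetical.

```python
def group_range(level: int, group_lod: int, max_lod: int) -> tuple[int, int]:
    """Return the (coarsest, finest) LOD range of the group containing `level`.

    Assumption: groups are counted from the most detailed LOD (max_lod) down.
    """
    groups_below = (max_lod - level) // group_lod
    finest = max_lod - groups_below * group_lod
    coarsest = max(0, finest - group_lod + 1)
    return coarsest, finest


layers = {"imagery": 6, "models": 7, "elevation": 8}  # groupLOD per layer
ranges = {name: group_range(15, g, 15) for name, g in layers.items()}
for name, (coarsest, finest) in ranges.items():
    print(f"{name}: LOD {coarsest}..{finest}")
```

Each layer ends up with a package rooted at a different LOD (10, 9 and 8 here), so an export of one area has to resolve a different set of files for each layer.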

The flexibility to change the LOD grouping variable is what introduces complexity, and the need to balance tradeoffs.

Yes, a single table with every building in the world is not feasible (especially with the amount of content that imagery providers are creating these days), but I would guess that most of the NGA's holdings are imagery and a large portion of the models are generic and thus reused throughout the CDB. But I wanted to try and think through the possibility of larger tables of models in exchange for not having to open so many GeoPackage files.

jerstlouis commented 3 years ago

Re: the different LOD grouping per data type, the configurability gives you the option to choose which way to go when producing the data or writing the producing software. I would argue it's not super complicated, and largely worth the trade-off :) From the visualization tool's perspective, it's trivial to support all those possibilities based on that simple cdb.json.

@ryanfranz in my mind generic models fall under the topic of geo-typical, not geo-specific, where a single large table totally makes sense. Geo-specific approaches should be reserved for models whose geometry & textures originate from e.g. photogrammetry or LiDAR or BIM models, or from modeling specifically that one model.

cnreediii commented 3 years ago

All. Sorry to be dense, but could someone define "model" for me? The term is used throughout the discussion but I am not sure if we are referring to content such as a terrain model or true 3D models of something like a generic tree or a tank. If the latter, perhaps looking at what current commercial and gaming products do for LoDs would make sense rather than re-inventing some other approach.

PresagisHermann commented 3 years ago

@cnreediii we are talking about 3D models on terrain as defined in CDB today. The terrain "skin" (elevation and imagery) remains in its own datasets as in CDB 1.X. So, models refer to buildings and all man-made objects as well as natural ones (vegetation, etc.). We are not debating the CDB 1.X LOD concept (significant size, polygon count, etc.) but rather looking at the packaging of the models in tiles to facilitate consumption. I am not aware of a standardized LOD for 3D object geometry in gaming; I believe it is based on various modeling conventions that are engine and gameplay dependent. Maybe @vwhisker could provide more info on gaming LODs. I know CityEngine has a definition of LoD for buildings but it is not based on rendering performance as CDB defines it.

cnreediii commented 3 years ago

@PresagisHermann Thanks! Got it. Perhaps we should ask the CityGML community if they have any ideas about how they would approach this issue? I am happy to ask if that makes sense.

vwhisker commented 3 years ago

Unity3D has a couple of ways to handle LODs. You can construct LODs from a mesh in the editor, or you can load models/meshes with the _LOD tag. Numbering is "backwards": LOD 0 is the most detailed, LOD N is the least detailed.

https://docs.unity3d.com/Manual/LevelOfDetail.html

A LOD group node on an object controls the visibility of the LODs based on camera distance (which can be biased).

Models/meshes can also be imported with LODs: https://docs.unity3d.com/Manual/importing-lod-meshes.html

Name the meshes according to the following naming convention:

  ExampleMeshName_LOD0 for the first LOD level (i.e., the most detailed version)
  ExampleMeshName_LOD1
  ExampleMeshName_LOD2
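As an illustration of that naming convention (not Unity's own importer code), mesh names could be grouped by base name and ordered from most to least detailed like this. The helper name `group_by_lod` is hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as "ExampleMeshName_LOD0"
LOD_SUFFIX = re.compile(r"^(?P<base>.+)_LOD(?P<level>\d+)$")


def group_by_lod(mesh_names: list[str]) -> dict[str, list[str]]:
    """Group mesh names by base name, ordered from LOD0 (most detailed) up.

    Names without a _LOD<n> suffix are ignored.
    """
    groups: dict[str, dict[int, str]] = defaultdict(dict)
    for name in mesh_names:
        match = LOD_SUFFIX.match(name)
        if match:
            groups[match["base"]][int(match["level"])] = name
    return {base: [lods[k] for k in sorted(lods)] for base, lods in groups.items()}


names = ["ExampleMeshName_LOD1", "ExampleMeshName_LOD0", "ExampleMeshName_LOD2", "Other"]
print(group_by_lod(names))
```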

My team hasn't had to make use of these features to a great extent. We have loaded features based on a spatialized R-Tree and user position. We've been considering swapping meshes for icons using LODs to improve long distance visibility for some of our applications, but have yet to implement that. When we work with OpenFlight, we typically rip out the LODs and work with the highest LOD mesh and let the game engine handle the rest. Not sophisticated, but it's worked so far.

PresagisHermann commented 3 years ago

@vwhisker the Unity LOD scheme is similar to what existed for many years when doing OpenFlight terrain (meshed terrain with references to 3D models). The terrain mesh would have LODs and the models would have them as well. The CDB design of LOD is much more descriptive and attempts to standardize LODs in a way that is a clear contract between content producers (modelers and CDB builders) and consumers (applications loading the CDB, streaming as efficiently as possible).

LODs become essential when large terrains are used and you need to preserve a wide field of view (map applications and flight applications are the common ones).

I believe that for CDB X, we want to leverage all the good standardization of LODs brought by CDB 1.X. That said, we agreed to revisit some of the LOD tables to account for technology advancement in the 10-15 years since the first spec.

The discussion here focuses on grouping models into tiles in a way that facilitates consumption while preserving editing capabilities. While we seem to agree that grouping is beneficial, the concern now is whether we have different groupings on different datasets. This leads to less intuitive editing and on-disk access to parts of the CDB, and to some slight complexity in runtime consumption, as per the discussion above.

cnreediii commented 3 years ago

@PresagisHermann Agreed. It will be interesting to see what the final results of the experimentation in the various sub-groups tell us WRT the LoD approach for CDB-X.

vwhisker commented 3 years ago

@PresagisHermann One of my guys found this recently. Sounds like there are some alternatives to the LOD system I posted about above that may support designs like CDB uses with separate, discrete/defined LODs. https://blogs.unity3d.com/2018/01/12/unity-labs-autolod-experimenting-with-automatic-performance-improvements/

PresagisHermann commented 3 years ago

@vwhisker Interesting. This would need to be investigated further, but I see some differences. There seem to be two concepts here: the HLOD and the runtime LOD generation. CDB decided to store offline-generated LODs for paging (loading the data from disk) performance. Most large-coverage formats do that; otherwise, you need to load a lot of super high-res data to generate many LODs, which is challenging in I/O and memory. I believe the scene they test here is fairly simple compared to some high resolution CDBs and the use case of fast flying.

If we were to implement this in CDB, if I understand correctly, we would essentially not create any LOD and tell every client application developer to implement such a system to create them at load time.

Maybe the answer is a profile! Have a profile that stores only the highest resolution data in the CDB and let the client runtime produce LODs.

As for Auto-LOD generation, note that they do point out that some models benefit from overriding the auto-LOD generation, as the result is not great. We found the same at Presagis, where we do auto-LOD and sometimes have to manually modify/create them.