julietcohen opened this issue 2 years ago
Deduplication of files seems to be occurring twice in the workflow: once when we stage the tiles initially, and again in `staged_to_3dtile()`. Is that correct?
There is also a comment further down the workflow about this:
```python
# Deduplicate & make leaf 3D tiles all staged tiles (only highest
# z-level).
# TODO: COMBINE WITH STEP 2, so we only read in and deduplicate
# each staged file once.
```
Update: After reviewing the README for the pdgstaging repo, it seems that some files (the polygons that overlap 2+ tiles) are indeed duplicated in the `staged` folder, and they are only deduplicated when we execute the function `staged_to_3dtile()`.
The centroids of the polygons are assigned to only 1 tile (the centroids are never assigned to multiple tiles, because even if they fall on a tile boundary they are assigned to the SE tile), and this centroid tile assignment may differ from the polygon tile assignment if the polygon falls within 2+ tiles (these tile assignments are 2 separate properties assigned when we execute the staging step). When we execute `staged_to_3dtile()`, if the centroid tile assignment does not match the polygon tile assignment, then the polygon is removed from that tile.
I'm confused about a few differences between objects in the section where we define a bunch of variables:
```python
workflow_config = '/home/jcohen/sample_data/ingmar-config__updated.json'
logging_config = '/home/jcohen/sample_data/logging.json'
batch_size_staging = 1
batch_size_rasterization = 30
batch_size_3dtiles = 20  # leaf tiles? higher resolution, more zoomed in, which is why we process fewer of them in a batch relative to the parent tiles
batch_size_parent_3dtiles = 500
batch_size_geotiffs = 200
batch_size_web_tiles = 200
```
My questions:

- are the 3dtiles referenced in the object `batch_size_3dtiles` the leaf tiles? I assumed so because the other object specifies they are parent 3d tiles
- are leaf tiles equivalent to child tiles?
- I tried running `all_staged_to_3dtiles()` but the kernel crashed at the end of the workday
- how are the .B3DM files produced (by the `py3dtiles` package? but where is this in the workflow?) and where are they output? I see that we create the .json files (that I believe are the metadata for the .B3DM files) within the `3dtiles` dir that is created by `all_staged_to_3dtiles()`, but I do not see the actual .B3DM tiles that I believe are the actual visualizations on PDG
- Early on in `make_3d_tiles`, we create a class `StagedTo3DConverter()` with several functions defined within it (I believe these are called methods), and one of those methods is `staged_to_3dtile(self, path)`. The steps within this make sense to me (such as checking if a tile has data within it, deduplicating, checking if the polygon centroids fall within the tile, etc.). But why is this pulling data from the staging folder? I assumed that once we rasterized the staged tiles and created the `geotiff` and `web_tiles` folders, we would be pulling from those moving forward. To give you a better understanding of my interpretation of the workflow, here's a diagram I made that now I am second-guessing. Update: I made a new schematic that is hopefully more correct.

> Early on in make_3d_tiles, we create a class StagedTo3DConverter() ... why is this pulling data from the staging folder?
Short answer: Because we create 3D tiles from vector data (e.g. shapefiles or geopackage files). 3D tiles are just another format of vector data.
Long answer: You can think of the workflow as having three main parts: 1) Staging; 2) Rasterization; 3) 3D tiles. Rasterization and 3D tile creation are like two separate branches of the workflow that both start after the staging step (refer back to these slides). The staging step is where we do all of the tasks that are needed for both the subsequent rasterization and 3D tiles steps.
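A rough text rendering of that branching (my sketch, not the original slide):

```
input vectors --> staging --+--> rasterization --> GeoTIFFs --> web tiles (PNG)
                            |
                            +--> 3D tiles (B3DM + JSON)
```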
> here's a diagram I made that now I am second-guessing ... I made a new schematic that is hopefully more correct
I tried correcting the first diagram here (please just ignore the part that says "Text" 🙃)
But your second diagram is correct! 🎉
To answer questions from the second diagram:
> depending on if we use rasterize_all or rasterize?
We use either the `rasterize_all` method (which automatically pulls in all the staged data) or the `rasterize` method (which rasterizes one file and requires a path to that file). Here's a little bit more about how the lower z-level GeoTiffs are created during the rasterization step:
> when do we use package py3dtiles?
`StagedTo3DConverter` uses the viz-3dtiles package, which uses the py3dtiles package. We could eventually re-organize how this works.
> deduplication of files seems to be occurring twice in the workflow - once when we stage the tiles initially, and again in `staged_to_3dtile()`. Is that correct?
This depends on what is set for the `deduplicate_at` config option. But for our purposes, we currently have it set to deduplicate at the start of 3D tiles AND at the start of rasterization. This is because we want to archive the GeoPackage tiles (the output of staging) with all their original data, without deduplicated polygons removed. Otherwise, it would make more sense to just deduplicate once during staging. The comment you referenced suggests a third option: when the config is set to deduplicate at both rasterization and 3D tiles, we should deduplicate just once, before passing the data to those two branches of the workflow (but after staging). This would require re-working how the parts of the workflow work together.
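As a tiny illustration of how that option gates the two deduplication points (the helper name `should_deduplicate` is hypothetical; only the `deduplicate_at` values come from the actual config):

```python
def should_deduplicate(config: dict, step: str) -> bool:
    """True when `step` ('staging', 'raster', or '3dtiles') is listed in
    the config's deduplicate_at option."""
    return step in (config.get("deduplicate_at") or [])

config = {"deduplicate_at": ["raster", "3dtiles"]}
assert should_deduplicate(config, "raster")       # dedupe at rasterization
assert should_deduplicate(config, "3dtiles")      # dedupe at 3D tile creation
assert not should_deduplicate(config, "staging")  # archive staged tiles intact
```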
> Update: After reviewing the README for the pdgstaging repo, it seems that some files (the polygons that overlap 2+ tiles) are indeed duplicated in the staged folder, and they are only deduplicated when we execute the function staged_to_3dtile().
>
> The centroids of the polygons are assigned to only 1 tile (the centroids are never assigned to multiple tiles, because even if they fall on a tile boundary they are assigned to the SE tile), and this centroid tile assignment may differ from the polygon tile assignment if the polygon falls within 2+ tiles (these tile assignments are 2 separate properties assigned when we execute the staging step). When we execute staged_to_3dtile(), if the centroid tile assignment does not match the polygon tile assignment, then the polygon is removed from that tile.
Yes, this is all correct, but it's important to note that this is a different case of deduplication (confusing, I know). In this case, the data is duplicated because our workflow duplicated it during staging. We duplicate it to make sure that the rasterization step has access to polygons that overlap a tile even just a little bit, otherwise there will be weird edge effects in the resulting PNGs. However, we do not want identical polygons in the resulting 3D tiles, so we ALWAYS remove them at this step.
The OTHER deduplication which is configurable in the workflow is related to input files overlapping the same area. For the lakes dataset, where files overlap, the same lakes are detected twice, once in each file. Because the images that lakes are detected from differ a little, the same lakes won't give the exact same polygons, so the deduplication strategy is a little more complex. This part of the pdgstaging docs gives a detailed overview of these deduplication strategies.
> what's the difference between 3d tiles and web tiles?
3D tiles are the Cesium 3D Tiles (B3DM & JSON); web tiles are the image tiles we create just for showing in Cesium (PNG).
> are the 3dtiles referenced in the object batch_size_3dtiles the leaf tiles?
Yes, the way we are making our 3D Tile tree is such that ONLY the leaf tiles have B3DM content. So we only show the Cesium 3D tiles when a user is very zoomed into the map. Parent tiles, in our case, are all JSON that references their child JSON or B3DM content.
> are leaf tiles equivalent to child tiles?
Leaf tiles are child tiles, but we can have child tiles that are not leaf tiles. Page 2 of the Cesium 3D Tiles Reference Card is a good reference here.
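To make that concrete, here's a hedged sketch of the two kinds of tileset JSON, written as Python dicts (the URIs and geometric errors are made up, and this is not output from the actual workflow):

```python
parent_tileset = {
    "root": {
        "geometricError": 57422,
        # a parent tile is pure JSON: it only points at its children
        "children": [
            {"content": {"uri": "WorldCRS84Quad/11/206/105/tileset.json"}},
        ],
    }
}
leaf_tileset = {
    "root": {
        "geometricError": 0,
        # a leaf tile (a child with no children) carries the renderable B3DM
        "content": {"uri": "WorldCRS84Quad/11/206/105.b3dm"},
    }
}
```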
> how are the .B3DM files produced (by the py3dtiles package? but where is this in the workflow?) and where are they output? I see that we create the .json files (that I believe are the metadata for the .B3DM files) within the 3dtiles dir that is created by all_staged_to_3dtiles() but I do not see the actual .B3DM tiles that I believe are the actual visualizations on PDG
The .B3DM tiles are created by the `staged_to_3dtile` method, but you will only find them in the high z-level directory (11). Do you see them there?
@robyngit thanks for all that detailed feedback, your drawings and differentiation between the different types of deduplication and child vs leaf tiles are very helpful!
Yes, now I do see the .B3DM tiles that were created by the `staged_to_3dtile` method in the highest z-level directory. I had not checked the highest z-level folder last time, which is why I missed them and just saw the .json files. I'm looking in your `3dtiles` dir that was created when you ran through this sample data processing.
However, I am still working through producing the `3dtiles` dir myself, as I'm struggling to run `staged_to_3dtile()`. I'm trying to run only this part of the workflow, since I already staged and rasterized, so I have been trying to piece apart this pdg workflow that is one big script. I thought it would help my understanding to pull out the code for the exact step I am trying to do (creating the 3d tiles), but I'm running into errors that are making me question whether it is worth it to parse the script and un-parallelize this step just to process this small sample dataset.
Since I finished the rasterization step, I tried jumping straight to `tiles3dmaker.all_staged_to_3dtiles()`, but there was an error regarding `BoundingVolumeRegion` not being defined, so I went down a rabbit hole of trying to define that using `tiles3dmaker.bounding_region_for_tile(path_to_one_highest_z_level_gpkg_file)`, and I also tried manually running parts of the code that makes up that function, but I have not been successful. I will keep at it but feel a little disheartened that taking apart the script into pieces is coming along slower than I imagined.
Each time I try to execute `tiles3dmaker.all_staged_to_3dtiles()`, it is helpful that an error is output immediately (rather than waiting for every file to process and only realizing there was an error afterwards), but I am forced to interrupt the kernel by killing the process in the terminal, since VS Code becomes unresponsive as it tries to process every staged file.
> BoundingVolumeRegion not being defined
`BoundingVolumeRegion` is a class that is imported in the viz-3dtiles library. It sounds like maybe you need to import that class at some point?
> I will keep at it but feel a little disheartened that taking apart the script into pieces is coming along slower than I imagined.
Hopefully this is helpful in learning the different parts of the script, but any time you are stuck, I'm happy to jump on a Zoom call and help you debug! :)
I was able to resolve that error regarding `BoundingVolumeRegion` not being defined by importing both of the following:
```python
import viz_3dtiles
from viz_3dtiles import TreeGenerator, BoundingVolumeRegion
```
Just importing `viz_3dtiles` didn't seem to do the trick, but the second import line was found in StagedTo3DConverter.py. Thank you for the help, Robyn!
Thank you for offering to zoom to debug. I might take you up on that offer later today.
Since I am not working with batches, and am instead processing all files in one batch:
```python
for z in parent_zs:
    # Determine which tiles we need to make for the next z-level based on the
    # path names of the files just created
    child_paths = tile_manager.get_filenames_from_dir('geotiff', z=z + 1)
    parent_tiles = set()
    for child_path in child_paths:
        parent_tile = tile_manager.get_parent_tile(child_path)
        parent_tiles.add(parent_tile)
    parent_tiles = list(parent_tiles)
    # Robyn explained here that I do not run the following function in a loop
    # over batches; that's only needed when iterating over many batches
    create_composite_geotiffs(tiles=parent_tiles, config=workflow_config, logging_dict=logging_dict)
```
I am not sure if that actually created any new files. It seems that the `geotiff` dir is the same as it was before running that loop. But from what I can tell, my `geotiff` dir looks like Robyn's `geotiff` dir when she processed this data, so moving on.
```python
rasterizer.update_ranges()  # not sure what this does

# create a file path for every .tiff within the `geotiff` dir, resulting in 7768 paths
geotiff_paths = tile_manager.get_filenames_from_dir('geotiff')
```
```python
# create function for creating web tiles
def create_web_tiles(geotiff_paths, config, logging_dict=None):
    """
    Create a batch of webtiles from geotiffs (step 4)
    """
    import pdgraster
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.webtiles_from_geotiffs(
        geotiff_paths, update_ranges=False)

create_web_tiles(geotiff_paths, workflow_config, logging_dict)
```
This code ran fine. But similarly to the last step, it does not seem that any new files were created by this code. It matches Robyn's `web_tiles` dir from what I can tell, so moving on.
It also produced a concerning output (even though it did not error). This warning was repeated many times (373 times) in the workflow.ipynb output:
```
/home/jcohen/anaconda3/envs/pdgviz/lib/python3.10/site-packages/pdgraster/WebImage.py:110: RuntimeWarning: divide by zero encountered in double_scalars
  (255 / (max_val - min_val))
```
This might be the reason I am not able to create a `3dtiles` dir in the following steps. I need to look into the `pdgraster` library to determine this. I think it would be helpful to figure out what `max_val` and `min_val` are. I looked through the `pdgraster` repo but did not find the answer.
```python
staged_paths = stager.tiles.get_filenames_from_dir('staged')

# define the function
def create_leaf_3dtiles(staged_paths, config, logging_dict=None):
    """
    Create a batch of leaf 3d tiles from staged vector tiles
    """
    # from pdg_workflow import StagedTo3DConverter
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    converter3d = StagedTo3DConverter(config)
    tilesets = []
    for path in staged_paths:
        # tiles3dmaker = converter3d if converter3d = StagedTo3DConverter(workflow_config)
        ces_tile, ces_tileset = converter3d.staged_to_3dtile(path)
        tilesets.append(ces_tileset)
    return tilesets

# apply the function
create_leaf_3dtiles(staged_paths=staged_paths, config=workflow_config, logging_dict=logging_dict)
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
workflow.ipynb Cell 28 in <cell line: 1>()
----> 1 create_leaf_3dtiles(staged_paths = staged_paths, config = workflow_config, logging_dict = logging_dict)

workflow.ipynb Cell 28 in create_leaf_3dtiles(staged_paths, config, logging_dict)
     10 tilesets = []
     11 for path in staged_paths:
---> 12     ces_tile, ces_tileset = converter3d.staged_to_3dtile(path)
     13     tilesets.append(ces_tileset)
     14 return tilesets

TypeError: cannot unpack non-iterable NoneType object
```
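A defensive rewrite of that loop (a sketch of my own, not code from the workflow) would skip the offending paths and reveal which staged files make `staged_to_3dtile` return None:

```python
for path in staged_paths:
    result = converter3d.staged_to_3dtile(path)
    if result is None:
        # staged_to_3dtile produced nothing for this tile; note it and move on
        print(f"no 3D tile produced for {path}")
        continue
    ces_tile, ces_tileset = result
    tilesets.append(ces_tileset)
```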
`ces_tile` is the cesium 3d tile, and `ces_tileset` is the json tileset product. I am not sure what's wrong with the code here. I thought the error might originate from the syntax, since I am not working with batches. I played with this code for a long time but ended up running what I believed is equivalent:
```python
tiles3dmaker.all_staged_to_3dtiles()
```
While that function ran fine, it did not create a dir called `3dtiles` like I expected. I also created a folder `3dtiles` and then re-ran the function to see if it would populate the folder if it was already present, but it did not.
Moving on without this `3dtiles` folder does not make sense, but I set up the following step to execute when I figure out how to create that folder.
```python
max_z_tiles = [tile_manager.tile_from_path(path) for path in staged_paths]
# get the bounding box for each of the tiles
max_z_bounds = [tile_manager.get_bounding_box(tile) for tile in max_z_tiles]
# get the total bounds for all the tiles
polygons = [box(bounds['left'],
                bounds['bottom'],
                bounds['right'],
                bounds['top']) for bounds in max_z_bounds]
max_z_bounds = gpd.GeoSeries(polygons, crs=tile_manager.tms.crs)
bound_volume_limit = max_z_bounds.total_bounds
```
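(For reference: `total_bounds` on a GeoSeries returns a single `(minx, miny, maxx, maxy)` array covering all of the geometries, which is what gets passed below as the bounding volume limit for the parent tilesets.)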
```python
# loop that reads from the 3dtiles folder that should have been created in the previous step:
for z in parent_zs:
    # Determine which tiles we need to make for the next z-level based on the
    # path names of the files just created
    all_child_paths = tiles3dmaker.tiles.get_filenames_from_dir('3dtiles', z=z + 1)
    parent_tiles = set()
    for child_path in all_child_paths:
        parent_tile = tile_manager.get_parent_tile(child_path)
        parent_tiles.add(parent_tile)
    parent_tiles = list(parent_tiles)
```
```python
# define function
def create_parent_3dtiles(tiles, config, limit_bv_to=None, logging_dict=None):
    """
    Create a batch of cesium 3d tileset parent files that point to child
    tilesets
    """
    # from pdg_workflow import StagedTo3DConverter
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    converter3d = StagedTo3DConverter(config)
    return converter3d.parent_3dtiles_from_children(tiles, limit_bv_to)

# apply function
create_parent_3dtiles(parent_tiles, workflow_config, bound_volume_limit, logging_dict)
```
Output (because there was no `3dtiles` dir to read from): `[]`
I was using an older version of the config file that was linked in the issue I'm following, but now I am trying the workflow with the updated config file found in Robyn's workflow. This was one of those realizations that can only happen after stepping away from the code for a night and returning in the morning.
> rasterizer.update_ranges() # not sure what this does
> ... I think it would be helpful to figure out what max_val and min_val are.
This is a really important step that does the following:
1) Opens up the raster summary CSV file
2) Calculates the min and max pixel value across the entire z-level for each GeoTiff band
3) Updates the min and max value for each z-level in the config
We use the min and max pixel value for each z-level to map the entire range of values to the color palette when creating the PNG web tiles. If we were only to use the min and max within a tile, then the colors would not be mapped evenly across the layer. This could have something to do with the error you're seeing in web tile generation: `divide by zero encountered in double_scalars (255 / (max_val - min_val))`. The `min_val` and `max_val` here are the min and max for the z-level. It sounds like you have the same min and max, giving 0, which is not expected.
If you ever want to know what one of the methods does, search for the method in the repo, or run `help(rasterizer.update_ranges)`. I did my best to document every method!
> This might be the reason I am not able to create a 3dtiles dir in the following steps.
Creating the 3D tiles is independent from creating the GeoTIFFs & web tiles. You could, for example, run the staging step and then run the 3D tiles steps, and skip rasterization altogether. So the issues with 3D tiles are unrelated to the raster code.
> cannot unpack non-iterable NoneType object
This sounds like `staged_paths` is None when you're passing it to `create_leaf_3dtiles`, maybe?
> the updated config file found in Robyn's workflow
Could you share the config you're using?
Great, thanks Robyn!
That makes sense that we want the min and max pixel values for each z-level to map the range of values with a color palette when creating the PNG web tiles. I agree that subtracting those values should not yield 0. I'm currently running the rasterization step again (after re-staging the files too). Hopefully re-generating the `geotiff` dir with the new config file will resolve that error.
Ah yes, good point that the `staging` dir is fed into the 3d tiles step, not the rasters, so the `geotiff` error is not affecting my ability to create 3d tiles.
I did check the object `staged_paths`, and it was indeed a list of all the staged files. I'll check this again today when I try that step after regenerating the tiles.
Here is the updated config I'm using (`config__updated.json`):
```json
{
  "version": null,
  "dir_geotiff": "/home/jcohen/lake_change_sample/geotiff",
  "dir_web_tiles": "/home/jcohen/lake_change_sample/web_tiles",
  "dir_3dtiles": "/home/jcohen/lake_change_sample/3dtiles",
  "dir_staged": "/home/jcohen/lake_change_sample/staged",
  "dir_input": "/home/jcohen/lake_change_sample/input",
  "dir_footprints": "/home/jcohen/lake_change_sample/footprints",
  "filename_staging_summary": "/home/jcohen/lake_change_sample/staging_summary.csv",
  "filename_rasterization_events": "/home/jcohen/lake_change_sample/rasterization_events.csv",
  "filename_rasters_summary": "/home/jcohen/lake_change_sample/rasters_summary.csv",
  "filename_config": "/home/jcohen/lake_change_sample/config.json",
  "ext_web_tiles": ".png",
  "ext_input": ".shp",
  "ext_staged": ".gpkg",
  "ext_footprints": ".gpkg",
  "prop_centroid_x": "staging_centroid_x",
  "prop_centroid_y": "staging_centroid_y",
  "prop_area": "staging_area",
  "prop_tile": "staging_tile",
  "prop_centroid_tile": "staging_centroid_tile",
  "prop_filename": "staging_filename",
  "prop_identifier": "staging_identifier",
  "prop_centroid_within_tile": "staging_centroid_within_tile",
  "input_crs": null,
  "simplify_tolerance": 0.0001,
  "tms_id": "WorldCRS84Quad",
  "tile_path_structure": ["style", "tms", "z", "x", "y"],
  "z_range": [0, 11],
  "tile_size": [256, 256],
  "statistics": [
    {
      "name": "polygon_count",
      "weight_by": "count",
      "property": "centroids_per_pixel",
      "aggregation_method": "sum",
      "resampling_method": "sum",
      "val_range": [0, null],
      "nodata_val": 0,
      "nodata_color": "#ffffff00",
      "palette": "#d93fce",
      "z_config": {
        "0": {"val_range": [null, 4533.000000001244]},
        "1": {"val_range": [null, 1520.9999999982012]},
        "2": {"val_range": [null, 533.0000000026628]},
        "3": {"val_range": [null, 143.0000000016093]},
        "4": {"val_range": [null, 45.99999999881213]},
        "5": {"val_range": [null, 17.99999999962709]},
        "6": {"val_range": [null, 6.999999999865963]},
        "7": {"val_range": [null, 3.9999999996430233]},
        "8": {"val_range": [null, 1.9999999998981368]},
        "9": {"val_range": [null, 2.0]},
        "10": {"val_range": [null, 2.0]},
        "11": {"val_range": [null, 1.0]}
      }
    },
    {
      "name": "coverage",
      "weight_by": "area",
      "property": "area_per_pixel_area",
      "aggregation_method": "sum",
      "resampling_method": "average",
      "val_range": [0, 1],
      "nodata_val": 0,
      "nodata_color": "#ffffff00",
      "palette": "#d93fce"
    }
  ],
  "geometricError": null,
  "z_coord": 0,
  "deduplicate_at": ["raster", "3dtiles"],
  "deduplicate_method": "neighbor",
  "deduplicate_keep_rules": [["staging_filename", "larger"]],
  "deduplicate_overlap_tolerance": 0.1,
  "deduplicate_overlap_both": false,
  "deduplicate_centroid_tolerance": null,
  "deduplicate_distance_crs": "EPSG:3857",
  "deduplicate_clip_to_footprint": false,
  "deduplicate_clip_method": "within"
}
```
The only changes I made to this were the palette in 2 places and file paths at the top.
Thanks for sharing the config @julietcohen! This version, with the suffix `__updated` in the filename, is actually created by the workflow. It includes the min and max pixel values for each z-level that were calculated during the run. It's better to start without those in case you are not using the exact same data.
In the pdg-portal issue, I linked to the config file that I used: ingmar-config.json.txt
The palette that you are configuring is also not valid; see the doc string in ConfigManager. The palette needs to be either the name of a color palette available in the Colormaps library or a list of color strings in any format accepted by the coloraide library. Maybe try using two colors for your palette, for example `["#d9c43f", "#d93fce"]`.
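In the config above, that would mean replacing the single palette string with a list, e.g. (colors are just examples):

```json
"palette": ["#d9c43f", "#d93fce"]
```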
I did use that config file you linked, ingmar-config.json.txt, when I ran this workflow the first time, so I was using the right one. I did not realize that a new `__updated.json` file was created by the workflow, but I see why.
So I should use `ingmar-config.json` for the staging and rasterization, then switch to the generated `workflow_config = ingmar-config__updated.json` for creating the 3dtiles like you did in make-3d-tiles.py.
I did try a few different palettes with 2 colors before I settled on the single color in my config file because it did not error. My syntax must have been wrong, because using 2 values produced an error, but that certainly is correct according to the documentation you linked. I'll choose a valid palette and re-run the staging and rasterization steps with the original config file.
Thanks for the help, as always!
> then switch to the generated workflow_config = ingmar-config__updated.json for creating the 3dtiles like you did in make-3d-tiles.py
You shouldn't need to switch to the `__updated.json` config at any time, really. If for some reason you had to re-generate the PNG web tiles, then using the `__updated` config would be faster, because you would not need to run `rasterizer.update_ranges()`, which takes some time. For the 3D tiles, you should also use the same `ingmar-config.json` file.
> I did try a few different palettes with 2 colors before I settled on the single color in my config file because it did not error.
It's interesting you didn't get an error! I would expect that you'd at least have to put your color in a list, but I haven't tested this 🤷🏻
Ohh got it. I see that by running `rasterize_all()`, we are updating the ranges when creating the webtiles, because that wraps around both `rasterize_vectors()` and `webtiles_from_all_geotiffs()`, and the latter has the default `update_ranges=True`. I'm relieved to make the connection between the config file and the ranges. Things are really coming together a lot more with the file PermafrostDiscoveryGateway/viz-raster/pdgraster/RasterTiler.py that you pointed me to 🤯
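My mental model of that wrapping, as a runnable sketch (the method names are from this thread, but the bodies are stand-ins, not the actual pdgraster source):

```python
class RasterTilerSketch:
    """Illustrative only; not the actual pdgraster.RasterTiler."""

    def rasterize_all(self):
        self.rasterize_vectors()           # staged vectors -> GeoTIFFs
        self.webtiles_from_all_geotiffs()  # default update_ranges=True

    def rasterize_vectors(self):
        print("rasterizing staged vectors into GeoTIFFs")

    def webtiles_from_all_geotiffs(self, update_ranges=True):
        if update_ranges:
            self.update_ranges()  # recompute per-z-level min/max first
        print("creating PNG web tiles from the GeoTIFFs")

    def update_ranges(self):
        print("updating min/max pixel values per z-level in the config")

RasterTilerSketch().rasterize_all()
```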
Perhaps it would be helpful to add an error message if the user inputs only 1 color for the palette in the config, like I had. I'll note that in the issue I made for the palette in case we want to implement it.
I ended yesterday trying to produce the 3d tiles dir by running the staging, rasterization, and 3d tiles steps from start to finish using `ingmar-config.json`, and with batching in the 3d tile step as Robyn did in her script make-3d-tiles.py, even though I did not suspect that batching would make a difference in that step. The result was no errors, but also no 3d tiles dir produced. It doesn't seem like the function to create 3d tiles finished after I left it running at the end of the day; maybe the server connection was lost and cancelled the process, because when I returned to the script in the morning there was no checkmark and runtime stamp for the chunk, and the server connection was gone.
This morning I noticed that the config file ingmar-config.json that I am supposed to use differs in a few ways from the object defined as config in Robyn's script make-tiles.py, where she executes the staging step. The differences are:
| `config` object (python) | `ingmar-config.json` (JSON) |
|---|---|
| uses parentheses: `'z_range': (0, 11)` | uses brackets: `"z_range": [0, 11]` |
| specifies tile size: `'tile_size': (256, 256)` | doesn't specify tile size |
| val_range is from 0 through None: `'val_range': [0, None]` | val_range is from 0 through null: `"val_range": [0, null]` |
| deduplicates at raster step: `'deduplicate_at': ['raster']` | deduplicates at staging step: `"deduplicate_at": ["staging"]` |
| capitalized F: `'deduplicate_overlap_both': False` | lowercase f: `"deduplicate_overlap_both": false` |
| None instead of null: `'deduplicate_centroid_tolerance': None` | null instead of None: `"deduplicate_centroid_tolerance": null` |
I am wondering if the `'deduplicate_at': ['raster']` versus `["staging"]` difference is especially important here. I also checked out the `__updated.json` config file that was produced and saved in Robyn's folder for her run through of this data sample. That file has `"deduplicate_at": ["raster", "3dtiles"]`, which is confusing, because this `__updated.json` file would not have changed those options, meaning that the original config file that was the template for this `__updated.json` file would have also been set to `"deduplicate_at": ["raster", "3dtiles"]`. I suppose that the presence of a `__updated.json` file does not imply that the original file it is based on was actually used for the config, since she defined the config object within the script.
The differences that you are seeing between the config object and the config json file are just differences in the syntax for python (`config` object) vs JSON (`ingmar-config.json` file).
The following are how you translate between python and JSON:
| python | JSON |
|---|---|
| `None` | `null` |
| `()` or `[]` | `[]` |
| `False` | `false` |
| `True` | `true` |
If you read the JSON file into python, it will be parsed as a dict object, and all of these translations will be done for you.
```python
import json

with open('path_to_file/ingmar-config.json', 'r') as f:
    config = json.load(f)

print(config)
```
For the tile size that I specified in one and not the other: 256x256 is the default, so it makes no difference here. There was no reason to specify it in the `config` object.
You can see the default config options like this:

```python
from pdgstaging import ConfigManager
print(ConfigManager.defaults)
```
Awesome, thanks for clarifying!