julietcohen / lake_change_sample

Learning PDG workflow with parsl by processing sample data for lake change in the arctic

Questions about parsl workflow: making 3D tiles #2

Open julietcohen opened 2 years ago

julietcohen commented 2 years ago

Early on in make_3d_tiles, we create a class StagedTo3DConverter() with several functions defined within it (I believe these are called methods), and one of those methods is staged_to_3dtile(self, path). The steps within this make sense to me (such as checking if a tile has data within it, deduplicating, checking if the polygon centroid falls within the tile, etc.). But why is this pulling data from the staging folder? I assumed that once we rasterized the staged tiles and created the geotiff and web_tiles folders, we would be pulling from those moving forward. To give you a better understanding of my interpretation of the workflow, here's a diagram I made that now I am second-guessing: IMG_0299

Update: I made a new schematic that is hopefully more correct IMG_0308

julietcohen commented 2 years ago

deduplication of files seems to be occurring twice in the workflow - once when we stage the tiles initially, and again in the staged_to_3dtile(). Is that correct?

There is also a comment further down the workflow about this:

# Deduplicate & make leaf 3D tiles all staged tiles (only highest
# z-level).
# TODO: COMBINE WITH STEP 2, so we only read in and deduplicate
# each staged file once.

Update: After reviewing the README for the pdgstaging repo, it seems that some files (the polygons that overlap 2+ tiles) are indeed duplicated in the staged folder, and they are only deduplicated when we execute the function staged_to_3dtile().

The centroids of the polygons are assigned to only 1 tile (the centroids are never assigned to multiple tiles, because even if they fall on a tile boundary they are assigned to the SE tile), and this centroid tile assignment may differ from the polygon tile assignment if the polygon falls within 2+ tiles (these tile assignments are 2 separate properties assigned when we execute the staging step). When we execute staged_to_3dtile(), if the centroid tile assignment does not match the polygon tile assignment, then the polygon is removed from that tile.
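
In other words, the filter applied at the 3D-tile step is roughly the following (a minimal sketch, assuming the staging_tile and staging_centroid_tile property names from the config shared later in this thread; the staged tile path is illustrative):

import geopandas as gpd

# read one staged tile (path is illustrative)
gdf = gpd.read_file('staged/WorldCRS84Quad/11/330/64.gpkg')

# keep only the polygons whose centroid was assigned to this same tile;
# duplicates of polygons that straddle a tile boundary are dropped
deduped = gdf[gdf['staging_tile'] == gdf['staging_centroid_tile']]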

julietcohen commented 2 years ago

I'm confused about a few differences between objects in the section where we define a bunch of variables:

workflow_config = '/home/jcohen/sample_data/ingmar-config__updated.json'
logging_config = '/home/jcohen/sample_data/logging.json'
batch_size_staging = 1
batch_size_rasterization = 30
batch_size_3dtiles = 20 # leaf tiles? higher resolution, more zoomed in, which is why we process fewer of them in a batch relative to the parent tiles
batch_size_parent_3dtiles = 500
batch_size_geotiffs = 200
batch_size_web_tiles = 200

robyngit commented 2 years ago

Early on in make_3d_tiles, we create a class StagedTo3DConverter() ... why is this pulling data from the staging folder?

Short answer: Because we create 3D tiles from vector data (e.g. shapefiles or geopackage files). 3D tiles are just another format of vector data.

Long answer: You can think of the workflow as having three main parts: 1) Staging; 2) Rasterization; 3) 3D tiles. Rasterization and 3D tile creation are like two separate branches of the workflow that both start after the staging step (refer back to these slides). The staging step is where we do all of the tasks that are needed for both the subsequent rasterization and 3D tiles steps.
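
At a high level, the branching looks like this (a sketch only; the function names are illustrative stand-ins for the staging, rasterization, and 3D-tile steps shown later in this thread):

# 1) Staging: split input vector files into tiled GeoPackages
staged_paths = stage_all_inputs()

# 2) Rasterization branch: staged GeoPackages -> GeoTIFFs -> PNG web tiles
rasterize_and_make_web_tiles(staged_paths)

# 3) 3D-tile branch: staged GeoPackages -> B3DM + tileset JSON
make_3d_tiles(staged_paths)

Both branches read from the staged output, which is why StagedTo3DConverter pulls from the staging folder.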

here's a diagram I made that now I am second-guessing ... I made a new schematic that is hopefully more correct

I tried correcting the first diagram here (please just ignore the part that says "Text" 🙃): [corrected diagram attached]

But your second diagram is correct! 🎉

To answer questions from the second diagram:

depending on if we use rasterize_all or rasterize?

Here's a little bit more about how the lower z-level GeoTiffs are created during the rasterization step:

resampling

when do we use package py3dtiles?

StagedTo3DConverter uses the viz-3dtiles package, which uses the py3dtiles package. We could eventually re-organize how this works.

deduplication of files seems to be occurring twice in the workflow - once when we stage the tiles initially, and again in the staged_to_3dtile(). Is that correct?

This depends on what is set for the deduplicate_at config option. But for our purposes, we currently have it set to deduplicate at the start of 3D tiles AND at the start of rasterization. This is because we want to archive the GeoPackage tiles (the output of staging) with all their original data, without deduplicated polygons removed. Otherwise, it would make more sense to just deduplicate once during staging. The comment you referenced suggests a third option: when the config is set to deduplicate at both rasterization and 3D tiles, we could deduplicate just once, after staging but before passing the data to those two branches of the workflow. This would require re-working how the parts of the workflow fit together.
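
For reference, this is the relevant setting in the config that appears later in this thread:

"deduplicate_at": [
    "raster", "3dtiles"
]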

Update: After reviewing the README for the pdgstaging repo, it seems that some files (the polygons that overlap 2+ tiles) are indeed duplicated in the staged folder, and they are only deduplicated when we execute the function staged_to_3dtile().

The centroids of the polygons are assigned to only 1 tile (the centroids are never assigned to multiple tiles, because even if they fall on a tile boundary they are assigned to the SE tile), and this centroid tile assignment may differ from the polygon tile assignment if the polygon falls within 2+ tiles (these tile assignments are 2 separate properties assigned when we execute the staging step). When we execute staged_to_3dtile(), if the centroid tile assignment does not match the polygon tile assignment, then the polygon is removed from that tile.

Yes, this is all correct, but it's important to note that this is a different case of deduplication (confusing, I know). In this case, the data is duplicated because our workflow duplicated it during staging. We duplicate it to make sure that the rasterization step has access to polygons that overlap a tile even just a little bit, otherwise there will be weird edge effects in the resulting PNGs. However, we do not want identical polygons in the resulting 3D tiles, so we ALWAYS remove them at this step.

The OTHER deduplication which is configurable in the workflow is related to input files overlapping the same area. For the lakes dataset, where files overlap, the same lakes are detected twice, once in each file. Because the images that lakes are detected from differ a little, the same lakes won't give the exact same polygons, so the deduplication strategy is a little more complex. This part of the pdgstaging docs gives a detailed overview of these deduplication strategies.
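
The strategy itself is also configurable; the config later in this thread selects the neighbor-based method, plus a rule for which duplicate to keep and an overlap tolerance:

"deduplicate_method": "neighbor",
"deduplicate_keep_rules": [
    ["staging_filename", "larger"]
],
"deduplicate_overlap_tolerance": 0.1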

what's the difference between 3d tiles and web tiles?

3D tiles are the Cesium 3D Tiles (b3dm & json); web tiles are the image tiles we create just for showing in Cesium (PNG)

are the 3dtiles referenced in the object batch_size_3dtiles the leaf tiles?

Yes, the way we are making our 3D Tile tree is such that ONLY the leaf tiles have B3DM content. So we only show the Cesium 3D tiles when a user is very zoomed into the map. Parent tiles, in our case, are all JSON that references their child JSON or B3DM content.

are leaf tiles equivalent to child tiles?

Leaf tiles are child tiles. But we can have child tiles that are not leaf tiles. Page 2 of the Cesium 3D Tiles Reference Card is a good reference here.

Tileset tree
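
As an illustration (values and paths hypothetical, not taken from this workflow), a parent tileset.json only points at its children, and B3DM content appears only at the leaves:

{
  "asset": { "version": "1.0" },
  "geometricError": 500,
  "root": {
    "boundingVolume": { "region": [-2.11, 1.18, -2.10, 1.19, 0, 0] },
    "geometricError": 100,
    "refine": "REPLACE",
    "children": [
      {
        "boundingVolume": { "region": [-2.11, 1.18, -2.105, 1.185, 0, 0] },
        "geometricError": 0,
        "content": { "uri": "11/330/64.b3dm" }
      }
    ]
  }
}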

how are the .B3DM files produced (by the py3dtiles package? but where is this in the workflow?) and where are they output? I see that we create the .json files (which I believe are the metadata for the .B3DM files) within the 3dtiles dir that is created by all_staged_to_3dtiles(), but I do not see the actual .B3DM tiles that I believe are the actual visualizations on PDG

The .B3DM tiles are created by the staged_to_3dtile method, but you will only find them in the high z-level directory (11). Do you see them there?
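
A quick way to check (a sketch; adjust the path to your dir_3dtiles setting):

from pathlib import Path

# list every B3DM file under the 3D tiles output directory
b3dm_paths = sorted(Path('/home/jcohen/lake_change_sample/3dtiles').rglob('*.b3dm'))
print(len(b3dm_paths), b3dm_paths[:5])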

julietcohen commented 2 years ago

@robyngit thanks for all that detailed feedback, your drawings and differentiation between the different types of deduplication and child vs leaf tiles are very helpful!

Yes, now I do see the .B3DM tiles that were created by the staged_to_3dtile method in the highest z-level directory. I had not checked the highest z-level folder last time, which is why I missed them and only saw the .json files. I'm looking in your 3dtiles dir that was created when you ran through this sample data processing.

However, I am still working through producing the 3dtiles dir myself, as I'm struggling to run staged_to_3dtile. I'm trying to run only this part of the workflow, since I already staged and rasterized, so I have been trying to piece apart this PDG workflow that is one big script. I thought it would help my understanding to pull out the code for the exact step I am trying to do (creating the 3D tiles), but I'm running into errors that are making me question whether it is worth it to parse the script and un-parallelize this step just to process this small sample dataset.

Since I finished the rasterization step, I tried jumping straight to tiles3dmaker.all_staged_to_3dtiles(), but there was an error regarding BoundingVolumeRegion not being defined, so I went down a rabbit hole of trying to define that using tiles3dmaker.bounding_region_for_tile(path_to_one_highest_z_level_gpkg_file). I also tried manually running parts of the code that makes up that function, but I have not been successful. I will keep at it but feel a little disheartened that taking apart the script into pieces is coming along slower than I imagined.

Each time I try to execute tiles3dmaker.all_staged_to_3dtiles(), it is helpful that an error is output immediately (rather than waiting for every file to process and only realizing there was an error afterwards), but I am forced to interrupt the kernel by killing the process in the terminal, since VS Code becomes unresponsive as it tries to process every staged file.

robyngit commented 2 years ago

BoundingVolumeRegion not being defined

BoundingVolumeRegion is a class that is imported in the viz-3dtiles library. It sounds like maybe you need to import that class at some point?

I will keep at it but feel a little disheartened that taking apart the script into pieces is coming along slower than I imagined.

Hopefully this is helpful for learning the different parts of the script, but any time you are stuck, I'm happy to jump on a zoom call and help you debug! :)

julietcohen commented 2 years ago

I was able to resolve that error regarding BoundingVolumeRegion not being defined by importing both of the following:

import viz_3dtiles
from viz_3dtiles import TreeGenerator, BoundingVolumeRegion

Just importing viz_3dtiles didn't seem to do the trick, but the second import line was found in StagedTo3DConverter.py. Thank you for the help, Robyn!

Thank you for offering to zoom to debug. I might take you up on that offer later today.

julietcohen commented 2 years ago

Update 10/24

Creating parent geotiffs for all z-levels

Since I am not working with batches, and am instead processing all files in one batch:

for z in parent_zs:

    # Determine which tiles we need to make for the next z-level based on the
    # path names of the files just created
    child_paths = tile_manager.get_filenames_from_dir('geotiff', z=z + 1)
    parent_tiles = set()
    for child_path in child_paths:
        parent_tile = tile_manager.get_parent_tile(child_path)
        parent_tiles.add(parent_tile)
    parent_tiles = list(parent_tiles)

    # Robyn explained that the following call is not run in a further inner loop;
    # that would only be needed if we were iterating over many batches
    create_composite_geotiffs(tiles = parent_tiles, config = workflow_config, logging_dict = logging_dict)

I am not sure if that actually created any new files. It seems that the geotiff dir is the same as it was before running that loop. But from what I can tell, my geotiff dir looks like Robyn's geotiff dir when she processed this data, so moving on.

Create web tiles from geotiffs

rasterizer.update_ranges() # not sure what this does

# create a file path for every .tiff within the `geotiff` dir, resulting in 7768 paths
geotiff_paths = tile_manager.get_filenames_from_dir('geotiff')

# create function for creating web tiles
def create_web_tiles(geotiff_paths, config, logging_dict=None):
    """
    Create a batch of webtiles from geotiffs (step 4)
    """
    import pdgraster
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.webtiles_from_geotiffs(
        geotiff_paths, update_ranges=False)

create_web_tiles(geotiff_paths, workflow_config, logging_dict)

This code ran fine. But similarly to the last step, it does not seem that any new files were created by this code. It matches Robyn's web_tiles dir from what I can tell, so moving on.

It also produced a concerning output (even though it did not error). This output was repeated many times - 373 times, judging by the length of the workflow.ipynb output:

/home/jcohen/anaconda3/envs/pdgviz/lib/python3.10/site-packages/pdgraster/WebImage.py:110: RuntimeWarning: divide by zero encountered in double_scalars
  (255 / (max_val - min_val))

This might be the reason I am not able to create a 3dtiles dir in the following steps. I need to look into the pdgraster library to determine this. I think it would be helpful to figure out what max_val and min_val are. I looked through the pdgraster repo but did not find the answer.

Deduplicate and make leaf 3D tiles all staged tiles (only highest z-level)

staged_paths = stager.tiles.get_filenames_from_dir('staged')

# define the function
def create_leaf_3dtiles(staged_paths, config, logging_dict=None):
    """
    Create a batch of leaf 3d tiles from staged vector tiles
    """
    #from pdg_workflow import StagedTo3DConverter
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    converter3d = StagedTo3DConverter(config)
    tilesets = []
    for path in staged_paths:
        ces_tile, ces_tileset = converter3d.staged_to_3dtile(path) # tiles3dmaker = converter3d if converter3d = StagedTo3DConverter(workflow_config)
        tilesets.append(ces_tileset)
    return tilesets

# apply the function 
create_leaf_3dtiles(staged_paths = staged_paths, config = workflow_config, logging_dict = logging_dict)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/jcohen/lake_change_sample/workflow.ipynb Cell 28 in <cell line: 1>()
----> 1 create_leaf_3dtiles(staged_paths = staged_paths, config = workflow_config, logging_dict = logging_dict)

/home/jcohen/lake_change_sample/workflow.ipynb Cell 28 in create_leaf_3dtiles(staged_paths, config, logging_dict)
     10 tilesets = []
     11 for path in staged_paths:
---> 12     ces_tile, ces_tileset = converter3d.staged_to_3dtile(path)
     13     tilesets.append(ces_tileset)
     14 return tilesets

TypeError: cannot unpack non-iterable NoneType object

ces_tile is the Cesium 3D tile, and ces_tileset is the json tileset product. I am not sure what's wrong with the code here. I thought the error might originate from the syntax, since I am not working with batches. I played with this code for a long time but ended up running what I believed is equivalent:

tiles3dmaker.all_staged_to_3dtiles()

While that function ran fine, it did not create a dir called 3dtiles like I expected. I also created a folder called 3dtiles and then re-ran it to see if it would populate the folder if it was already present, but it did not.

Moving on without this 3dtiles folder does not make sense, but I set up the following step to execute when I figure out how to create that folder.

Create parent cesium 3d tilesets for all z-levels (except highest):

max_z_tiles = [tile_manager.tile_from_path(path) for path in staged_paths]
# get the bounding box for each tile
max_z_bounds = [tile_manager.get_bounding_box(tile) for tile in max_z_tiles]
# convert each bounding box into a shapely polygon
polygons = [box(bounds['left'],
                bounds['bottom'],
                bounds['right'],
                bounds['top']) for bounds in max_z_bounds]
max_z_bounds = gpd.GeoSeries(polygons, crs=tile_manager.tms.crs)

# get the total bounds for all the tiles
bound_volume_limit = max_z_bounds.total_bounds

# loop that reads from the 3dtiles folder that should have been created in the previous step:
for z in parent_zs:

    # Determine which tiles we need to make for the next z-level based on the
    # path names of the files just created
    all_child_paths = tiles3dmaker.tiles.get_filenames_from_dir('3dtiles', z=z + 1)

    parent_tiles = set()
    for child_path in all_child_paths:
        parent_tile = tile_manager.get_parent_tile(child_path)
        parent_tiles.add(parent_tile)
    parent_tiles = list(parent_tiles)

# define function
def create_parent_3dtiles(tiles, config, limit_bv_to=None, logging_dict=None):
    """
    Create a batch of cesium 3d tileset parent files that point to child
    tilesets
    """
    #from pdg_workflow import StagedTo3DConverter
    if logging_dict:
        import logging.config
        logging.config.dictConfig(logging_dict)
    converter3d = StagedTo3DConverter(config)
    return converter3d.parent_3dtiles_from_children(tiles, limit_bv_to)

# apply function
create_parent_3dtiles(parent_tiles, workflow_config, bound_volume_limit, logging_dict)

Output, because there was no 3dtiles dir to read from: []

julietcohen commented 2 years ago

10/25 troubleshooting approach: use updated config file!

I was using an older version of the config file that was linked in the issue I'm following, but now I am trying the workflow with the updated config file found in Robyn's workflow. This was one of those realizations that can only happen after stepping away from the code for a night and returning in the morning.

robyngit commented 2 years ago

rasterizer.update_ranges() # not sure what this does ... I think it would be helpful to figure out what max_val and min_val are.

This is a really important step that does the following: 1) opens up the raster summary CSV file, 2) calculates the min and max pixel value across the entire z-level for each GeoTiff band, and 3) updates the min and max value for each z-level in the config.

We use the min and max pixel value for each z-level to map the entire range of values to the color palette when creating the PNG web tiles. If we were only to use the min and max within a tile, then the colors would not be mapped evenly across the layer. This could have something to do with the error you're seeing in web tile generation: divide by zero encountered in double_scalars (255 / (max_val - min_val)) - the min_val and max_val here are the min and max for the z-level. It sounds like you have the same min and max, giving '0', which is not expected.
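
In pseudo-form, the scaling that raises that warning is roughly the following (a sketch of the idea, not the exact code in WebImage.py):

def scale_to_byte(val, min_val, max_val):
    # map a pixel value to the 0-255 range using the z-level min & max
    if max_val == min_val:
        # degenerate range: this is the divide-by-zero case in the warning;
        # how the real library handles it may differ
        return 0
    return (val - min_val) * (255 / (max_val - min_val))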

If you ever want to know what one of the methods does, search for the method in the repo, or run: help(rasterizer.update_ranges). I did my best to document every method!

This might be the reason I am not able to create a 3dtiles dir in the following steps.

Creating the 3D tiles is independent from creating the GeoTIFFs & webtiles. You could, for example, run the staging step and then run the 3D tiles steps, and skip rasterization altogether. So the issues with 3D tiles are unrelated to the raster code.

cannot unpack non-iterable NoneType object

This sounds like staged_paths is None when you're passing it to create_leaf_3dtiles maybe?
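
If staged_paths itself looks fine, another thing to check (just a guess, I haven't verified this) is whether staged_to_3dtile returns None for some individual paths. Guarding before unpacking, inside the loop from your create_leaf_3dtiles function, would isolate which value is None:

for path in staged_paths:
    result = converter3d.staged_to_3dtile(path)
    if result is None:
        # no 3D tile was produced for this path
        print('skipping', path)
        continue
    ces_tile, ces_tileset = result
    tilesets.append(ces_tileset)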

the updated config file found in Robyn's workflow

Could you share the config you're using?

julietcohen commented 2 years ago

Great, thanks Robyn!

That makes sense that we want the min and max pixel values for each z-level to map the range of values with a color palette when creating the PNG web tiles. I agree that subtracting those values should not yield 0. I'm currently running the rasterization step again (after re-staging the files too). Hopefully re-generating the geotiff dir with the new config file will resolve that error.

Ah yes, good point that the staging dir is fed into the 3d tiles step, not the rasters, so the geotiff error is not affecting my ability to create 3d tiles.

I did check the object staged_paths, and it was indeed a list of all the staged files. I'll check this again today when I try that step after regenerating the tiles.

Here is the updated config I'm using (config__updated.json):

{
    "version": null,
    "dir_geotiff": "/home/jcohen/lake_change_sample/geotiff",
    "dir_web_tiles": "/home/jcohen/lake_change_sample/web_tiles",
    "dir_3dtiles": "/home/jcohen/lake_change_sample/3dtiles",
    "dir_staged": "/home/jcohen/lake_change_sample/staged",
    "dir_input": "/home/jcohen/lake_change_sample/input",
    "dir_footprints": "/home/jcohen/lake_change_sample/footprints",
    "filename_staging_summary": "/home/jcohen/lake_change_sample/staging_summary.csv",
    "filename_rasterization_events": "/home/jcohen/lake_change_sample/rasterization_events.csv",
    "filename_rasters_summary": "/home/jcohen/lake_change_sample/rasters_summary.csv",
    "filename_config": "/home/jcohen/lake_change_sample/config.json",
    "ext_web_tiles": ".png",
    "ext_input": ".shp",
    "ext_staged": ".gpkg",
    "ext_footprints": ".gpkg",
    "prop_centroid_x": "staging_centroid_x",
    "prop_centroid_y": "staging_centroid_y",
    "prop_area": "staging_area",
    "prop_tile": "staging_tile",
    "prop_centroid_tile": "staging_centroid_tile",
    "prop_filename": "staging_filename",
    "prop_identifier": "staging_identifier",
    "prop_centroid_within_tile": "staging_centroid_within_tile",
    "input_crs": null,
    "simplify_tolerance": 0.0001,
    "tms_id": "WorldCRS84Quad",
    "tile_path_structure": [
        "style",
        "tms",
        "z",
        "x",
        "y"
    ],
    "z_range": [
        0,
        11
    ],
    "tile_size": [
        256,
        256
    ],
    "statistics": [
        {
            "name": "polygon_count",
            "weight_by": "count",
            "property": "centroids_per_pixel",
            "aggregation_method": "sum",
            "resampling_method": "sum",
            "val_range": [
                0,
                null
            ],
            "nodata_val": 0,
            "nodata_color": "#ffffff00",
            "palette": "#d93fce",
            "z_config": {
                "0": {
                    "val_range": [
                        null,
                        4533.000000001244
                    ]
                },
                "1": {
                    "val_range": [
                        null,
                        1520.9999999982012
                    ]
                },
                "2": {
                    "val_range": [
                        null,
                        533.0000000026628
                    ]
                },
                "3": {
                    "val_range": [
                        null,
                        143.0000000016093
                    ]
                },
                "4": {
                    "val_range": [
                        null,
                        45.99999999881213
                    ]
                },
                "5": {
                    "val_range": [
                        null,
                        17.99999999962709
                    ]
                },
                "6": {
                    "val_range": [
                        null,
                        6.999999999865963
                    ]
                },
                "7": {
                    "val_range": [
                        null,
                        3.9999999996430233
                    ]
                },
                "8": {
                    "val_range": [
                        null,
                        1.9999999998981368
                    ]
                },
                "9": {
                    "val_range": [
                        null,
                        2.0
                    ]
                },
                "10": {
                    "val_range": [
                        null,
                        2.0
                    ]
                },
                "11": {
                    "val_range": [
                        null,
                        1.0
                    ]
                }
            }
        },
        {
            "name": "coverage",
            "weight_by": "area",
            "property": "area_per_pixel_area",
            "aggregation_method": "sum",
            "resampling_method": "average",
            "val_range": [
                0,
                1
            ],
            "nodata_val": 0,
            "nodata_color": "#ffffff00",
            "palette": "#d93fce"
        }
    ],
    "geometricError": null,
    "z_coord": 0,
    "deduplicate_at": [
        "raster", "3dtiles"
    ],
    "deduplicate_method": "neighbor",
    "deduplicate_keep_rules": [
        [
            "staging_filename",
            "larger"
        ]
    ],
    "deduplicate_overlap_tolerance": 0.1,
    "deduplicate_overlap_both": false,
    "deduplicate_centroid_tolerance": null,
    "deduplicate_distance_crs": "EPSG:3857",
    "deduplicate_clip_to_footprint": false,
    "deduplicate_clip_method": "within"
}

The only changes I made to this were the palette in 2 places and file paths at the top.

robyngit commented 2 years ago

Thanks for sharing the config @julietcohen! This version, with the suffix __updated in the filename, is actually created by the workflow. It includes the min and max pixel values for each z-level that were calculated during the run. It's better to start without those in case you are not using the exact same data.

In the pdg-portal issue, I linked to the config file that I used: ingmar-config.json.txt

The palette that you are configuring is also not valid; see the doc string in ConfigManager. Palette needs to be either the name of a color palette available in the Colormaps library, or a list of color strings in any format accepted by the coloraide library. Maybe try using two colors for your palette, for example ["#d9c43f", "#d93fce"]
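
In the config, that would look like:

"palette": [
    "#d9c43f", "#d93fce"
]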

julietcohen commented 2 years ago

I did use that config file you linked, ingmar-config.json.txt, when I ran this workflow the first time, so I was using the right one. I did not realize that a new __updated.json file was created by the workflow, but I see why.

So I should use ingmar-config.json for the staging and rasterization, then switch to the generated workflow_config = ingmar-config__updated.json for creating the 3dtiles like you did in make-3d-tiles.py.

I did try a few different palettes with 2 colors before I settled on the single color in my config file because it did not error. My syntax must have been wrong, because using 2 values produced an error, even though two colors is certainly correct according to the documentation you linked. I'll choose a valid palette and re-run the staging and rasterization steps with the original config file.

Thanks for the help, as always!

robyngit commented 2 years ago

then switch to the generated workflow_config = ingmar-config__updated.json for creating the 3dtiles like you did in make-3d-tiles.py

You shouldn't need to switch to the __updated.json config at any time really. If for some reason you had to re-generate the PNG web tiles, then using the __updated config would be faster because you would not need to run rasterizer.update_ranges() which takes some time. For the 3D tiles, you should also use the same ingmar-config.json file.

I did try a few different palettes with 2 colors before I settled on the single color in my config file because it did not error.

It's interesting you didn't get an error! I would expect that you'd at least have to put your color in a list, but I haven't tested this 🤷🏻

julietcohen commented 2 years ago

Ohh got it. I see that by running rasterize_all(), we are updating the ranges when creating the webtiles, because that wraps around both rasterize_vectors() and webtiles_from_all_geotiffs(), and the latter has the default update_ranges=True. I'm relieved to make the connection between the config file and the ranges. Things are really coming together a lot more with the file PermafrostDiscoveryGateway/viz-raster/pdgraster/RasterTiler.py that you pointed me to 🤯
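
For anyone following along, the equivalence is roughly this (a sketch; the rasterize_vectors signature is my assumption based on this thread, not checked against the library):

import pdgraster

rasterizer = pdgraster.RasterTiler(workflow_config)
rasterizer.rasterize_vectors(staged_paths)   # staged GeoPackages -> GeoTIFFs
rasterizer.webtiles_from_all_geotiffs()      # update_ranges=True by default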

julietcohen commented 2 years ago

Perhaps it would be helpful to add an error message if the user inputs only 1 color for the palette in the config, like I had. I'll note that in the issue I made for the palette in case we want to implement it.

julietcohen commented 2 years ago

I ended yesterday trying to produce the 3d tiles dir by running the staging, rasterization, and 3d tiles steps from start to finish using ingmar-config.json, and with batching in the 3d tile step as Robyn did in her script make-3d-tiles.py, even though I did not suspect that batching would make a difference in that step. The result was no errors, but also no 3d tiles dir produced. It doesn't seem like the function to create 3d tiles finished after I left it running at the end of the day. Maybe the server connection was lost and cancelled the process, because when I returned to the script in the morning there was no checkmark and runtime stamp for the chunk, and the server connection was gone.

This morning I noticed that the config file ingmar-config.json that I am supposed to use differs in a few ways from the object defined as config in Robyn's script make-tiles.py, where she executes the staging step. The differences are:

| config object | ingmar-config.json |
|---|---|
| uses parentheses: 'z_range': (0, 11) | uses brackets: "z_range": [0, 11] |
| specifies tile size: 'tile_size': (256, 256) | doesn't specify tile size |
| val_range is from 0 through None: 'val_range': [0, None] | val_range is from 0 through null: "val_range": [0, null] |
| deduplicates at raster step: 'deduplicate_at': ['raster'] | deduplicates at staging step: "deduplicate_at": ["staging"] |
| capitalized F: 'deduplicate_overlap_both': False | uncapitalized f: "deduplicate_overlap_both": false |
| None instead of null: 'deduplicate_centroid_tolerance': None | null instead of None: "deduplicate_centroid_tolerance": null |

I am wondering if the 'deduplicate_at': ['raster'] versus ["staging"] difference is especially important here. I also checked out the __updated.json config file that was produced and saved in Robyn's folder for her run through of this data sample. That file has "deduplicate_at": ["raster", "3dtiles"], which is confusing, because the __updated.json file would not have changed the options here, meaning that the original config file that was the template for this __updated.json file must have also been set to "deduplicate_at": ["raster", "3dtiles"]. I suppose that the presence of a __updated.json file does not imply that the original file it is based on was actually used for the config, since she defined the config object within the script.

robyngit commented 2 years ago

The differences that you are seeing between the config object and the config json file are just differences in the syntax for python (config object) vs JSON (ingmar-config.json file).

The following are how you translate between python and JSON:

| Python | JSON |
|---|---|
| None | null |
| () or [] | [] |
| False | false |
| True | true |

If you read the JSON file into python, it will be parsed as a dict object, and all of these translations will be done for you.

import json

with open('path_to_file/ingmar-config.json', 'r') as f:
  config = json.load(f)

print(config)

For the tile size that I specified in one and not the other: 256x256 is the default, so it makes no difference here. There was no reason to specify it in the config object.

You can see the default config options like this:

from pdgstaging import ConfigManager

print(ConfigManager.defaults)

julietcohen commented 2 years ago

Awesome, thanks for clarifying!