Support arbitrary scales/grids, not just quads

kylebarron commented 4 years ago

Crosstab of grid size vs scale:

pd.crosstab(gdf['Scale'], gdf['Grid Size'], margins=True).to_markdown()

Scale	1 X 1 Degree	1 X 2 Degree	1 X 3 Degree	1 X 4 Degree	15 X 15 Minute	2 X 1 Degree	3.75 X 3.75 Minute	30 X 30 Minute	30 X 60 Minute	7.5 X 15 Minute	7.5 X 7.5 Minute	None	All
10000	0	0	0	0	0	0	76	0	0	0	1	0	77
12000	0	0	0	0	0	0	1	0	0	0	0	0	1
20000	0	0	0	0	3	0	0	0	0	0	295	0	298
21120	0	0	0	0	0	0	0	0	0	0	19	0	19
24000	0	0	0	0	4	0	0	0	0	0	131183	0	131187
25000	0	0	0	0	0	0	0	0	0	389	1157	2	1548
30000	0	0	0	0	0	0	0	0	0	0	276	0	276
31680	0	0	0	0	5	0	0	0	0	0	4273	0	4278
48000	0	0	0	0	1034	0	0	0	0	7	90	0	1131
50000	0	0	0	0	94	0	0	0	0	0	0	0	94
62500	0	0	0	0	27997	0	0	0	0	0	2	0	27999
63360	0	0	0	0	7082	0	0	0	0	0	0	0	7082
96000	0	0	0	0	1	0	0	42	0	0	0	0	43
100000	0	0	0	0	0	0	0	8	2963	0	0	0	2971
125000	0	0	0	0	0	0	0	4657	1	0	0	0	4658
192000	0	0	0	0	0	0	0	2	0	0	0	0	2
250000	495	3021	945	20	0	34	0	10	0	0	0	0	4525
All	495	3021	945	20	36220	34	77	4719	2964	396	137296	2	186189

kylebarron commented 4 years ago

Grid size - Scale combinations:

[['7.5 X 7.5 Minute', 10000],
 ['7.5 X 7.5 Minute', 20000],
 ['7.5 X 7.5 Minute', 21120],
 ['7.5 X 7.5 Minute', 24000],
 ['7.5 X 7.5 Minute', 25000],
 ['7.5 X 7.5 Minute', 30000],
 ['7.5 X 7.5 Minute', 31680],
 ['7.5 X 7.5 Minute', 48000],
 ['7.5 X 7.5 Minute', 62500],
 ['3.75 X 3.75 Minute', 10000],
 ['3.75 X 3.75 Minute', 12000],
 ['15 X 15 Minute', 20000],
 ['15 X 15 Minute', 24000],
 ['15 X 15 Minute', 31680],
 ['15 X 15 Minute', 48000],
 ['15 X 15 Minute', 50000],
 ['15 X 15 Minute', 62500],
 ['15 X 15 Minute', 63360],
 ['15 X 15 Minute', 96000],
 ['7.5 X 15 Minute', 25000],
 ['7.5 X 15 Minute', 48000],
 ['None', 25000],
 ['30 X 30 Minute', 96000],
 ['30 X 30 Minute', 100000],
 ['30 X 30 Minute', 125000],
 ['30 X 30 Minute', 192000],
 ['30 X 30 Minute', 250000],
 ['30 X 60 Minute', 100000],
 ['30 X 60 Minute', 125000],
 ['1 X 2 Degree', 250000],
 ['1 X 3 Degree', 250000],
 ['1 X 1 Degree', 250000],
 ['1 X 4 Degree', 250000],
 ['2 X 1 Degree', 250000]]

Reproduction code:

    df = pd.read_csv(path)

    # Keep only historical maps
    # Newer maps are only in GeoPDF, and not in GeoTIFF, let alone COG
    df = df[df['Series'] == 'HTMC']

    df['geometry'] = df.apply(construct_geometry, axis=1)
    gdf = gpd.GeoDataFrame(df)

# or just use df here
grid_sizes = gdf['Grid Size'].unique()
grid_size_scales_pairs = []
for grid_size in grid_sizes:
    grid_size_df = gdf[gdf['Grid Size'] == grid_size]
    scales = grid_size_df['Scale'].unique()
    for scale in scales:
        grid_size_scales_pairs.append([grid_size, scale])

kylebarron commented 4 years ago

Code to visualize each combination:

cols = ['Map Name',
 'Scale',
 'geometry',
 'W Long',
 'S Lat',
 'E Long',
 'N Lat',
 'Download Product S3']
for grid_size, scale in grid_size_scales_pairs:
    if grid_size == 'None':
        continue

    gdf_part = gdf.loc[(gdf['Grid Size'] == grid_size) & (gdf['Scale'] == scale)]
    Visualize(gdf_part[cols])

kylebarron commented 4 years ago

To generate automated test cases:

test_cases = []
n = 3
for grid_size, scale in grid_size_scales_pairs:
    if grid_size == 'None':
        continue

    gdf_part = gdf.loc[(gdf['Grid Size'] == grid_size) & (gdf['Scale'] == scale)]

    # Choose random 3
    choose_n = min(len(gdf_part), n)

    sample = gdf_part.sample(choose_n)

    urls = sample['Download Product S3']
    urls = urls.str.replace('PDF', 'GeoTIFF')
    urls = urls.str.replace('.pdf', '.tif')

    # Remove scale folder; not used with geotiff
    urls = urls.str.replace(f'{scale}/', '', n=1)

    # Df of bounds
    bounds_df = sample.geometry.bounds.values

    for url, bounds in zip(urls, bounds_df):
        test_cases.append([url, list(bounds), scale, grid_size])

kylebarron commented 4 years ago

I don't think there will ever be a non-hacky way to discover the image's true map bounds on the fly. For example here in northern Alaska, the right border is so wide that it takes up the space of 1 degree of longitude (which in northern Alaska is relatively small).

So I think the best way forward is to "hack" the asset string to be encoded JSON, which then has "url" and "mapBounds" keys internally, so no parsing will be needed.

kylebarron / usgs-topo-tiler

Support arbitrary scales/grids, not just quads #8