ungarj / mapchete

Tile-based geodata processing using rasterio & Fiona
MIT License

Processing openstreetmap tiles #642

Open OlgaGKononova opened 1 month ago

OlgaGKononova commented 1 month ago

Hi,

I am trying to use mapchete to process tiles from OpenStreetMap, which are served in a tile-directory-like structure: z/x/y.png. As far as I understand, mapchete can read remote data directly without problems, and I confirmed this by reading remotely hosted TIFF files. However, OpenStreetMap tiles are PNGs, and it seems that mapchete (specifically rasterio) cannot process PNG files without additional georeference arguments. I can pass a PNG with georeference arguments to plain rasterio.open() (and it reads them correctly), but it looks like none of these arguments can be passed to the rasterio reading functions within mapchete. I also tried feeding mapchete a local metadata.json (after manually tweaking the code), without success.

1) I wonder whether it is possible in general to read OpenStreetMap as a tile directory.
2) Would it be possible either to make metadata.json a config parameter (instead of being hardcoded) or to have the rasterio arguments read from the config file?
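For context, the georeference that a z/x/y.png tile lacks is implied by its index. The following is a minimal sketch, assuming the standard XYZ ("slippy map") scheme on a square Web Mercator (EPSG:3857) grid with 256-pixel tiles. The function names are hypothetical and are not part of mapchete or rasterio.

```python
# Half the width of the Web Mercator (EPSG:3857) world extent in metres.
HALF_WORLD = 20037508.342789


def xyz_tile_bounds(zoom, x, y):
    """Return (left, bottom, right, top) in EPSG:3857 for an XYZ tile.

    Assumes x counts columns from the west edge and y counts rows
    from the north edge, as in the z/x/y.png layout.
    """
    tile_size = 2 * HALF_WORLD / 2**zoom
    left = -HALF_WORLD + x * tile_size
    top = HALF_WORLD - y * tile_size
    return (left, top - tile_size, left + tile_size, top)


def xyz_pixel_size(zoom, tile_pixels=256):
    """Return the ground size of one pixel in metres at the given zoom."""
    return 2 * HALF_WORLD / 2**zoom / tile_pixels


if __name__ == "__main__":
    # Bounds and resolution of the tile named in the error message later on.
    print(xyz_tile_bounds(8, 141, 84))
    print(xyz_pixel_size(8))
```

Bounds and pixel size together are enough to build an affine transform for a tile, which is the kind of information a georeferenced reader would need to be given.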

This is the config I am currently using:

process: example_process.py
zoom_levels:
    min: 7
    max: 8
pyramid:
    grid: geodetic
    metatiling: 2
    pixelbuffer: 4
    crs: EPSG:4326
input:
    file1:
        format: TileDirectory
        path: /vsicurl/https://a.tile.openstreetmap.org/  # results are the same with and without /vsicurl/
        type: geodetic
        grid: geodetic  # I have to add this parameter, otherwise the code complains about the pyramid structure
        extension: png
        dtype: uint8
        bounds: [7, 46, 9, 47]
        count: 4
output:
    type: geodetic
    path: ./test_out
    format: PNG
    dtype: uint8
    bands: 4
bounds: [7, 46, 9, 47]

I run this config with the native example_process.py process file. raster_file is not empty upon opening, but after reading, the returned masked array is empty and no output files are written. When I run it with debug options, I get the following:

2024-08-16 15:52:18,663 DEBUG mapchete.io.raster.read skip missing file https://a.tile.openstreetmap.org/7/32/135.png
2024-08-16 15:52:18,664 DEBUG mapchete.path using GDAL options: {'CPL_VSIL_CURL_ALLOWED_EXTENSIONS': '.png', 'CPL_VSIL_CURL_CACHE_SIZE': 200000000, 'GDAL_BAND_BLOCK_CACHE': 'HASHSET', 'GDAL_DISABLE_READDIR_ON_OPEN': 'EMPTY_DIR', 'GDAL_HTTP_TIMEOUT': 30, 'GDAL_HTTP_MAX_RETRY': 3, 'GDAL_HTTP_MERGE_CONSECUTIVE_RANGES': True, 'GDAL_HTTP_MULTIPLEX': True, 'GDAL_HTTP_RETRY_DELAY': 5, 'GDAL_HTTP_VERSION': 2, 'VSI_CACHE': True, 'VSI_CACHE_SIZE': 5000000}
2024-08-16 15:52:18,665 DEBUG mapchete.io.raster.read reading https://a.tile.openstreetmap.org/7/32/136.png with GDAL options {'CPL_VSIL_CURL_ALLOWED_EXTENSIONS': '.png', 'CPL_VSIL_CURL_CACHE_SIZE': 200000000, 'GDAL_BAND_BLOCK_CACHE': 'HASHSET', 'GDAL_DISABLE_READDIR_ON_OPEN': 'EMPTY_DIR', 'GDAL_HTTP_TIMEOUT': 30, 'GDAL_HTTP_MAX_RETRY': 3, 'GDAL_HTTP_MERGE_CONSECUTIVE_RANGES': True, 'GDAL_HTTP_MULTIPLEX': True, 'GDAL_HTTP_RETRY_DELAY': 5, 'GDAL_HTTP_VERSION': 2, 'VSI_CACHE': True, 'VSI_CACHE_SIZE': 5000000}
2024-08-16 15:52:22,404 DEBUG mapchete.io.raster.read skip missing file https://a.tile.openstreetmap.org/8/64/266.png
[]
2024-08-16 15:52:22,406 DEBUG mapchete.processing.execute ('tile_task_z8-(8-31-132)', 'processed successfully')
2024-08-16 15:52:22,415 DEBUG mapchete.formats.default.png data empty, nothing to write
2024-08-16 15:52:22,415 DEBUG mapchete.processing.execute (TileIndex(zoom=8, row=31, col=132), 'output written in 0.007s')
2024-08-16 15:52:22,513 DEBUG mapchete.io.raster.read skip missing file https://a.tile.openstreetmap.org/8/64/268.png

This is a snippet. I can send the full log if needed.

Thank you for your help.

Scartography commented 3 weeks ago

Hi @OlgaGKononova,

I tried to write a custom process that would first convert the OSM tiles at http://tile.openstreetmap.org/ to a mapchete mosaic, but I failed to fetch the data via script because of the tile usage policy at https://operations.osmfoundation.org/policies/tiles/, which prohibits such behaviour:

Bulk downloading (“scraping”) is the downloading of tiles in advance instead of downloading when a user views those tiles. Common examples include creating a tile archive or downloading for offline usage. Bulk downloading is prohibited. These tiles are generally not cached on the server in advance and have to be rendered specifically for those requests, putting an unjustified burden on the available resources.

This caused:

mapchete.errors.MapcheteTaskFailed: <MFuture: type: <class 'NoneType'>, exception: <class 'FileNotFoundError'>, profiling: {}) raised a FileNotFoundError('https://tile.openstreetmap.org/8/141/84.png')

and with wget:

$ wget http://tile.openstreetmap.org/8/141/84.png

Connecting to tile.openstreetmap.org (tile.openstreetmap.org)|2a04:4e42:41::347|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-08-22 17:26:33 ERROR 403: Forbidden.

Having said that, a configuration that might have worked would first copy the tiles from the OSM server into a GeoTIFF mosaic and then convert that to whatever format is needed.

mapchete config:

process: osm_test.py
zoom_levels:
    min: 7
    max: 8

input:
    file1:
        format: TileDirectory
        path: https://tile.openstreetmap.org/
        grid: mercator  # the OSM tiles are in mercator, so let's start with just copying/scraping them
        metatiling: 1
        extension: png
        dtype: uint8
        # bounds: [-20037508.342789, -20037508.342789, 20037508.342789, 20037508.342789]
        count: 4
output:
    path: ./test_out
    format: PNG  # GTiff would be preferred
    dtype: uint8
    bands: 4
pyramid:
    grid: mercator
    metatiling: 1
# need mercator bounds here 
bounds: [2150432.278, 6797973.399, 2154983.440, 6801386.771]
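The mercator bounds above do not cover the same area as the lon/lat bounds [7, 46, 9, 47] in the original post. As a rough sketch of how such bounds could be converted, assuming the spherical Web Mercator (EPSG:3857) formulas and hypothetical helper names:

```python
import math

# Radius of the sphere used by Web Mercator (EPSG:3857), in metres.
EARTH_RADIUS = 6378137.0


def lonlat_to_mercator(lon, lat):
    """Project a lon/lat pair in degrees to EPSG:3857 metres."""
    x = math.radians(lon) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS
    return x, y


def bounds_to_mercator(bounds):
    """Convert [left, bottom, right, top] in degrees to EPSG:3857 metres."""
    left, bottom = lonlat_to_mercator(bounds[0], bounds[1])
    right, top = lonlat_to_mercator(bounds[2], bounds[3])
    return [left, bottom, right, top]


if __name__ == "__main__":
    # Bounds from the original config, given there in EPSG:4326 degrees.
    print(bounds_to_mercator([7, 46, 9, 47]))
```

The formulas are only valid within the latitude range covered by the mercator grid (roughly ±85.05°); a projection library such as pyproj would be the more robust choice in real code.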

mapchete process (osm_test.py):


import numpy as np
from PIL import Image
from tempfile import TemporaryDirectory

from mapchete import MPath
from mapchete.io import copy as mp_copy


def execute(mp):
    """Copy one OSM tile to a temporary file and return it as an array."""
    with mp.open("file1") as raster_file:
        tile = raster_file.tile
        # OSM paths are z/x/y (zoom/col/row) while mapchete uses zoom/row/col,
        # so col and row are swapped when building the tile path.
        tile_path = f"{tile.zoom}/{tile.col}/{tile.row}.{raster_file._ext}"
        in_osm_tile_path = MPath(raster_file._basepath).joinpath(tile_path)
        with TemporaryDirectory() as tmp_dir:
            local_read_file = MPath(f"{tmp_dir}/{tile_path}")
            mp_copy(in_osm_tile_path, local_read_file)
            # Read the pixels before the temporary directory is removed.
            with Image.open(str(local_read_file)) as im_frame:
                # getdata() would flatten the image; keep the 2D layout and
                # move the 4 RGBA bands first: (bands, rows, cols).
                rgba = np.array(im_frame.convert("RGBA"))
                np_frame = np.moveaxis(rgba, -1, 0)
    return np_frame

Also, mapchete's core code does not natively support reading PNGs yet.
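The row/col swap matters because an XYZ server only has tiles whose x and y both fall below 2**zoom. A small sketch, assuming that index range and using a hypothetical helper name, checks the paths from the debug log in the original post:

```python
def xyz_index_in_range(zoom, x, y):
    """Return True if (x, y) exists on the 2**zoom by 2**zoom XYZ grid."""
    n = 2**zoom
    return 0 <= x < n and 0 <= y < n


# Tile paths taken from the debug log in the original post.
logged_paths = ["7/32/135", "7/32/136", "8/64/266", "8/64/268"]

for path in logged_paths:
    zoom, first, second = (int(part) for part in path.split("/"))
    as_logged = xyz_index_in_range(zoom, first, second)
    swapped = xyz_index_in_range(zoom, second, first)
    print(path, "as logged:", as_logged, "swapped:", swapped)
```

Every logged path falls outside the range in both orders, which suggests that the geodetic pyramid in the original config addresses tiles that do not exist on the mercator OSM grid. This is one possible explanation for the "skip missing file" lines, separate from the 403 responses described above.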

Should you find another data source or need help with anything else, feel free to open further issues.

I will leave this open for a bit so you have a chance to read this response; if this is enough, close or reopen this issue at your convenience.

Petr