Generated using the DALL·E 3 model with this prompt: A logo for a python library with White background, high quality, 8k. Cute duck and globe with cartography elements. Library for reading OpenStreetMap data using DuckDB.
An open-source tool for reading OpenStreetMap PBF files using DuckDB.
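- Reads OpenStreetMap Protocolbuffer Binary Format (`pbf`) files.
- Built on `DuckDB`[^1] with its `Spatial`[^2] extension.
- Saves results in the `GeoParquet`[^3] file format for easier integration with modern cloud stacks.
- Filters data by geometry without requiring `ogr2ogr` clipping before operation.
- Offers a CLI based on `Typer`[^4].

[^1]: DuckDB Website
[^2]: DuckDB Spatial extension repository
[^3]: GeoParquet data format
[^4]: Typer docs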
pip install quackosm
pip install quackosm[cli]
QuackOSM supports Python >= 3.9
Required:

- `duckdb (>=1.1.0)`: For all DuckDB operations on PBF files
- `pyarrow (>=16.0.0)`: For wrangling Parquet files
- `geoarrow-pyarrow (>=0.1.2)`: For GeoParquet IO operations and transforming Arrow data to Shapely objects
- `geoarrow-pandas (>=0.1.1)`: For GeoParquet integration with GeoPandas
- `geopandas (>=0.6)`: For returning GeoDataFrames and reading Geo files
- `shapely (>=2.0)`: For parsing WKT and GeoJSON strings and fixing geometries
- `polars (>=0.19.4)`: For a faster OSM ways grouping operation
- `typeguard (>=3.0)`: For internal validation of types
- `psutil (>=5.6.2)`: For automatic scaling of parameters based on available resources
- `pooch (>=1.6.0)`: For downloading `*.osm.pbf` files
- `rich (>=12.0.0)` & `tqdm (>=4.42.0)`: For showing progress bars
- `requests`: For iterating over OSM PBF file services
- `beautifulsoup4`: For parsing HTML files and scraping required information
- `geopy (>=2.0.0)`: For geocoding of strings
Optional:

- `typer[all] (>=0.9.0)` (click, colorama, rich, shellingham): Required in CLI
- `h3 (>=4.0.0b1)`: For reading H3 strings. Required in CLI
- `s2 (>=0.1.9)`: For transforming S2 indexes into geometries. Required in CLI
- `python-geohash (>=0.8)`: For transforming GeoHash indexes into geometries. Required in CLI
>>> import quackosm as qosm
>>> qosm.convert_pbf_to_geodataframe(monaco_pbf_path)
tags geometry
feature_id
node/10005045289 {'shop': 'bakery'} POINT (7.42245 43.73105)
node/10020887517 {'leisure': 'swimming_pool', ... POINT (7.41316 43.73384)
node/10021298117 {'leisure': 'swimming_pool', ... POINT (7.42777 43.74277)
node/10021298717 {'leisure': 'swimming_pool', ... POINT (7.42630 43.74097)
node/10025656383 {'ferry': 'yes', 'name': 'Qua... POINT (7.42550 43.73690)
... ... ...
way/990669427 {'amenity': 'shelter', 'shelt... POLYGON ((7.41461 43.7338...
way/990669428 {'highway': 'secondary', 'jun... LINESTRING (7.41366 43.73...
way/990669429 {'highway': 'secondary', 'jun... LINESTRING (7.41376 43.73...
way/990848785 {'addr:city': 'Monaco', 'addr... POLYGON ((7.41426 43.7339...
way/993121275 {'building': 'yes', 'name': ... POLYGON ((7.43214 43.7481...
[7906 rows x 2 columns]
>>> import quackosm as qosm
>>> gpq_path = qosm.convert_pbf_to_parquet(monaco_pbf_path)
>>> gpq_path.as_posix()
'files/monaco_nofilter_noclip_compact.parquet'
>>> import duckdb
>>> duckdb.load_extension('spatial')
>>> duckdb.read_parquet(str(gpq_path)).order("feature_id")
┌──────────────────┬──────────────────────┬──────────────────────────────────────────────┐
│    feature_id    │         tags         │                   geometry                   │
│     varchar      │ map(varchar, varch…  │                   geometry                   │
├──────────────────┼──────────────────────┼──────────────────────────────────────────────┤
│ node/10005045289 │ {shop=bakery}        │ POINT (7.4224498 43.7310532)                 │
│ node/10020887517 │ {leisure=swimming_…  │ POINT (7.4131561 43.7338391)                 │
│ node/10021298117 │ {leisure=swimming_…  │ POINT (7.4277743 43.7427669)                 │
│ node/10021298717 │ {leisure=swimming_…  │ POINT (7.4263029 43.7409734)                 │
│ node/10025656383 │ {ferry=yes, name=Q…  │ POINT (7.4254971 43.7369002)                 │
│ node/10025656390 │ {amenity=restauran…  │ POINT (7.4269287 43.7368818)                 │
│ node/10025656391 │ {name=Capitainerie…  │ POINT (7.4272127 43.7359593)                 │
│ node/10025656392 │ {name=Direction de…  │ POINT (7.4270392 43.7365262)                 │
│ node/10025656393 │ {name=IQOS, openin…  │ POINT (7.4275175 43.7373195)                 │
│ node/10025656394 │ {artist_name=Anna …  │ POINT (7.4293446 43.737448)                  │
│        ·         │          ·           │                      ·                       │
│        ·         │          ·           │                      ·                       │
│        ·         │          ·           │                      ·                       │
│ way/986864693    │ {natural=bare_rock}  │ POLYGON ((7.4340482 43.745598, 7.4340263 4…  │
│ way/986864694    │ {barrier=wall}       │ LINESTRING (7.4327547 43.7445382, 7.432808…  │
│ way/986864695    │ {natural=bare_rock}  │ POLYGON ((7.4332994 43.7449315, 7.4332912 …  │
│ way/986864696    │ {barrier=wall}       │ LINESTRING (7.4356006 43.7464325, 7.435574…  │
│ way/986864697    │ {natural=bare_rock}  │ POLYGON ((7.4362767 43.74697, 7.4362983 43…  │
│ way/990669427    │ {amenity=shelter, …  │ POLYGON ((7.4146087 43.733883, 7.4146192 4…  │
│ way/990669428    │ {highway=secondary…  │ LINESTRING (7.4136598 43.7334433, 7.413640…  │
│ way/990669429    │ {highway=secondary…  │ LINESTRING (7.4137621 43.7334251, 7.413746…  │
│ way/990848785    │ {addr:city=Monaco,…  │ POLYGON ((7.4142551 43.7339622, 7.4143113 …  │
│ way/993121275    │ {building=yes, nam…  │ POLYGON ((7.4321416 43.7481309, 7.4321638 …  │
├──────────────────┴──────────────────────┴──────────────────────────────────────────────┤
│ 7906 rows (20 shown)                                                         3 columns │
└────────────────────────────────────────────────────────────────────────────────────────┘
$ quackosm monaco.osm.pbf
[   1/32] Reading nodes • 0:00:00
[   2/32] Filtering nodes - intersection • 0:00:00
[   3/32] Filtering nodes - tags • 0:00:00
[   4/32] Calculating distinct filtered nodes ids • 0:00:00
[   5/32] Reading ways • 0:00:00
[   6/32] Unnesting ways • 0:00:00
[   7/32] Filtering ways - valid refs • 0:00:00
[   8/32] Filtering ways - intersection • 0:00:00
[   9/32] Filtering ways - tags • 0:00:00
[  10/32] Calculating distinct filtered ways ids • 0:00:00
[  11/32] Reading relations • 0:00:00
[  12/32] Unnesting relations • 0:00:00
[  13/32] Filtering relations - valid refs • 0:00:00
[  14/32] Filtering relations - intersection • 0:00:00
[  15/32] Filtering relations - tags • 0:00:00
[  16/32] Calculating distinct filtered relations ids • 0:00:00
[  17/32] Loading required ways - by relations • 0:00:00
[  18/32] Calculating distinct required ways ids • 0:00:00
[  19/32] Saving filtered nodes with geometries • 0:00:00
[20.1/32] Grouping filtered ways - assigning groups • 0:00:00
[20.2/32] Grouping filtered ways - joining with nodes • 0:00:00
[20.3/32] Grouping filtered ways - partitioning by group • 0:00:00
[  21/32] Saving filtered ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:00 < 0:00:00 •
[22.1/32] Grouping required ways - assigning groups • 0:00:00
[22.2/32] Grouping required ways - joining with nodes • 0:00:00
[22.3/32] Grouping required ways - partitioning by group • 0:00:00
[  23/32] Saving required ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:00 < 0:00:00 •
[  24/32] Saving filtered ways with geometries • 0:00:00
[  25/32] Saving valid relations parts • 0:00:00
[  26/32] Saving relations inner parts • 0:00:00
[  27/32] Saving relations outer parts • 0:00:00
[  28/32] Saving relations outer parts with holes • 0:00:00
[  29/32] Saving relations outer parts without holes • 0:00:00
[  30/32] Saving filtered relations with geometries • 0:00:00
[  31/32] Saving all features • 0:00:00
[  32/32] Saving final geoparquet file • 0:00:00
Finished operation in 0:00:03
files/monaco_nofilter_noclip_compact.parquet
>>> import quackosm as qosm
>>> geometry = qosm.geocode_to_geometry("Vatican City")
>>> qosm.convert_geometry_to_geodataframe(geometry)
tags geometry
feature_id
node/10253371713 {'crossing': 'uncontrolled',... POINT (12.45603 41.90454)
node/10253371714 {'highway': 'stop'} POINT (12.45705 41.90400)
node/10253371715 {'highway': 'stop'} POINT (12.45242 41.90164)
node/10253371720 {'artwork_type': 'statue',... POINT (12.45147 41.90484)
node/10253371738 {'natural': 'tree'} POINT (12.45595 41.90609)
... ... ...
way/983015528 {'barrier': 'hedge', 'height'... POLYGON ((12.45027 41.901...
way/983015529 {'barrier': 'hedge', 'height'... POLYGON ((12.45028 41.901...
way/983015530 {'barrier': 'hedge', 'height'... POLYGON ((12.45023 41.901...
way/998561138 {'barrier': 'bollard', 'bicyc... LINESTRING (12.45821 41.9...
way/998561139 {'barrier': 'bollard', 'bicyc... LINESTRING (12.45828 41.9...
[3286 rows x 2 columns]
>>> import quackosm as qosm
>>> from shapely import from_wkt
>>> geometry = from_wkt(
... "POLYGON ((14.4861 35.9107, 14.4861 35.8811, 14.5331 35.8811, 14.5331 35.9107, 14.4861 35.9107))"
... )
>>> gpq_path = qosm.convert_geometry_to_parquet(geometry)
>>> gpq_path.as_posix()
'files/4b2967088a8fe31cdc15401e29bff9b7b882565cd8143e90443f39f2dc5fe6de_nofilter_compact.parquet'
>>> import duckdb
>>> duckdb.load_extension('spatial')
>>> duckdb.read_parquet(str(gpq_path)).order("feature_id")
┌──────────────────┬──────────────────────┬──────────────────────────────────────────────┐
│    feature_id    │         tags         │                   geometry                   │
│     varchar      │ map(varchar, varch…  │                   geometry                   │
├──────────────────┼──────────────────────┼──────────────────────────────────────────────┤
│ node/10001388317 │ {amenity=bench, ba…  │ POINT (14.5093988 35.8936881)                │
│ node/10001388417 │ {amenity=bench, ba…  │ POINT (14.5094635 35.8937135)                │
│ node/10001388517 │ {amenity=bench, ba…  │ POINT (14.5095215 35.8937305)                │
│ node/10018287160 │ {opening_hours=Mo-…  │ POINT (14.5184916 35.8915925)                │
│ node/10018287161 │ {defensive_works=b…  │ POINT (14.5190093 35.8909471)                │
│ node/10018287162 │ {defensive_works=h…  │ POINT (14.5250094 35.8883199)                │
│ node/10018742746 │ {defibrillator:loc…  │ POINT (14.5094082 35.8965151)                │
│ node/10018742747 │ {amenity=bank, nam…  │ POINT (14.51329 35.8991614)                  │
│ node/10032244899 │ {amenity=restauran…  │ POINT (14.4946298 35.8986226)                │
│ node/10034853491 │ {amenity=pharmacy}   │ POINT (14.4945884 35.9012758)                │
│        ·         │          ·           │                      ·                       │
│        ·         │          ·           │                      ·                       │
│        ·         │          ·           │                      ·                       │
│ way/884730763    │ {highway=footway, …  │ LINESTRING (14.5218277 35.8896022, 14.5218…  │
│ way/884730764    │ {bridge=yes, highw…  │ LINESTRING (14.5218054 35.8896015, 14.5218…  │
│ way/884730765    │ {highway=footway, …  │ LINESTRING (14.5204069 35.889924, 14.52044…  │
│ way/884730766    │ {handrail=yes, hig…  │ LINESTRING (14.5204375 35.8898663, 14.5204…  │
│ way/884730767    │ {access=yes, handr…  │ LINESTRING (14.5196113 35.8906142, 14.5196…  │
│ way/884730768    │ {highway=steps, la…  │ LINESTRING (14.5197226 35.890676, 14.51972…  │
│ way/884730769    │ {access=yes, handr…  │ LINESTRING (14.5197184 35.8906707, 14.5197…  │
│ way/884738591    │ {highway=pedestria…  │ LINESTRING (14.5204163 35.8897296, 14.5204…  │
│ way/884744870    │ {highway=residenti…  │ LINESTRING (14.5218931 35.8864046, 14.5221…  │
│ way/884744871    │ {access=yes, handr…  │ LINESTRING (14.5221083 35.8864287, 14.5221…  │
├──────────────────┴──────────────────────┴──────────────────────────────────────────────┤
│ ? rows (>9999 rows, 20 shown)                                                3 columns │
└────────────────────────────────────────────────────────────────────────────────────────┘
$ quackosm --geom-filter-geocode "Shibuya, Tokyo"
100%|███████████████████████████████████████| 46.3M/46.3M [00:00<00:00, 327GB/s]
[   1/32] Reading nodes • 0:00:01
[   2/32] Filtering nodes - intersection • 0:00:00
[   3/32] Filtering nodes - tags • 0:00:00
[   4/32] Calculating distinct filtered nodes ids • 0:00:00
[   5/32] Reading ways • 0:00:03
[   6/32] Unnesting ways • 0:00:01
[   7/32] Filtering ways - valid refs • 0:00:00
[   8/32] Filtering ways - intersection • 0:00:00
[   9/32] Filtering ways - tags • 0:00:00
[  10/32] Calculating distinct filtered ways ids • 0:00:00
[  11/32] Reading relations • 0:00:00
[  12/32] Unnesting relations • 0:00:00
[  13/32] Filtering relations - valid refs • 0:00:00
[  14/32] Filtering relations - intersection • 0:00:00
[  15/32] Filtering relations - tags • 0:00:00
[  16/32] Calculating distinct filtered relations ids • 0:00:00
[  17/32] Loading required ways - by relations • 0:00:00
[  18/32] Calculating distinct required ways ids • 0:00:00
[  19/32] Saving filtered nodes with geometries • 0:00:00
[20.1/32] Grouping filtered ways - assigning groups • 0:00:00
[20.2/32] Grouping filtered ways - joining with nodes • 0:00:01
[20.3/32] Grouping filtered ways - partitioning by group • 0:00:00
[  21/32] Saving filtered ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:00 < 0:00:00 •
[22.1/32] Grouping required ways - assigning groups • 0:00:00
[22.2/32] Grouping required ways - joining with nodes • 0:00:01
[22.3/32] Grouping required ways - partitioning by group • 0:00:00
[  23/32] Saving required ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:00 < 0:00:00 •
[  24/32] Saving filtered ways with geometries • 0:00:00
[  25/32] Saving valid relations parts • 0:00:00
[  26/32] Saving relations inner parts • 0:00:00
[  27/32] Saving relations outer parts • 0:00:00
[  28/32] Saving relations outer parts with holes • 0:00:00
[  29/32] Saving relations outer parts without holes • 0:00:00
[  30/32] Saving filtered relations with geometries • 0:00:00
[  31/32] Saving all features • 0:00:00
[  32/32] Saving final geoparquet file • 0:00:00
Finished operation in 0:00:13
files/78580cf29b5ba1073366a257e1909bfeee43c9f5859e48fb3b2d592028bb58aa_nofilter_compact.parquet
>>> import quackosm as qosm
>>> qosm.convert_osm_extract_to_geodataframe("Vatican City")
tags geometry
feature_id
node/4227893563 {'addr:housenumber': '139', ... POINT (12.45966 41.9039)
node/4227893564 {'amenity': 'fast_food', 'na... POINT (12.45952 41.90391)
node/4227893565 {'name': 'Ferramenta Pieroni... POINT (12.46042 41.90385)
node/4227893566 {'amenity': 'ice_cream', 'na... POINT (12.45912 41.90394)
node/4227893568 {'amenity': 'cafe', 'name': ... POINT (12.46112 41.90381)
... ... ...
relation/2939617 {'building': 'yes', 'type': ... POLYGON ((12.45269 41.908...
relation/11839271 {'building': 'yes', 'type': ... POLYGON ((12.44939 41.897...
relation/12988851 {'access': 'private', 'ameni... POLYGON ((12.45434 41.903...
relation/13571840 {'layer': '1', 'man_made': '... POLYGON ((12.45132 41.899...
relation/3256168 {'building': 'yes', 'type': ... POLYGON ((12.46061 41.907...
[8318 rows x 2 columns]
>>> import quackosm as qosm
>>> gpq_path = qosm.convert_osm_extract_to_parquet("Paris", osm_extract_source="OSMfr")
>>> gpq_path.as_posix()
'files/osmfr_europe_france_ile_de_france_paris_nofilter_noclip_compact.parquet'
>>> import duckdb
>>> duckdb.load_extension('spatial')
>>> duckdb.read_parquet(str(gpq_path)).order("feature_id")
┌──────────────────┬────────────────────────────┬──────────────────────────────┐
│    feature_id    │            tags            │           geometry           │
│     varchar      │   map(varchar, varchar)    │           geometry           │
├──────────────────┼────────────────────────────┼──────────────────────────────┤
│ node/10000001235 │ {information=guidepost, …  │ POINT (2.3423756 48.8635788) │
│ node/10000001236 │ {barrier=bollard}          │ POINT (2.3423613 48.8635746) │
│ node/10000001237 │ {barrier=bollard}          │ POINT (2.3423555 48.8635657) │
│ node/10000001238 │ {barrier=bollard}          │ POINT (2.34235 48.8635575)   │
│ node/10000001239 │ {barrier=bollard}          │ POINT (2.3423438 48.8635481) │
│ node/10000005002 │ {amenity=vending_machine…  │ POINT (2.3438906 48.8642058) │
│ node/10000005003 │ {addr:city=Paris, addr:h…  │ POINT (2.3441257 48.8642723) │
│ node/10000005297 │ {emergency=fire_hydrant,…  │ POINT (2.2943897 48.8356289) │
│ node/10000034353 │ {name=Elisa&Marie, shop=…  │ POINT (2.3476407 48.8636628) │
│ node/10000079406 │ {emergency=fire_hydrant,…  │ POINT (2.2951077 48.8349097) │
│        ·         │             ·              │              ·               │
│        ·         │             ·              │              ·               │
│        ·         │             ·              │              ·               │
│ node/10180452313 │ {highway=crossing}         │ POINT (2.2668596 48.8351167) │
│ node/10180457217 │ {amenity=charging_statio…  │ POINT (2.2996381 48.8654136) │
│ node/10180457222 │ {advertising=poster_box,…  │ POINT (2.2996126 48.8651971) │
│ node/10180457223 │ {advertising=poster_box,…  │ POINT (2.2990548 48.8651713) │
│ node/10180457224 │ {advertising=poster_box,…  │ POINT (2.3002578 48.8651435) │
│ node/10180457225 │ {advertising=poster_box,…  │ POINT (2.3001396 48.8649086) │
│ node/10180457226 │ {advertising=column, col…  │ POINT (2.3002337 48.8648869) │
│ node/10180457227 │ {advertising=poster_box,…  │ POINT (2.3004355 48.8648103) │
│ node/10180457247 │ {advertising=poster_box,…  │ POINT (2.3006468 48.8647237) │
│ node/10180457248 │ {advertising=poster_box,…  │ POINT (2.3008908 48.8643751) │
├──────────────────┴────────────────────────────┴──────────────────────────────┤
│ ? rows (>9999 rows, 20 shown)                                      3 columns │
└──────────────────────────────────────────────────────────────────────────────┘
$ quackosm --osm-extract-query "Gibraltar"
100%|██████████████████████████████████████| 1.57M/1.57M [00:00<00:00, 8.66GB/s]
[   1/32] Reading nodes • 0:00:00
[   2/32] Filtering nodes - intersection • 0:00:00
[   3/32] Filtering nodes - tags • 0:00:00
[   4/32] Calculating distinct filtered nodes ids • 0:00:00
[   5/32] Reading ways • 0:00:00
[   6/32] Unnesting ways • 0:00:00
[   7/32] Filtering ways - valid refs • 0:00:00
[   8/32] Filtering ways - intersection • 0:00:00
[   9/32] Filtering ways - tags • 0:00:00
[  10/32] Calculating distinct filtered ways ids • 0:00:00
[  11/32] Reading relations • 0:00:00
[  12/32] Unnesting relations • 0:00:00
[  13/32] Filtering relations - valid refs • 0:00:00
[  14/32] Filtering relations - intersection • 0:00:00
[  15/32] Filtering relations - tags • 0:00:00
[  16/32] Calculating distinct filtered relations ids • 0:00:00
[  17/32] Loading required ways - by relations • 0:00:00
[  18/32] Calculating distinct required ways ids • 0:00:00
[  19/32] Saving filtered nodes with geometries • 0:00:00
[20.1/32] Grouping filtered ways - assigning groups • 0:00:00
[20.2/32] Grouping filtered ways - joining with nodes • 0:00:10
[20.3/32] Grouping filtered ways - partitioning by group • 0:00:00
[  21/32] Saving filtered ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:11 < 0:00:00 •
[22.1/32] Grouping required ways - assigning groups • 0:00:00
[22.2/32] Grouping required ways - joining with nodes • 0:00:12
[22.3/32] Grouping required ways - partitioning by group • 0:00:00
[  23/32] Saving required ways with linestrings 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 • 0:00:11 < 0:00:00 •
[  24/32] Saving filtered ways with geometries • 0:00:00
[  25/32] Saving valid relations parts • 0:00:00
[  26/32] Saving relations inner parts • 0:00:00
[  27/32] Saving relations outer parts • 0:00:00
[  28/32] Saving relations outer parts with holes • 0:00:00
[  29/32] Saving relations outer parts without holes • 0:00:00
[  30/32] Saving filtered relations with geometries • 0:00:00
[  31/32] Saving all features • 0:00:00
[  32/32] Saving final geoparquet file 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 • 0:00:00 < 0:00:00 • 163.96 it/s
Finished operation in 0:00:50
files/osmfr_europe_gibraltar_nofilter_noclip_compact.parquet
Run `QuackOSM -h` to see the CLI help page. You can find the full API and more examples in the docs.
QuackOSM uses the `ST_ReadOSM` function from `DuckDB`'s `Spatial` extension to read raw data from the PBF file.
The library contains logic to construct geometries (points, linestrings, polygons) from those raw features.
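To make the idea of constructing geometries from raw features more concrete, here is a minimal pure-Python sketch. It is not QuackOSM's implementation: the data layout, the `build_way_wkt` helper, and the rule that every closed way with at least four refs becomes a polygon are simplifying assumptions made for illustration (in real OSM data, whether a closed way is an area also depends on its tags).

```python
from typing import Optional

# Hypothetical raw features: node id -> (lon, lat), way id -> ordered node refs.
nodes = {
    1: (7.4146, 43.7338),
    2: (7.4147, 43.7339),
    3: (7.4148, 43.7338),
}
ways = {
    100: [1, 2, 3],     # open way
    101: [1, 2, 3, 1],  # closed way (first ref equals last ref)
    102: [1, 99],       # references a node that is missing from the extract
}


def build_way_wkt(
    refs: list[int], nodes: dict[int, tuple[float, float]]
) -> Optional[str]:
    """Return a WKT string for a way, or None if any referenced node is missing."""
    if len(refs) < 2 or any(ref not in nodes for ref in refs):
        return None
    coords = ", ".join(f"{nodes[ref][0]} {nodes[ref][1]}" for ref in refs)
    # Simplifying assumption: every closed ring with >= 4 refs is a polygon.
    if len(refs) >= 4 and refs[0] == refs[-1]:
        return f"POLYGON (({coords}))"
    return f"LINESTRING ({coords})"


geometries = {way_id: build_way_wkt(refs, nodes) for way_id, refs in ways.items()}
```

Way `102` yields `None` here, which loosely mirrors the "Filtering ways - valid refs" step visible in the CLI progress output.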
You might ask: how do I know that these geometries are reconstructed correctly?
To answer this question, QuackOSM has dedicated tests that validate its geometries against those produced by `GDAL`.
This might come as a surprise, but since OSM geometries aren't always perfectly defined (especially relations), QuackOSM can even fix geometries that `GDAL` loads with weird artifacts.
You can inspect the comparison algorithm in the `test_gdal_parity` function in the `tests/base/test_pbf_file_reader.py` file.
The library uses a caching system to avoid repeating computations.
By default, the library saves results in the `files` directory created in the working directory. The result file name is generated based on the original `*.osm.pbf` file name.
- Original file name to be converted: `example.osm.pbf`.
- Default output without any filtering: `example_nofilter_noclip_compact.parquet`.
- The `nofilter` part can be replaced by the hash of the OSM tags provided for filtering: `example_a9dd1c3c2e3d6a94354464e9a1a536ef44cca77eebbd882f48ca52799eb4ca91_noclip_exploded.parquet`
- The `noclip` part can be replaced by the hash of the geometry used for filtering: `example_nofilter_430020b6b1ba7bef8ea919b2fb4472dab2972c70a2abae253760a56c29f449c4_compact.parquet`
- The `compact` part can also take the form of `exploded`. It represents the form of the OSM tags: either kept together in a single dictionary or split into columns.
- When filtering by selecting individual feature IDs, an additional hash based on those IDs is appended to the file name: `example_nofilter_noclip_compact_c740a1597e53ae8c5e98c5119eaa1893ddc177161afe8642addcbe54a6dc089d.parquet`
- When the `keep_all_tags` parameter is passed while filtering by OSM tags, an additional `alltags` component is added after the OSM filter hash part: `example_a9dd1c3c2e3d6a94354464e9a1a536ef44cca77eebbd882f48ca52799eb4ca91_alltags_noclip_compact.parquet`
General schema of the segments that are concatenated together: `pbf_file_name_(osm_filter_tags_hash_part/nofilter)(_alltags)_(clipping_geometry_hash_part/noclip)_(compact/exploded)(_filter_osm_ids_hash_part).parquet`
If the WKT mode is turned on, the result file will be saved with a `_wkt` suffix.
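The naming schema described above can be sketched as a small function. Everything here is illustrative rather than QuackOSM's actual code: `build_result_file_name` and its parameters are hypothetical names, SHA-256 over a JSON dump is assumed only because the example hashes are 64 hexadecimal characters long, and the placement of `_wkt` directly before the extension is also an assumption.

```python
import hashlib
import json
from typing import Optional


def _hash(value: object) -> str:
    # Assumption: SHA-256 over a JSON dump; the real serialization may differ.
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


def build_result_file_name(
    pbf_file_name: str,
    tags_filter: Optional[dict] = None,
    geometry_wkt: Optional[str] = None,
    explode_tags: bool = False,
    keep_all_tags: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
    save_as_wkt: bool = False,
) -> str:
    """Concatenate the file name segments in the order given by the schema."""
    stem = pbf_file_name.removesuffix(".osm.pbf")
    parts = [stem, _hash(tags_filter) if tags_filter else "nofilter"]
    if tags_filter and keep_all_tags:
        parts.append("alltags")
    parts.append(_hash(geometry_wkt) if geometry_wkt else "noclip")
    parts.append("exploded" if explode_tags else "compact")
    if filter_osm_ids:
        parts.append(_hash(sorted(filter_osm_ids)))
    name = "_".join(parts)
    if save_as_wkt:
        name += "_wkt"  # Assumption: the suffix goes right before the extension.
    return f"{name}.parquet"


default_name = build_result_file_name("example.osm.pbf")
```

Because every parameter that changes the result also changes the file name, an existing file with the same name can be treated as a cache hit for the same request.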
DuckDB queries requiring `JOIN`, `GROUP BY` and `ORDER BY` operations are very memory intensive. Because of that, some steps are divided into chunks (groups) with a set number of rows per chunk.
QuackOSM has been roughly tuned to different workloads. The `rows_per_group` variable is set based on the available memory in the system:
Memory | Rows per group |
---|---|
< 8 GB | 100 000 |
8 - 16 GB | 500 000 |
16 - 24 GB | 1 000 000 |
> 24 GB | 5 000 000 |
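The table above can be read as a simple lookup, sketched below together with the chunking it controls. The function names are hypothetical and the handling of the exact boundaries (8, 16 and 24 GB) is an assumption, since the table does not say which side they fall on. In practice the memory figure would come from a system query (the dependency list mentions `psutil` for this purpose), but here it is passed in as a plain number.

```python
def rows_per_group_for_memory(memory_gb: float) -> int:
    """Map a memory size in GB to a rows-per-group value, following the table."""
    # Assumption: each lower bound is inclusive (exactly 8 GB -> 500 000 rows).
    if memory_gb < 8:
        return 100_000
    if memory_gb < 16:
        return 500_000
    if memory_gb < 24:
        return 1_000_000
    return 5_000_000


def split_into_groups(total_rows: int, rows_per_group: int) -> list[range]:
    """Split row indices into consecutive chunks of at most rows_per_group rows."""
    return [
        range(start, min(start + rows_per_group, total_rows))
        for start in range(0, total_rows, rows_per_group)
    ]


# Example: 1.2 million rows on a machine with 12 GB of memory.
groups = split_into_groups(1_200_000, rows_per_group_for_memory(12))
```

Each group can then be processed by a separate, smaller query, which keeps the peak memory use of the `JOIN` and `ORDER BY` steps bounded.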
WSL usage: the code can sometimes break because DuckDB tries to use all available memory, part of which may already be occupied by Windows.
The algorithm depends on saving intermediate `.parquet` files between queries.
As a rule of thumb, when parsing a full file without filtering, you should have at least 10x the base file size available as free disk space (for example, a 100 MB PBF file needs about 1 GB of free space).
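The rule of thumb is easy to check before starting a conversion. The sketch below uses only the Python standard library; the helper names and the `SPACE_MULTIPLIER` constant are hypothetical and are not part of QuackOSM.

```python
import shutil
from pathlib import Path

SPACE_MULTIPLIER = 10  # Rule of thumb from the text: 10x the PBF file size.


def required_space_bytes(pbf_size_bytes: int) -> int:
    """Return the free disk space suggested for parsing a PBF file of this size."""
    return SPACE_MULTIPLIER * pbf_size_bytes


def has_enough_disk_space(
    pbf_size_bytes: int, working_directory: Path = Path(".")
) -> bool:
    """Check whether the working directory's disk has the suggested free space."""
    free_bytes = shutil.disk_usage(working_directory).free
    return free_bytes >= required_space_bytes(pbf_size_bytes)
```

For a real file, the size could be taken from `Path("example.osm.pbf").stat().st_size` and passed to `has_enough_disk_space`.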
The charts below show resource usage during operation. They were generated on a GitHub Actions Ubuntu virtual machine with 4 threads and 16 GB of memory.
PBF file size: 525 KB
PBF file size: 100 MB
PBF file size: 1.7 GB
The library is distributed under the Apache-2.0 License.
The free OpenStreetMap data, which is used for the development of QuackOSM, is licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).