nlextract / NLExtract

Convert (ETL) and visualize free Dutch geo-datasets.
https://nlextract.nl
GNU General Public License v3.0

BGT import crashes on panden (buildings) #324

Closed mwjhartogs closed 2 years ago

mwjhartogs commented 3 years ago

Hello,

I am importing the whole of the Netherlands ('/bgt-citygml-nl-nopbp.zip', 19 GB).

After roughly 8 hours of processing I get the following error on panden (buildings). Is this related to a timeout? I ask because panden takes much longer to process than the other features.

2021-07-26 14:03:25,807 execoutput INFO execute done
2021-07-26 14:03:26,437 fileinput INFO Pop file record: {'file_path': '/input/bgt/bgt-citygml-nl-nopbp.zip', 'name': 'bgt_pand.gml'}
2021-07-26 14:06:36,459 subfeaturehandler INFO invoke() for imgeo:positie
2021-07-26 14:06:36,826 subfeaturehandler INFO skipping GML file for subfeat=imgeo:positie
2021-07-26 14:06:36,826 subfeaturehandler INFO invoke() for imgeo:nummeraanduidingreeks[imgeo:Nummeraanduidingreeks]
2021-07-26 14:06:36,837 subfeaturehandler INFO checkGmlFile: found parentTag {http://www.opengis.net/citygml/building/2.0}BuildingPart
./etl.sh: line 42: 515 Killed python $STETL_HOME/stetl/main.py -c conf/etl-imgeo-v2.1.1.cfg -a $options_file

Hopefully there is a solution, because the error message is not very clear.

M

justb4 commented 3 years ago

This is usually a memory problem. subfeaturehandler is a heavy process because, unfortunately, BGT places a Nummeraanduiding element as a 'subfeature' inside every GML element, for Pand among others. This has been reported before: https://gitter.im/nlextract/NLExtract?at=5f563dd99bad075eac03d85b

It helps to give more context: OS, with or without Docker, and above all the amount of RAM.
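
The context asked for above can be collected with a few lines of Python. This is a minimal sketch, not part of NLExtract: the function name `collect_context` is hypothetical, the RAM lookup relies on POSIX `sysconf` (so it is expected to work on Linux), and the `/.dockerenv` check is only a common heuristic for detecting a Docker container.

```python
import os
import platform


def collect_context():
    """Gather basic facts worth including in a bug report (hypothetical helper)."""
    try:
        # Total physical RAM via POSIX sysconf; not available on every platform.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        ram_gib = round(ram_bytes / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        ram_gib = None  # could not be determined on this platform
    return {
        "os": platform.platform(),
        # Heuristic: Docker normally creates /.dockerenv inside containers.
        "in_docker": os.path.exists("/.dockerenv"),
        "ram_gib": ram_gib,
    }


if __name__ == "__main__":
    for key, value in collect_context().items():
        print(f"{key}: {value}")
```

Run it in the same environment as the ETL (inside the container, if Docker is used), since the RAM visible there is what the import can actually use.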

mwjhartogs commented 3 years ago

Hello Just,

It is indeed a memory problem. I ran the process in a Docker container. The server had 16 GB of memory. I first tried a quick fix by raising the memory on the VPS to 24 GB, but that was not enough either. The current BGT importer now tries to download the map sheets (kaartbladen), but unfortunately those no longer exist in the new download service.

If you are interested, I quickly wrote a new downloader in Python; possibly a pull request later:

import requests
import json
import time
import wget
import os

pdokapiurl = 'https://api.pdok.nl'
pdokbgturl = pdokapiurl + '/lv/bgt/download/v1_0/full/custom'
downloadlocation = '/home/NLExtract/bgt/etl/download/'

grids = [
"POLYGON ((185425.156 621876.3,210425.156 621876.3,210425.156 596876.3,185425.156 596876.3,185425.156 621876.3))",
"POLYGON ((260425.156 596876.3,285425.156 596876.3,285425.156 571876.3,260425.156 571876.3,260425.156 596876.3))",
"POLYGON ((260425.156 571876.3,285425.156 571876.3,285425.156 546876.3,260425.156 546876.3,260425.156 571876.3))",
"POLYGON ((260425.156 546876.3,285425.156 546876.3,285425.156 521876.3,260425.156 521876.3,260425.156 546876.3))",
"POLYGON ((260425.156 521876.3,285425.156 521876.3,285425.156 496876.3,260425.156 496876.3,260425.156 521876.3))",
"POLYGON ((160425.156 371876.3,185425.156 371876.3,185425.156 346876.3,160425.156 346876.3,160425.156 371876.3))",
"POLYGON ((160425.156 346876.3,185425.156 346876.3,185425.156 321876.3,160425.156 321876.3,160425.156 346876.3))",
"POLYGON ((160425.156 321876.3,185425.156 321876.3,185425.156 296876.3,160425.156 296876.3,160425.156 321876.3))",
"POLYGON ((35425.156 371876.3,60425.156 371876.3,60425.156 346876.3,35425.156 346876.3,35425.156 371876.3))",
"POLYGON ((260425.156 621876.3,285425.156 621876.3,285425.156 596876.3,260425.156 596876.3,260425.156 621876.3))",
"POLYGON ((35425.156 446876.3,60425.156 446876.3,60425.156 421876.3,35425.156 421876.3,35425.156 446876.3))",
"POLYGON ((35425.156 421876.3,60425.156 421876.3,60425.156 396876.3,35425.156 396876.3,35425.156 421876.3))",
"POLYGON ((85425.156 521876.3,110425.156 521876.3,110425.156 496876.3,85425.156 496876.3,85425.156 521876.3))",
"POLYGON ((185425.156 321876.3,210425.156 321876.3,210425.156 296876.3,185425.156 296876.3,185425.156 321876.3))",
"POLYGON ((85425.156 496876.3,110425.156 496876.3,110425.156 471876.3,85425.156 471876.3,85425.156 496876.3))",
"POLYGON ((210425.156 621876.3,235425.156 621876.3,235425.156 596876.3,210425.156 596876.3,210425.156 621876.3))",
"POLYGON ((85425.156 471876.3,110425.156 471876.3,110425.156 446876.3,85425.156 446876.3,85425.156 471876.3))",
"POLYGON ((185425.156 421876.3,210425.156 421876.3,210425.156 396876.3,185425.156 396876.3,185425.156 421876.3))",
"POLYGON ((185425.156 396876.3,210425.156 396876.3,210425.156 371876.3,185425.156 371876.3,185425.156 396876.3))",
"POLYGON ((85425.156 571876.3,110425.156 571876.3,110425.156 546876.3,85425.156 546876.3,85425.156 571876.3))",
"POLYGON ((185425.156 371876.3,210425.156 371876.3,210425.156 346876.3,185425.156 346876.3,185425.156 371876.3))",
"POLYGON ((85425.156 546876.3,110425.156 546876.3,110425.156 521876.3,85425.156 521876.3,85425.156 546876.3))",
"POLYGON ((185425.156 346876.3,210425.156 346876.3,210425.156 321876.3,185425.156 321876.3,185425.156 346876.3))",
"POLYGON ((60425.156 396876.3,85425.156 396876.3,85425.156 371876.3,60425.156 371876.3,60425.156 396876.3))",
"POLYGON ((60425.156 371876.3,85425.156 371876.3,85425.156 346876.3,60425.156 346876.3,60425.156 371876.3))",
"POLYGON ((185425.156 446876.3,210425.156 446876.3,210425.156 421876.3,185425.156 421876.3,185425.156 446876.3))",
"POLYGON ((60425.156 471876.3,85425.156 471876.3,85425.156 446876.3,60425.156 446876.3,60425.156 471876.3))",
"POLYGON ((235425.156 471876.3,260425.156 471876.3,260425.156 446876.3,235425.156 446876.3,235425.156 471876.3))",
"POLYGON ((60425.156 446876.3,85425.156 446876.3,85425.156 421876.3,60425.156 421876.3,60425.156 446876.3))",
"POLYGON ((235425.156 446876.3,260425.156 446876.3,260425.156 421876.3,235425.156 421876.3,235425.156 446876.3))",
"POLYGON ((110425.156 396876.3,135425.156 396876.3,135425.156 371876.3,110425.156 371876.3,110425.156 396876.3))",
"POLYGON ((235425.156 521876.3,260425.156 521876.3,260425.156 496876.3,235425.156 496876.3,235425.156 521876.3))",
"POLYGON ((235425.156 496876.3,260425.156 496876.3,260425.156 471876.3,235425.156 471876.3,235425.156 496876.3))",
"POLYGON ((235425.156 621876.3,260425.156 621876.3,260425.156 596876.3,235425.156 596876.3,235425.156 621876.3))",
"POLYGON ((210425.156 446876.3,235425.156 446876.3,235425.156 421876.3,210425.156 421876.3,210425.156 446876.3))",
"POLYGON ((110425.156 596876.3,135425.156 596876.3,135425.156 571876.3,110425.156 571876.3,110425.156 596876.3))",
"POLYGON ((210425.156 396876.3,235425.156 396876.3,235425.156 371876.3,210425.156 371876.3,210425.156 396876.3))",
"POLYGON ((110425.156 571876.3,135425.156 571876.3,135425.156 546876.3,110425.156 546876.3,110425.156 571876.3))",
"POLYGON ((210425.156 371876.3,235425.156 371876.3,235425.156 346876.3,210425.156 346876.3,210425.156 371876.3))",
"POLYGON ((135425.156 371876.3,160425.156 371876.3,160425.156 346876.3,135425.156 346876.3,135425.156 371876.3))",
"POLYGON ((85425.156 396876.3,110425.156 396876.3,110425.156 371876.3,85425.156 371876.3,85425.156 396876.3))",
"POLYGON ((160425.156 621876.3,185425.156 621876.3,185425.156 596876.3,160425.156 596876.3,160425.156 621876.3))",
"POLYGON ((135425.156 396876.3,160425.156 396876.3,160425.156 371876.3,135425.156 371876.3,135425.156 396876.3))",
"POLYGON ((10425.156 421876.3,35425.156 421876.3,35425.156 396876.3,10425.156 396876.3,10425.156 421876.3))",
"POLYGON ((10425.156 396876.3,35425.156 396876.3,35425.156 371876.3,10425.156 371876.3,10425.156 396876.3))",
"POLYGON ((10425.156 371876.3,35425.156 371876.3,35425.156 346876.3,10425.156 346876.3,10425.156 371876.3))",
"POLYGON ((260425.156 496876.3,285425.156 496876.3,285425.156 471876.3,260425.156 471876.3,260425.156 496876.3))",
"POLYGON ((135425.156 621876.3,160425.156 621876.3,160425.156 596876.3,135425.156 596876.3,135425.156 621876.3))",
"POLYGON ((260425.156 471876.3,285425.156 471876.3,285425.156 446876.3,260425.156 446876.3,260425.156 471876.3))",
"POLYGON ((135425.156 596876.3,160425.156 596876.3,160425.156 571876.3,135425.156 571876.3,135425.156 596876.3))"
]

# Feature types to request; the same list is used for every grid tile.
featuretypes = ["bak","begroeidterreindeel","bord","buurt","functioneelgebied","gebouwinstallatie","installatie","kast","kunstwerkdeel","mast","onbegroeidterreindeel","ondersteunendwaterdeel","ondersteunendwegdeel","ongeclassificeerdobject","openbareruimte","openbareruimtelabel","overbruggingsdeel","overigbouwwerk","overigescheiding","paal","pand","put","scheiding","sensor","spoor","stadsdeel","straatmeubilair","tunneldeel","vegetatieobject","waterdeel","waterinrichtingselement","waterschap","wegdeel","weginrichtingselement","wijk"]

for grid in grids:
    # Serialize the request with json.dumps instead of concatenating strings.
    body = json.dumps({"format": "citygml", "featuretypes": featuretypes, "geofilter": grid})
    response = requests.post(pdokbgturl, data=body, headers={"Content-Type": "application/json"})
    print("Status code: ", response.status_code)

    guid = response.json()["downloadRequestId"]
    print(guid)
    statusurl = pdokbgturl + '/' + guid + '/status'

    # Poll the status endpoint until the request is no longer pending or running.
    while True:
        statusdata = requests.get(url=statusurl).json()
        status = statusdata["status"]
        print(status)
        if status not in ('RUNNING', 'PENDING'):
            if status == 'COMPLETED':
                downloadurl = pdokapiurl + statusdata["_links"]["download"]["href"]
                print(downloadurl)
                wget.download(downloadurl, downloadlocation)
                # The download is saved as 'extract.zip'; rename it after the
                # request id so that each tile keeps its own file.
                old_file_name = downloadlocation + 'extract.zip'
                new_file_name = downloadlocation + guid + '.zip'
                os.rename(old_file_name, new_file_name)
            break
        # Wait before polling again; skipped once the request has finished.
        time.sleep(20)
print('Download bgt complete')

Thanks for your help.

Kind regards,

Miguel Hartogs Merkator & GeoAI
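
The polygons in the `grids` list above all appear to be 25 000 x 25 000 unit squares, written clockwise from the top-left corner. Assuming that pattern holds, a tile's WKT string can be generated from its top-left corner instead of being typed out by hand. The helper name `tile_wkt` and the constant `TILE_SIZE` are hypothetical, and `Decimal` is used only so that the coordinates print exactly as in the list.

```python
from decimal import Decimal

# Tile edge length inferred from the polygons in the downloader's grid list.
TILE_SIZE = Decimal("25000")


def tile_wkt(xmin, ymax, size=TILE_SIZE):
    """Return the WKT polygon of a square tile given its top-left corner.

    Coordinates are passed as strings so Decimal keeps them exact.
    """
    xmin, ymax = Decimal(xmin), Decimal(ymax)
    xmax, ymin = xmin + size, ymax - size
    # Same vertex order as the hand-written list: clockwise from top-left,
    # closing the ring on the starting point.
    ring = [(xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin), (xmin, ymax)]
    return "POLYGON ((" + ",".join(f"{x} {y}" for x, y in ring) + "))"


if __name__ == "__main__":
    print(tile_wkt("185425.156", "621876.3"))
```

Called with the top-left corner of the first entry, this reproduces that entry's string. Which tiles actually cover the Netherlands still has to come from the list itself; the sketch only removes the repetition.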

justb4 commented 3 years ago

@mwjhartogs thanks for the insights and the code. I run the nationwide "BGT" download myself with 32 GB RAM on a VPS, but first check how much memory Docker gets. See for example this issue. It concerns BAG, but I have seen cases where the OS had 32 GB RAM while the Docker Engine/Desktop was given only 4 GB (for example with Docker Desktop on Mac OSX). In that case you have to raise that limit. Then again, you are on a VPS. The BGT is growing fast, although for Panden the growth will probably be modest. 32 GB should be enough in any case. For "ordinary" GML files the size does not matter: the underlying ETL engine, Stetl, works in a "streaming" fashion, so the GML is not first read and parsed into memory. As far as I know this also applies to subfeatures in BGT Pand. There are also settings in "options" for "max_features".
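
One way to check whether a container has an explicit memory limit is to read the cgroup files from inside it. This is a rough sketch under assumptions: the two paths are the usual cgroup v2 and v1 locations on Linux, and the function names are hypothetical. As far as I know, the memory allotted to the Docker Desktop VM is not a cgroup limit, so in that setup the total RAM visible inside the container is the number to look at instead.

```python
from pathlib import Path

# Usual locations of the memory limit inside a Linux container (assumed).
CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(text):
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (no limit)."""
    text = text.strip()
    return None if text == "max" else int(text)


def container_memory_limit():
    """Return the first cgroup memory limit found, or None if there is none."""
    for path in CGROUP_FILES:
        try:
            if path.is_file():
                return parse_limit(path.read_text())
        except OSError:
            continue  # unreadable file: try the next location
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        # cgroup v1 reports a very large number when no limit is set.
        print(f"cgroup memory limit: {limit / 2**30:.1f} GiB")
```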

justb4 commented 2 years ago

Resolved in my opinion; closing. Reopen if the problem persists.