nlextract / NLExtract

Convert (ETL) and visualize free Dutch geo-datasets.
https://nlextract.nl
GNU General Public License v3.0
149 stars 84 forks source link

BAGv2 geen `levering_xml` record in `nlx_bag_info` tabel #332

Closed sebastic closed 2 years ago

sebastic commented 2 years ago

Het Leveringsdocument-BAG-Extract.xml bestand wordt verwerkt, maar er is geen levering_xml record voor in de nlx_bag_info tabel zoals voor BAG v1:

bag=# SELECT gid, tijdstempel, sleutel, SUBSTRING(waarde FROM 0 FOR 80) AS waarde FROM nlx_bag_info;
 gid |        tijdstempel         |         sleutel          |                                     waarde                                      
-----+----------------------------+--------------------------+---------------------------------------------------------------------------------
   1 | 2021-10-15 04:34:49.369724 | schema_versie            | 1.4.0
   2 | 2021-10-15 04:34:49.369724 | software_versie          | 1.5.3
   3 | 2021-10-15 04:34:49.369724 | schema_creatie           | 15-Oct-2021 04:34:49
   4 | 2021-10-15 14:09:53.518844 | gem_woonplaats_rel_datum | 20211008
   5 | 2021-10-15 14:09:56.937841 | levering_xml             | <v20:BAG-Extract-Levering xmlns:v20="http://www.kadaster.nl/schemas/bag-verstre
   6 | 2021-10-15 14:09:56.993255 | extract_datum            | 20211008
(6 rows)
bagv2=# SELECT * FROM nlx_bag_info;
 gid |        tijdstempel         |     sleutel     |        waarde        
-----+----------------------------+-----------------+----------------------
   1 | 2021-11-19 13:52:58.155008 | schema_versie   | 2.1.0
   2 | 2021-11-19 13:52:58.155008 | software_versie | 1.5.3dev
   3 | 2021-11-19 13:52:58.155008 | schema_creatie  | 19-Nov-2021 13:52:58
   4 | 2021-11-19 13:52:58.155008 | bag_xsd_versie  | 20200601
   5 | 2021-11-19 13:52:58.155008 | start_base_etl  | 19-Nov-2021 13:52:58
   6 | 2021-11-19 13:54:00.031052 | extract_datum   | 2021-11-08
(6 rows)

Stetl output van een eerder run:

2021-11-19 11:27:28,458 chain INFO Assembling Chain: input_bag_leveringsdoc_file|convert_vsi_to_etree_doc|transformer_xslt_leveringsdoc|output_ogr_postgis...
2021-11-19 11:27:28,458 input INFO cfg = {'class': 'stetl.inputs.fileinput.VsiZipFileInput', 'file_path': 'test/data/lv/BAGNLDL-15092020-small.zip', 'name_filter': 'Leveringsdocument-BAG-Extract.xml'}
2021-11-19 11:27:28,533 fileinput INFO file_list=['test/data/lv/BAGNLDL-15092020-small.zip']
2021-11-19 11:27:28,751 output INFO cfg = {'class': 'stetl.outputs.ogroutput.Ogr2OgrOutput', 'temp_file': 'temp/bagger.xml', 'ogr2ogr_cmd': 'ogr2ogr\n-append\n-lco LAUNDER=YES -lco PRECISION=NO -lco FID=gid -lco SPATIAL_INDEX=NONE\n-f PostgreSQL\n"PG:dbname=bagv2 host=localhost port=5432 user=<hidden> password=<hidden> active_schema=bag"\n-oo GFS_TEMPLATE=gfs/imbag-2.1.0.gfs\n-oo WRITE_GFS=NO\n-a_srs EPSG:28992\ntemp/bagger.xml'}
2021-11-19 11:27:28,752 chain INFO Running Chain: input_bag_leveringsdoc_file|convert_vsi_to_etree_doc|transformer_xslt_leveringsdoc|output_ogr_postgis
2021-11-19 11:27:28,849 fileinput DEBUG MATCH: ['/vsizip/{test/data/lv/BAGNLDL-15092020-small.zip}', 'Leveringsdocument-BAG-Extract.xml']
2021-11-19 11:27:28,849 fileinput DEBUG ADD ENTRY: /vsizip/{test/data/lv/BAGNLDL-15092020-small.zip}/Leveringsdocument-BAG-Extract.xml
2021-11-19 11:27:28,858 fileinput INFO zip file read : test/data/lv/BAGNLDL-15092020-small.zip filecount=1
2021-11-19 11:27:28,858 fileinput INFO Pop file record: /vsizip/{test/data/lv/BAGNLDL-15092020-small.zip}/Leveringsdocument-BAG-Extract.xml
2021-11-19 11:27:29,079 xsltfilter INFO XSLT Transform OK
2021-11-19 11:27:29,079 ogroutput INFO writing to file temp/bagger.xml
2021-11-19 11:27:29,087 ogroutput INFO written to temp/bagger.xml
2021-11-19 11:27:29,087 ogroutput INFO executing cmd=ogr2ogr -append -lco LAUNDER=YES -lco PRECISION=NO -lco FID=gid -lco SPATIAL_INDEX=NONE -f PostgreSQL "PG:dbname=bagv2 host=localhost port=5432 user=<hidden> password=<hidden> active_schema=bag" -oo GFS_TEMPLATE=gfs/imbag-2.1.0.gfs -oo WRITE_GFS=NO -a_srs EPSG:28992 temp/bagger.xml
Warning 1: Layer creation options ignored since an existing layer is
         being appended to.
2021-11-19 11:27:29,921 ogroutput INFO execute done
2021-11-19 11:27:29,922 fileinput INFO EOF file list, all files done
2021-11-19 11:27:29,922 component INFO VsiZipFileInput invokes=2 time(total, min, max, avg) = 0.107 0.107 0.107 0.053
2021-11-19 11:27:29,922 component INFO FormatConverter invokes=2 time(total, min, max, avg) = 0.166 0.166 0.166 0.083
2021-11-19 11:27:29,922 component INFO XsltFilter invokes=2 time(total, min, max, avg) = 0.055 0.055 0.055 0.027
2021-11-19 11:27:29,922 component INFO Ogr2OgrOutput invokes=2 time(total, min, max, avg) = 0.842 0.842 0.842 0.421
2021-11-19 11:27:29,922 chain INFO DONE - 2 rounds - chain=input_bag_leveringsdoc_file|convert_vsi_to_etree_doc|transformer_xslt_leveringsdoc|output_ogr_postgis
sebastic commented 2 years ago

Met deze changes kan het levering_xml record worden toegevoegd:

Stetl

--- /dev/null
+++ b/stetl/filters/bagfilter.py
@@ -0,0 +1,58 @@
+# BAG related filters
+
+from stetl.component import Config
+from stetl.util import Util
+from stetl.filter import Filter
+from stetl.packet import FORMAT
+
+log = Util.get_log("bagfilter")
+
+
+class LeveringFilter(Filter):
+    """
+    Convert Leveringsdocument-BAG-Extract.xml content to record for
+    insertion into nlx_bag_info table.
+    """
+
+    @Config(ptype=str, default='sleutel', required=False)
+    def key_column(self):
+        """
+        Column name for key
+        """
+        pass
+
+    @Config(ptype=str, default='levering_xml', required=False)
+    def key_value(self):
+        """
+        Column value for key
+        """
+        pass
+
+    @Config(ptype=str, default='waarde', required=False)
+    def value_column(self):
+        """
+        Column name for value
+        """
+        pass
+
+    # Constructor
+    def __init__(self, configdict, section, consumes=FORMAT.string, produces=FORMAT.record_array):
+        Filter.__init__(self, configdict, section, consumes, produces)
+
+    def invoke(self, packet):
+        if packet.data is None or packet.is_end_of_stream():
+            return packet
+
+        with open(packet.data, 'rt') as f:
+            data = f.read()
+
+        record = {
+            self.key_column: self.key_value,
+            self.value_column: data,
+        }
+
+        # record_array is used to avoid ValueError:
+        # https://github.com/geopython/stetl/issues/125
+        packet.data = [record]
+
+        return packet

NLExtact

--- a/bagv2/etl/conf/etl-imbag-2.1.0.cfg
+++ b/bagv2/etl/conf/etl-imbag-2.1.0.cfg
@@ -28,6 +28,7 @@
 # [GDAL_3462]
 chains = input_sql_pre|output_postgres_sql,
          input_bag_leveringsdoc_file|convert_vsi_to_etree_doc|transformer_xslt_leveringsdoc|output_ogr_postgis,
+         input_bag_leveringsdoc_zipfile|extract_bag_leveringsdoc_file|convert_leveringsdoc_to_record|output_postgres_levering,
          input_string_file|command_executor,
          input_cbs_gemeenten_csv|output_postgres_gemeenten,
          input_bag_wpl_gem_koppeling_file|convert_vsi_to_etree_doc|transformer_xslt_gem_wpl|output_ogr_postgis,
@@ -87,6 +88,11 @@ class = stetl.inputs.fileinput.VsiZipFileInput
 file_path = {bag_input_file}
 name_filter = Leveringsdocument-BAG-Extract.xml

+[input_bag_leveringsdoc_zipfile]
+class = stetl.inputs.fileinput.ZipFileInput
+file_path = {bag_input_file}
+name_filter = Leveringsdocument-BAG-Extract.xml
+

 #
 # FILTERS
@@ -98,6 +104,10 @@ class = stetl.filters.fileextractor.VsiFileExtractor
 file_path = temp/bagger.{bag_file_ext}
 delete_file = True

+[extract_bag_leveringsdoc_file]
+class = stetl.filters.fileextractor.ZipFileExtractor
+file_path = temp/Leveringsdocument-BAG-Extract.xml
+
 # Filter to expand a ZIP archive to a configured target dir
 [expand_zip_archive]
 class = stetl.filters.archiveexpander.ZipArchiveExpander
@@ -111,6 +121,10 @@ class = stetl.filters.formatconverter.FormatConverter
 input_format = gdal_vsi_path
 output_format = etree_doc

+# convert leverings document content to record
+[convert_leveringsdoc_to_record]
+class = stetl.filters.bagfilter.LeveringFilter
+
 # XSLT transformer GEM-WPL-RELATIE to OGR Simple Feature GML doctree
 [transformer_xslt_gem_wpl]
 class = stetl.filters.xsltfilter.XsltFilter
@@ -148,6 +162,19 @@ table = provincie_gemeente
 key = gemeentecode
 replace = true

+# Insert levering XML into nlx_bag_info table
+[output_postgres_levering]
+class = stetl.outputs.dboutput.PostgresInsertOutput
+database = {pg_db}
+host = {pg_host}
+port = {pg_port}
+user = {pg_user}
+password = {password}
+schema = {schema}
+table = nlx_bag_info
+key = sleutel
+replace = true
+
 # The ogr2ogr command-line for generic GML Features.
 [output_ogr_postgis]
 class = stetl.outputs.ogroutput.Ogr2OgrOutput

Gezien de weerstand tegen de changes in #319 zal ik geen pull requests hiervoor indienen. Indien ik niet de enige blijk te zijn die hiervan gebruik wil maken, kan dat als nog.

justb4 commented 2 years ago

Volgens mij opgelost via PR #334. Dus sluit ik deze.

sebastic commented 2 years ago

Dit issue is niet helemaal opgelost met #334.

Lokaal gebruik ik deze patch voor Stetl:

From 976cc48f3c9aff6eaf14d5a37fc958b33112cfdf Mon Sep 17 00:00:00 2001
From: Bas Couwenberg <sebastic@xs4sall.nl>
Date: Mon, 29 Nov 2021 18:14:25 +0100
Subject: Add stetl.filters.bagfilter.LeveringFilter class.

Converts Leveringsdocument-BAG-Extract.xml content to record for
insertion into nlx_bag_info table.
---
 stetl/filters/bagfilter.py | 58 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 stetl/filters/bagfilter.py

diff --git a/stetl/filters/bagfilter.py b/stetl/filters/bagfilter.py
new file mode 100644
index 00000000..ece19ef1
--- /dev/null
+++ b/stetl/filters/bagfilter.py
--- /dev/null
+++ b/stetl/filters/bagfilter.py
@@ -0,0 +1,58 @@
+# BAG related filters
+
+from stetl.component import Config
+from stetl.util import Util
+from stetl.filter import Filter
+from stetl.packet import FORMAT
+
+log = Util.get_log("bagfilter")
+
+
+class LeveringFilter(Filter):
+    """
+    Convert Leveringsdocument-BAG-Extract.xml content to record for
+    insertion into nlx_bag_info table.
+    """
+
+    @Config(ptype=str, default='sleutel', required=False)
+    def key_column(self):
+        """
+        Column name for key
+        """
+        pass
+
+    @Config(ptype=str, default='levering_xml', required=False)
+    def key_value(self):
+        """
+        Column value for key
+        """
+        pass
+
+    @Config(ptype=str, default='waarde', required=False)
+    def value_column(self):
+        """
+        Column name for value
+        """
+        pass
+
+    # Constructor
+    def __init__(self, configdict, section, consumes=FORMAT.string, produces=FORMAT.record_array):
+        Filter.__init__(self, configdict, section, consumes, produces)
+
+    def invoke(self, packet):
+        if packet.data is None or packet.is_end_of_stream():
+            return packet
+
+        with open(packet.data, 'rt') as f:
+            data = f.read()
+
+        record = {
+            self.key_column: self.key_value,
+            self.value_column: data,
+        }
+
+        # record_array is used to avoid ValueError:
+        # https://github.com/geopython/stetl/issues/125
+        packet.data = [record]
+
+        return packet
--
2.30.2

En deze change voor de ETL configuratie:

diff --git a/bagv2/etl/conf/etl-imbag-2.1.0.cfg b/bagv2/etl/conf/etl-imbag-2.1.0.cfg
index 8cbbd107..03f2053d 100644
--- a/bagv2/etl/conf/etl-imbag-2.1.0.cfg
+++ b/bagv2/etl/conf/etl-imbag-2.1.0.cfg
@@ -28,6 +28,7 @@
 # [GDAL_3462]
 chains = input_sql_pre|output_postgres_sql,
          input_bag_leveringsdoc_file|convert_vsi_to_etree_doc|transformer_xslt_leveringsdoc|output_ogr_postgis,
+         input_bag_leveringsdoc_zipfile|extract_bag_leveringsdoc_file|convert_leveringsdoc_to_record|output_postgres_levering,
          input_string_file|command_executor,
          input_cbs_gemeenten_csv|output_postgres_gemeenten,
          input_bag_wpl_gem_koppeling_file|convert_vsi_to_etree_doc|transformer_xslt_gem_wpl|output_ogr_postgis,
@@ -87,6 +88,11 @@ class = stetl.inputs.fileinput.VsiZipFileInput
 file_path = {bag_input_file}
 name_filter = Leveringsdocument-BAG-Extract.xml

+[input_bag_leveringsdoc_zipfile]
+class = stetl.inputs.fileinput.ZipFileInput
+file_path = {bag_input_file}
+name_filter = Leveringsdocument-BAG-Extract.xml
+

 #
 # FILTERS
@@ -98,6 +104,10 @@ class = stetl.filters.fileextractor.VsiFileExtractor
 file_path = temp/bagger.{bag_file_ext}
 delete_file = True

+[extract_bag_leveringsdoc_file]
+class = stetl.filters.fileextractor.ZipFileExtractor
+file_path = temp/Leveringsdocument-BAG-Extract.xml
+
 # Filter to expand a ZIP archive to a configured target dir
 [expand_zip_archive]
 class = stetl.filters.archiveexpander.ZipArchiveExpander
@@ -111,6 +121,10 @@ class = stetl.filters.formatconverter.FormatConverter
 input_format = gdal_vsi_path
 output_format = etree_doc

+# convert leverings document content to record
+[convert_leveringsdoc_to_record]
+class = stetl.filters.bagfilter.LeveringFilter
+
 # XSLT transformer GEM-WPL-RELATIE to OGR Simple Feature GML doctree
 [transformer_xslt_gem_wpl]
 class = stetl.filters.xsltfilter.XsltFilter
@@ -146,6 +160,19 @@ password = {password}
 schema = {schema}
 table = provincie_gemeente

+# Insert levering XML into nlx_bag_info table
+[output_postgres_levering]
+class = stetl.outputs.dboutput.PostgresInsertOutput
+database = {pg_db}
+host = {pg_host}
+port = {pg_port}
+user = {pg_user}
+password = {password}
+schema = {schema}
+table = nlx_bag_info
+key = sleutel
+replace = true
+
 # The ogr2ogr command-line for generic GML Features.
 [output_ogr_postgis]
 class = stetl.outputs.ogroutput.Ogr2OgrOutput