talend-spatial / workspace-metadata-crawler

Automatic geospatial data inventory with Talend Spatial

scan_vector component fails on postgresql table with no rows in it #44

Closed archaeogeek closed 4 years ago

archaeogeek commented 4 years ago

If scan_vector encounters a table with no rows in it, then it exits with the following error:

2020-04-15 10:08:35|aUVqak|YSCso8|YSCso8|METADATA_CRAWLER_V4|scan_vector|prod|4|tWarn|tWarn_2|## Analyzing layer:  schema.table|0
Exception in component sProj_1_BO (scan_vector)
java.lang.NullPointerException
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.sOGRInfoInput_2Process(scan_vector.java:18431)
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.tSetGlobalVar_1Process(scan_vector.java:3401)
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.sOGRInfoInput_1Process(scan_vector.java:3011)
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.tWarn_1Process(scan_vector.java:1934)
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.runJobInTOS(scan_vector.java:33534)
        at metadata_crawler_v4.scan_vector_0_1.scan_vector.runJob(scan_vector.java:32647)
        at metadata_crawler_v4.run_0_1.run.tForeach_1Process(run.java:4047)
        at metadata_crawler_v4.run_0_1.run.tLoop_1Process(run.java:2539)
        at metadata_crawler_v4.run_0_1.run.tRunJob_8Process(run.java:2290)
        at metadata_crawler_v4.run_0_1.run.tPrejob_1Process(run.java:1686)
        at metadata_crawler_v4.run_0_1.run.runJobInTOS(run.java:9680)
        at metadata_crawler_v4.run_0_1.run.main(run.java:8833)
2020-04-15 10:08:36|aUVqak|YSCso8|YSCso8|METADATA_CRAWLER_V4|scan_vector|prod|6|Java Exception|sProj_1_BO|java.lang.NullPointerException:null|1

It fails to process any further tables and moves on to publish_metadata. I have managed to replicate this in the following way:

1) Create a clean PostgreSQL database with PostGIS enabled (I am using PostgreSQL 9.5 and PostGIS 2.5).
2) Create two spatial tables (I'm using DB Manager in QGIS 2.18).
3) Add some features to the second table (the one that appears second in public.geometry_columns).
4) Run the metadata crawler (run 0.1 job) and get the NullPointerException; no metadata is created.
5) Add some features to the first table (the one appearing first in public.geometry_columns).
6) Run the metadata crawler (run 0.1 job) again and get no NullPointerException; metadata is created for both tables.

I have experimented further with adding additional schemas and re-ordering the tables so that the table with no features appears lower down the list in geometry_columns, and I seem to be able to replicate the problem consistently.

Is there any way to get scan_vector to skip tables with no rows (and hence no spatial extent, I assume) and continue to the next table without triggering a fatal error?
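
For illustration, the behaviour asked for here amounts to a guard around the per-layer step: a layer with no features (and therefore no extent) is logged and skipped rather than dereferenced. The sketch below is not the scan_vector code and not the fix that was later applied; `LayerInfo`, `shouldSkip`, and `scanAll` are hypothetical names used only to show the pattern.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyLayerGuard {

    /** Hypothetical summary of one layer; extent is null when the table has no rows. */
    public static final class LayerInfo {
        final String name;
        final long featureCount;
        final double[] extent; // {minX, minY, maxX, maxY}, or null if unknown

        public LayerInfo(String name, long featureCount, double[] extent) {
            this.name = name;
            this.featureCount = featureCount;
            this.extent = extent;
        }
    }

    /** True when the layer has nothing to compute an extent from. */
    public static boolean shouldSkip(LayerInfo layer) {
        return layer == null || layer.featureCount == 0 || layer.extent == null;
    }

    /** Returns the names of the layers that were processed; empty ones are skipped with a warning. */
    public static List<String> scanAll(List<LayerInfo> layers) {
        List<String> processed = new ArrayList<>();
        for (LayerInfo layer : layers) {
            if (shouldSkip(layer)) {
                String name = (layer == null) ? "<null>" : layer.name;
                System.err.println("Skipping layer with no features: " + name);
                continue; // keep going instead of failing the whole job
            }
            // Safe to dereference the extent here.
            double width = layer.extent[2] - layer.extent[0];
            System.out.println("Analyzed " + layer.name + " (extent width " + width + ")");
            processed.add(layer.name);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<LayerInfo> layers = new ArrayList<>();
        layers.add(new LayerInfo("public.empty_table", 0, null));
        layers.add(new LayerInfo("public.full_table", 3, new double[] {0, 0, 10, 5}));
        System.out.println(scanAll(layers));
    }
}
```

The point of the design is that the empty table produces a warning and a `continue`, so the tables listed after it in geometry_columns are still processed.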

archaeogeek commented 4 years ago

This problem also occurs when there's an entry in the geometry_columns table for a table that no longer exists (which can only happen when the version of PostGIS is quite old).
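
A pre-check along the following lines could, in principle, filter such stale entries out before scanning. This is only a sketch under assumptions: it presumes the caller has already read the schema-qualified names registered in geometry_columns and the names of the tables that actually exist, and `StaleGeometryEntries` and `findStale` are hypothetical names, not part of the crawler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleGeometryEntries {

    /**
     * Returns the entries of `registered` that have no match in `existing`.
     * Names are compared exactly, as schema-qualified strings such as "public.roads".
     */
    public static List<String> findStale(List<String> registered, Set<String> existing) {
        List<String> stale = new ArrayList<>();
        for (String name : registered) {
            if (!existing.contains(name)) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Illustrative data only: in practice these would be read from
        // geometry_columns and from the database catalog (for example information_schema.tables).
        List<String> registered = Arrays.asList("public.roads", "public.dropped_table");
        Set<String> existing = new HashSet<>(Arrays.asList("public.roads"));
        System.out.println("Stale entries: " + findStale(registered, existing));
    }
}
```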

fxprunayre commented 4 years ago

@archaeogeek you can test it on your version. You will have to copy this file into your Talend installation at plugins/org.talend.sdi.designer.components/components/sGeoBasicOperation/sGeoBasicOperation_main.javajet. Then, with the focus in your job workspace, press CTRL+SHIFT+F3 to reload the components (or, if that does not work, restart Talend) so that the change is picked up.

archaeogeek commented 4 years ago

@fxprunayre that seems to work just fine, thanks! Happy to close.