Open lfcnassif opened 7 months ago
Hi nassif, could you share the data that triggers these errors?
On Tue, Dec 5, 2023 at 16:17, Luis Filipe Nassif <@.***> wrote:
Processing a 150GB UFDR with master, 50950 exceptions like below were printed in the processing log:
java.lang.IllegalArgumentException: Points of LinearRing do not form a closed linestring
    at com.vividsolutions.jts.geom.LinearRing.validateConstruction(LinearRing.java:111)
    at com.vividsolutions.jts.geom.LinearRing.<init>(LinearRing.java:106)
    at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:355)
    at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:342)
    at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:236)
    at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:256)
    at iped.geo.parsers.kmlstore.KMLParser.parsePlacemark(KMLParser.java:157)
    at iped.geo.parsers.kmlstore.KMLParser.parse(KMLParser.java:64)
    at iped.geo.parsers.kmlstore.KMLFeatureListFactory.parseFeatureList(KMLFeatureListFactory.java:12)
    at iped.geo.parsers.GeofileParser.parse(GeofileParser.java:76)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Not sure if it is a bug or just a corrupted geometry object; at least, I think the above logging should be less verbose.
View it on GitHub: https://github.com/sepinf-inc/IPED/issues/2013
Sure, I'll try to find a triggering file tomorrow.
Hi @patrickdalla, I found the case triggering this, just sent the KML samples to you by Teams.
Many of the exceptions are of type "Points of LinearRing do not form a closed linestring". This, according to https://gis.stackexchange.com/questions/93946/getting-points-of-linearring-do-not-form-a-closed-linestring, is a syntax error in the source. Some GIS, like the PostGIS extension for PostgreSQL, offer methods to fix some of these inconsistencies (https://postgis.net/docs/ST_MakeValid.html).
So I suggest 2 options (or both): 1) We can try a similar recovery method: in the case of an unclosed linestring, we can repeat the first coord at the end of the linestring to "close" it. 2) We can group all these exceptions and report them in a more summarized form.
I think the second is necessary. What about the first? Should IPED try to recover it? @lfcnassif
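Option 2 (grouping the exceptions) could be sketched roughly as below. This is only an illustration under assumptions: `ParseErrorSummarySketch` and its methods are hypothetical names, not existing IPED classes, and the grouping key (exception class plus message) is just one possible choice. It uses thread-safe counters because the stack trace shows parsing running inside a thread pool.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper (not part of IPED): counts parse errors instead of logging each one. */
final class ParseErrorSummarySketch {

    // Key: "ExceptionClass: message", value: number of occurrences.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Records one failure without printing its stack trace. */
    void record(Exception e) {
        String key = e.getClass().getSimpleName() + ": " + e.getMessage();
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns one line per distinct error, most frequent first. */
    String summary() {
        StringBuilder sb = new StringBuilder();
        counts.entrySet().stream()
                .sorted((a, b) -> Long.compare(b.getValue().sum(), a.getValue().sum()))
                .forEach(en -> sb.append(en.getValue().sum())
                        .append(" x ")
                        .append(en.getKey())
                        .append('\n'));
        return sb.toString();
    }
}
```

With something like this, the 50950 stack traces would collapse into one summary line per distinct message, written once at the end of parsing.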
I saw that some files define "linearring"s with only 2 coords, which leads to another exception: a linearring needs at least 4 coords (even if we add the first coord as the last). So another solution would be to create "linestring" objects from these invalid linearring entries.
Does it seem a good method to bypass these invalid entries?
Maybe option 1 (repeating the first coord as the last) is not good. Look at a sample. I think the best option is to always change the type to "linestring" when a non-closed "linearring" is defined. The above object has the description "rio corrego", so it seems to map a river. Closing it would not represent the river. So the best option is to represent it as a "linestring" even though it is defined as a "linearring".
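The fallback discussed here can be sketched as a small decision function. This is a minimal illustration, not the actual KMLParser code: `RingFallbackSketch`, `Kind` and `chooseKind` are hypothetical names, and coordinates are modelled as plain `{lon, lat}` arrays so the example does not depend on JTS. In the real parser the result would presumably select between `GeometryFactory.createLinearRing` and `GeometryFactory.createLineString`.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of IPED or JTS). */
final class RingFallbackSketch {

    enum Kind { LINEAR_RING, LINE_STRING, POINT, EMPTY }

    /**
     * Decides which geometry to build for coordinates that the KML declared
     * as a LinearRing. Each coordinate is a {lon, lat} pair.
     */
    static Kind chooseKind(double[][] coords) {
        if (coords == null || coords.length == 0) {
            return Kind.EMPTY;
        }
        if (coords.length == 1) {
            return Kind.POINT;
        }
        boolean closed = Arrays.equals(coords[0], coords[coords.length - 1]);
        // A valid ring must be closed and have at least 4 coordinates.
        if (closed && coords.length >= 4) {
            return Kind.LINEAR_RING;
        }
        // Open or too short: keep the points as drawn, without inventing
        // a closing segment that the source never contained.
        return Kind.LINE_STRING;
    }
}
```

The point of the design is that no coordinate is added or dropped: an open "ring" such as the river sample is simply kept as a line.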
Some other syntax/semantic errors to bypass or correct: 1) placemark coordinates without content; 2) placemarks with no content; 3) references to gx tags without an XML namespace declaration; 4) XML declaring UTF-8 encoding while actually using windows-1252; 5) a Document tag inside another Document tag.
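For the encoding mismatch (UTF-8 declared, windows-1252 used), one possible workaround is to decode strictly as UTF-8 and fall back only when the bytes are not valid UTF-8. This is a sketch under assumptions: `KmlCharsetFallbackSketch` and `decode` are hypothetical names, and windows-1252 as the only fallback is taken from the case reported above, not from any general rule.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of IPED). */
final class KmlCharsetFallbackSketch {

    /** Decodes as strict UTF-8; on malformed input, retries as windows-1252. */
    static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Assumption: files that fail strict UTF-8 are windows-1252,
            // as observed in the samples discussed in this thread.
            return new String(raw, Charset.forName("windows-1252"));
        }
    }
}
```

The decoded text can then be handed to the XML parser as a character stream, in which case the parser no longer relies on the (wrong) encoding declaration.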
I think the second is necessary. What about the first? Should IPED try to recover it? @lfcnassif
I agree with 2. Closing an open linear ring could lead to wrong conclusions, as you noticed. If converting them to a "linestring" is simple and doesn't take too much processing time, that seems a good approach.