sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Thousands of exceptions when parsing a geometry object #2013

Open lfcnassif opened 7 months ago

lfcnassif commented 7 months ago

While processing a 150GB UFDR with master, 50950 exceptions like the one below were printed in the processing log:

java.lang.IllegalArgumentException: Points of LinearRing do not form a closed linestring
    at com.vividsolutions.jts.geom.LinearRing.validateConstruction(LinearRing.java:111)
    at com.vividsolutions.jts.geom.LinearRing.<init>(LinearRing.java:106)
    at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:355)
    at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:342)
    at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:236)
    at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:256)
    at iped.geo.parsers.kmlstore.KMLParser.parsePlacemark(KMLParser.java:157)
    at iped.geo.parsers.kmlstore.KMLParser.parse(KMLParser.java:64)
    at iped.geo.parsers.kmlstore.KMLFeatureListFactory.parseFeatureList(KMLFeatureListFactory.java:12)
    at iped.geo.parsers.GeofileParser.parse(GeofileParser.java:76)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

Not sure if it is a bug or just a corrupted geometry object, but at least I think the logging above should be less verbose.

patrickdalla commented 7 months ago

Hi nassif, could you share the data that triggers these errors?

lfcnassif commented 7 months ago

Sure, I'll try to find a triggering file tomorrow.

lfcnassif commented 2 months ago

Hi @patrickdalla, I found the case triggering this and just sent the KML samples to you via Teams.

patrickdalla commented 2 months ago

Many of the exceptions are of type "Points of LinearRing do not form a closed linestring". According to https://gis.stackexchange.com/questions/93946/getting-points-of-linearring-do-not-form-a-closed-linestring, this is a syntax error in the source data. Some GIS tools, such as the PostGIS extension for PostgreSQL, offer functions to repair some of these inconsistencies (https://postgis.net/docs/ST_MakeValid.html).

So, I suggest 2 options (or both):
1) We can try a similar recovery method. In the case of an unclosed linestring, we can repeat the first coord at the end of the linestring to "close" it.
2) We can group all these exceptions and report them in a more summarized form.

I think the second is necessary. What about the first? Should IPED try to recover it? @lfcnassif
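
As a rough illustration of option 2, the sketch below counts exceptions by class name and message so that a single summary can be logged at the end instead of one stack trace per failure. The class `ParseErrorSummary` and its methods are hypothetical names used only for this example; they are not existing IPED code, and how it would be wired into the parser is left open.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not part of IPED): aggregates repeated parse errors.
final class ParseErrorSummary {
    // Thread-safe, since parsing may run on several worker threads.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Record one exception, keyed by its class name and message.
    void record(Throwable t) {
        String key = t.getClass().getName() + ": " + t.getMessage();
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Number of times a given key was recorded (0 if never seen).
    long count(String key) {
        LongAdder adder = counts.get(key);
        return adder == null ? 0 : adder.sum();
    }

    // One line per distinct error, sorted by key, suitable for a single log entry.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, LongAdder> e : new TreeMap<>(counts).entrySet()) {
            sb.append(e.getValue().sum()).append(" x ").append(e.getKey()).append('\n');
        }
        return sb.toString();
    }
}
```

With something like this, the catch block would call `record(e)` and the result of `report()` would be logged once per processed file or once at the end of processing.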

patrickdalla commented 2 months ago

I saw that some files define "linearring"s with only 2 coords, which leads to another exception: a linearring needs at least 4 coords (even if we add the first coord as the last). So another solution would be to create "linestring" objects from these invalid linearring entries.

Does this seem like a good way to bypass these invalid entries?

patrickdalla commented 2 months ago

Maybe option 1 (repeating the first coord as the last) is not good. Look at a sample. I think the best option is to always change the type to "linestring" when a non-closed "linearring" is defined.

[image]

The above object has the description "rio corrego", so it seems to map a river. Closing it would not represent the river. So the best option is to represent it as a "linestring" even though it is defined as a "linearring".
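
The fallback rule discussed above could be sketched roughly as follows, using plain coordinate arrays so the example does not depend on JTS. `RingFallback`, `Kind` and `classify` are hypothetical names for illustration only; the thresholds follow the constraints mentioned in this thread (a linearring must be closed and needs at least 4 coords), and the assumption that a linestring needs at least 2 coords is mine.

```java
import java.util.Arrays;

// Hypothetical sketch (not IPED code): decide how to build a geometry
// that the KML source declares as a LinearRing.
final class RingFallback {
    enum Kind { LINEAR_RING, LINE_STRING, INVALID }

    // coords: one {longitude, latitude} pair per point, in document order.
    static Kind classify(double[][] coords) {
        int n = coords.length;
        // A ring is closed when its first and last points are identical.
        boolean closed = n > 0 && Arrays.equals(coords[0], coords[n - 1]);
        if (n >= 4 && closed) {
            return Kind.LINEAR_RING;   // valid ring, keep as declared
        }
        if (n >= 2) {
            return Kind.LINE_STRING;   // open or too short: draw as a line instead
        }
        return Kind.INVALID;           // 0 or 1 points: nothing to draw as a line
    }
}
```

The check is a single pass over the first and last points, so it should add negligible processing time per placemark.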

patrickdalla commented 2 months ago

Some other syntax/semantic errors to bypass or correct:
1) placemark coordinates without content
2) placemarks with no content
3) references to gx tags without an XML namespace declaration
4) XML declares UTF-8 encoding but windows-1252 is used
5) Document tag inside another Document tag
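
For the encoding mismatch (UTF-8 declared but windows-1252 used), one possible workaround is to decode the bytes strictly as UTF-8 and fall back to windows-1252 only when that fails. This is a minimal sketch with a hypothetical `LenientXmlDecoder` class, not the approach IPED actually takes; it assumes the whole file fits in memory and that windows-1252 is the only alternative encoding worth trying.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not IPED code): tolerate files that declare UTF-8
// but were actually written in windows-1252.
final class LenientXmlDecoder {
    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw on invalid UTF-8 instead of
            // silently inserting replacement characters.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the single-byte windows-1252 encoding.
            return new String(bytes, Charset.forName("windows-1252"));
        }
    }
}
```

The decoded string could then be handed to the XML parser through a `java.io.StringReader`, so the parser works on characters rather than re-decoding the raw bytes.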

lfcnassif commented 2 months ago

I think the second is necessary. What about the first? Should IPED try to recover it? @lfcnassif

I agree with 2. Closing an open linear ring could lead to wrong conclusions, as you noticed. If converting them to a "linestring" is simple and doesn't take too much processing time, that seems a good approach.