libgeos / geos

Geometry Engine, Open Source
https://libgeos.org
GNU Lesser General Public License v2.1
1.17k stars 354 forks source link

Stackoverflow in DiscreteFrechetDistance with ~larger linestrings (Windows) #516

Open Lgum opened 2 years ago

Lgum commented 2 years ago

When trying to calculate the Frechet distance between two linestrings (each around 1500 segments), I get a segmentation fault on windows 10.

I can avoid this, by specifying the stack reserve size in Visual Studio, and putting it to i.e. 8388608. While this works, larger linestring will potentially crash also unix machines. In this case, a stack size of roughly 3 MiB is enough.

Tested on Windows 10, with Geos 3.10.1, Visual Studio 2019.

The linestrings are attached, the code is as follows:

#include <iostream>
#include <fstream>
#include <vector>
#include <sstream>
#include <tuple>

#include <geos/geom/CoordinateSequenceFactory.h>
#include <geos/geom/CoordinateSequence.h>
#include <geos/geom/GeometryFactory.h>
#include <geos/algorithm/distance/DiscreteFrechetDistance.h>

std::vector<std::tuple<double, double, double>> tupcsvRead(std::string sFilename)
{
    std::ifstream ifs(sFilename, std::ifstream::in);

    if (!ifs.is_open()) throw std::runtime_error("Could not open file");

    std::string line;

    std::vector<std::tuple<double, double, double>> vCoord;

    if (ifs.good())
    {        
        while (std::getline(ifs, line))
        {            
            std::stringstream ss(line);            
            std::string sX, sY, sZ;

            for (int i = 0; i < 3; ++i)
            {
                if (i == 0)
                    std::getline(ss, sX, ',');
                else if (i == 1)
                    std::getline(ss, sY, ',');
                else if (i == 2)
                    std::getline(ss, sZ, ',');

            }
            std::tuple<double, double, double> newCoordTuple = std::make_tuple(atof(sX.c_str()), atof(sY.c_str()), atof(sZ.c_str()));

            vCoord.push_back(newCoordTuple);
        }
    }

    return vCoord;
}

void fillCoord(std::vector<std::tuple<double, double, double>> vCoord, std::unique_ptr<geos::geom::CoordinateSequence> &coordSeq)
{
    for (int i = 0; i < vCoord.size(); ++i)
    {
        geos::geom::Coordinate coord;
        coord.x = std::get<0>(vCoord.at(i));
        coord.y = std::get<1>(vCoord.at(i));
        coord.z = std::get<2>(vCoord.at(i));

        coordSeq->setAt(coord, i);
    }
}

int main()
{

    std::string sFilenameOne = "E:\\Dev_source\\pygeos_test\\ls1.csv";
    std::string sFilenameTwo = "E:\\Dev_source\\pygeos_test\\ls2.csv";

    std::vector<std::tuple<double, double, double>> vCoordOne = tupcsvRead(sFilenameOne);
    std::vector<std::tuple<double, double, double>> vCoordTwo = tupcsvRead(sFilenameTwo);

    geos::geom::GeometryFactory::Ptr geoFactory = geos::geom::GeometryFactory::create();

    std::unique_ptr<geos::geom::CoordinateSequence> coordseqOne = geoFactory->getCoordinateSequenceFactory()->create(vCoordOne.size(), 3);
    std::unique_ptr<geos::geom::CoordinateSequence> coordseqTwo = geoFactory->getCoordinateSequenceFactory()->create(vCoordOne.size(), 3);

    fillCoord(vCoordOne, coordseqOne);
    fillCoord(vCoordTwo, coordseqTwo);

    geos::geom::LineString *lsOne = geoFactory->createLineString(*coordseqOne);
    geos::geom::LineString* lsTwo = geoFactory->createLineString(*coordseqTwo);

    geos::algorithm::distance::DiscreteFrechetDistance frecherHagel(*lsOne, *lsTwo);

    double dDist = frecherHagel.distance();

    std::cerr << dDist << std::endl;

}

ls2.csv ls1.csv

dr-jts commented 2 years ago

This is probably because the current GEOS DiscreteFrechetDistance algorithm is a recursive one, and so can use a lot of stack.

JTS now has a DiscreteFrechetDistance implementation which uses an internal stack, and so does not have this problem. It uses a dynamic programming approach which should also be a lot faster. See https://github.com/locationtech/jts/pull/764 and https://github.com/locationtech/jts/pull/783.