yeesian / ArchGDAL.jl

A high level API for GDAL - Geospatial Data Abstraction Library
https://yeesian.github.io/ArchGDAL.jl/stable/
Other
141 stars 27 forks source link

faster getpoint #369

Closed rafaqz closed 1 year ago

rafaqz commented 1 year ago

getpoint is pretty slow currently, to the point rasterizing a shapefile from ArchGDAL.jl is 20x slower than one from Shapefile.jl. Mostly because getting points allocates every time.

This PR greatly (but not completely) reduces the gap by:

  1. Not returning GDAL points. They are pretty inefficient when we have thousands of points. A Tuple is better.
  2. Preallocating Ref for getpoint! in the iterator for Geointerfce.getpoint(linestring), and reusing them for each point.

This is implemented for line string but that means polygon will also work. It's probably not worth the effort of passing the Ref through from polygons, but I haven't checked. There may be more objects that need this treatment.

Edit: I assumed this would have a test already, but seems not.

evetion commented 1 year ago

I was slightly hesitant about changing the return type here, but I think this still fits as it qualifies as point.

We could better document that we essentially are changing GeoInterface.coordinates to be a nested Vector of a Tuple instead of a Vector. That has impact on some construction methods as well.

rafaqz commented 1 year ago

Yeah, I tried to optimise the gdal point first but the overhead of allocating them all individually is just really bad.

Once all the methods here accept any point it won't make much difference either.

And yes I've noticed there's still a lot of nested vector around. It would be a good performance gain to swap them all to tuples.

rafaqz commented 1 year ago

Another thought I had writing this is it would be faster to use getx, gety etc because they don't need allocations at all.

But it assumes trad point order, so I went with getpoint!.

rafaqz commented 1 year ago

I guess its a breaking change? But calling GeoInerface methods returns a lot of types of points already, and it doesn't promise what kind of point comes from what parent object.

I also doubt it will actually break much.

We should actually put this on getgeom and I guess it's just multipoint missing? I'm not sure that will be faster because it's accessed differently.

evetion commented 1 year ago

Do we have a roundtrip test? I.e. ArchGDAL.createpoint(GeoInterface.getpoint))? Similarly for GeoInterface.coordinates, if this PR changes that.

rafaqz commented 1 year ago

It doesn't touch coordinates. But yes round trip on points would be good. I think that should just work already.

Edit: a tuple is just a normal input for createpoint, it's already tested..

Edit2: I added round trip tests for LineaRing and LineString