Status of GeoInterface and Geometry objects

yeesian / ArchGDAL.jl

A high level API for GDAL - Geospatial Data Abstraction Library

https://yeesian.github.io/ArchGDAL.jl/stable/

Other

142 stars 26 forks source link

Status of GeoInterface and Geometry objects #180

Closed juliohm closed 3 years ago

juliohm commented 3 years ago

I am working very actively with vector data currently and the GeoDataFrames.jl approach is super useful. I noticed that the core functionality is implemented in ArchGDAL.jl tables, and so I wonder what is necessary to make the workflow function with custom geometry types?

My geostatistical work requires the geometry types in Meshes.jl, which has native Julia implementations for various operations like intersection, polygonal areas, etc. Can you please share what is the recommended method to load shapefiles from disk using Meshes.jl geometries as opposed to GDAL geometries? In this use case it is totally fine if the geometries need to be copied from disk.

visr commented 3 years ago

Linking also @evetion's answer here: https://discourse.julialang.org/t/ann-announcing-geodataframes-jl/56735/3

Would be great to get https://github.com/JuliaGeo/GeoInterfaceRFC.jl to the finish line, and implement it in both ArchGDAL and Meshes. Otherwise, currently your best bet would be to go through the current GeoInterface.jl, and construct your Meshes.jl geometries from those (inefficient) types.

I see in the linked discourse thread you suggest that GeoDataFrames.jl implement a Meshes.jl trait. That would mean depending on all of Meshes for a trait. In general, at least for GeoInterfaceRFC we avoid dependencies and have no central geometry type. Your ambitions for Meshes are very high, implementing both geometry types as well as operations on them. Have you thought about possibly splitting this up in the future, such that the types can be loaded without all algorithms? Or do you think it is too early for that?

juliohm commented 3 years ago

Would be great to get https://github.com/JuliaGeo/GeoInterfaceRFC.jl to the finish line, and implement it in both ArchGDAL and Meshes.

Looking at the the GeoInterfaceRFC.jl source code I see functions like touches, intersects, i.e. functions that describe operations with geometries. Notice that my issue is different. All I need right now is to be able to load GDAL geometries from disk as Meshes.jl geometries. The interface I will be using is already defined in Meshes.jl for these geometry types, and I don't need an additional interface. Does it make sense? The challenge is not about using operations with geometries consistently according to a GeoInterface.jl, it is about loading and instantiating the correct geometry types for the application.

I see in the linked discourse thread you suggest that GeoDataFrames.jl implement a Meshes.jl trait.

In this case I meant that the dataframe as a whole could implement a Meshes.Data trait. This is a different issue than the trait for the individual geometries. So if someone wants to load a table of geometries and then wants to interpolate a field in a Cartesian grid this would work directly. That is why I also think that this Meshes.Data trait implementation could live in a high-level package like GeoDataFrames.jl.

In summary, all I need right now is to load a shapefile where the geom column has Meshes.jl geometries:

import GeoDataFrames as GDF

table = GDF.read("foo.shp")

table.geom # this is now a column of Meshes.jl geometries instead of GDAL geometries.

How to achieve that in a way that is aligned with the other ongoing efforts?

juliohm commented 3 years ago

@evetion would it make sense to submit a PR to GeoDataFrames.jl that converts the GDAL geometries to Meshes geometries? Or even a keyword argument to the read function that specifies the geometry collection:

table = GeoDataFrame.read("foo.shp", geometry = :meshes)

visr commented 3 years ago

Looking at the the GeoInterfaceRFC.jl source code I see functions like touches, intersects, i.e. functions that describe operations with geometries. Notice that my issue is different. All I need right now is to be able to load GDAL geometries from disk as Meshes.jl geometries.

I understand you want to be able to convert geometries. GeoInterface enables this, through functions like geomtype, ncoord, ngeom, and getgeom. Functions like intersects have been added since they are also part of the closely related Simple Features specification.

juliohm commented 3 years ago

Can you elaborate? How does geomtype help in this specific issue? I understand that ArchGDAL.jl always loads a GDAL geometry type (lazily on disk). I want to convert this type to a different geometry type. How geomtype can help?

yeesian commented 3 years ago

Can you please share what is the recommended method to load shapefiles from disk using Meshes.jl geometries as opposed to GDAL geometries? In this use case it is totally fine if the geometries need to be copied from disk.

The (unfortunate) thing about the (Arch)GDAL workflow is that it doesn't work with custom geometry types. Given that you're okay with making copies of the geometries, the most straightforward process is to load the files using (Arch)GDAL, and then construct the corresponding Meshes.jl geometries from there.

The challenge is not about using operations with geometries consistently according to a GeoInterface.jl, it is about loading and instantiating the correct geometry types for the application.

The drivers in GDAL are implemented to load the data from file into an opaque in-memory object. The approach I have in mind for modifying https://github.com/yeesian/ArchGDAL.jl/blob/master/src/tables.jl to materialize to geometries other than IGeometry, is via a traits-based approach[1] via GeoInterfaceRFC.jl. To make it work for meshes, the high-level approach [2] has been described by @visr in https://github.com/yeesian/ArchGDAL.jl/issues/180#issuecomment-822770787.

[1]: We have seen a similar approach to generic programming for tabular data in https://github.com/JuliaData/Tables.jl. [2]: It might look similar to https://gdal.org/development/rfc/rfc64_triangle_polyhedralsurface_tin.html.

juliohm commented 3 years ago

The (unfortunate) thing about the (Arch)GDAL workflow is that it doesn't work with custom geometry types. Given that you're okay with making copies of the geometries, the most straightforward process is to load the files using (Arch)GDAL, and then construct the corresponding Meshes.jl geometries from there.

This is the pipeline I have in mind and it will work perfectly in 99.9% of cases because these are row-based tables. Users will never need to copy all geometries from disk at once if the algorithm is row-based, which is quite often the case.

The drivers in GDAL are implemented to load the data from file into an opaque in-memory object. The approach I have in mind for modifying https://github.com/yeesian/ArchGDAL.jl/blob/master/src/tables.jl to materialize to geometries other than IGeometry, is via a traits-based approach[1] via GeoInterfaceRFC.jl. To make it work for meshes, the high-level approach [2] has been described by @visr in #180 (comment).

Yes, this opaque geometric representation is what I don't want. The approach you described to convert IGeometry to more transparent geometric types is what I am looking for. It is still not clear to me, however, that a GeoInterface would be useful here. Every time you do a conversion of types you need to pick a specific concrete type with a specific constructor. I don't see a trait here? This is the code I am using currently as a hack to convert the geometry column:

import ArchGDAL as AG
import GDAL

# convert any IGeometry to Meshes.jl geometry
function convert2meshes(geom)
    pointtypes = (GDAL.wkbPoint, GDAL.wkbPoint25D, GDAL.wkbPointM, GDAL.wkbPointZM)
    multipointtypes = (GDAL.wkbMultiPoint, GDAL.wkbMultiPoint25D,
                       GDAL.wkbMultiPointM, GDAL.wkbMultiPointZM)
    linetypes = (GDAL.wkbLineString, GDAL.wkbLineString25D, GDAL.wkbLineStringM,
                 GDAL.wkbLineStringZM)
    multilinetypes = (GDAL.wkbMultiLineString, GDAL.wkbMultiLineString25D,
                      GDAL.wkbMultiLineStringM, GDAL.wkbMultiLineStringZM)
    polygontypes = (GDAL.wkbPolygon, GDAL.wkbPolygon25D, GDAL.wkbPolygonM,
                    GDAL.wkbPolygonZM)
    multipolygontypes = (GDAL.wkbMultiPolygon, GDAL.wkbMultiPolygon25D,
                         GDAL.wkbMultiPolygonM, GDAL.wkbMultiPolygonZM)
    collectiontypes = (GDAL.wkbGeometryCollection,
                       GDAL.wkbGeometryCollection25D, GDAL.wkbGeometryCollectionM,
                       GDAL.wkbGeometryCollectionZM)

    gtype = AG.getgeomtype(geom)
    if gtype ∈ linetypes
        chain2meshes(geom)
    elseif gtype ∈ polygontypes
        poly2meshes(geom)
    elseif gtype ∈ multipolygontypes
        mpoly2meshes(geom)
    else
        throw(ErrorException("Invalid"))
    end
end

function chain2meshes(chain)
    npoint = AG.ngeom(chain)
    points = [AG.getpoint(chain, i) for i in 0:npoint-1]
    Chain([p[1:2] for p in points]) # ignores 3rd component for now
end

function poly2meshes(poly)
    nchain = AG.ngeom(poly)
    chains = [AG.getgeom(poly, i) for i in 0:nchain-1]
    cs = chain2meshes.(chains)
    PolyArea(cs[1], cs[2:end])
end

function mpoly2meshes(mpoly)
    npoly = AG.ngeom(mpoly)
    polys = [AG.getgeom(mpoly, i) for i in 0:npoly-1]
    ps = poly2meshes.(polys)
    Multi(ps)
end

So as you can see I need to write specific types (e.g. Chain, PolyArea, Multi) in order to construct the geometries. This imples (1) Meshes.jl must be a dependency of such conversion and (2) traits can't be used to write this generically. Let me know if I am missing something.

If Meshes.jl is a heavy dependency for ArchGDAL.jl, then I will have to create a separate package to just load the table with IGeometry geometries and then convert the columns to native Julia geometries. For the applications I have in mind, GDAL is only a reader/writer of shape files.

visr commented 3 years ago

Yes, examples are always helpful. Here is what I had in mind.

using ArchGDAL
using GeoDataFrames
using Meshes
using GeoInterfaceRFC

# prepare a ArchGDAL geometry representing the LineString below
geojson = """
{
  "type": "LineString",
  "coordinates": [
    [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
  ]
}
"""

open("feature.geojson", "w") do io
    print(io, geojson)
end

df = GeoDataFrames.read("feature.geojson")
g = df.geom[1]

### implement GeoInterface for ArchGDAL, this code would go into ArchGDAL
function GeoInterfaceRFC.geomtype(g::ArchGDAL.AbstractGeometry)
    type = ArchGDAL.getgeomtype(g)
    if type == ArchGDAL.GDAL.wkbLineString
        return GeoInterfaceRFC.LineString()
    end
    # leaving out other geometry types for brevity, also below
    error("not implemented")
end

GeoInterfaceRFC.ncoord(::GeoInterfaceRFC.LineString, g::ArchGDAL.AbstractGeometry) = ArchGDAL.getcoorddim(g)
GeoInterfaceRFC.ngeom(::GeoInterfaceRFC.LineString, g::ArchGDAL.AbstractGeometry) = ArchGDAL.ngeom(g)
GeoInterfaceRFC.getgeom(::GeoInterfaceRFC.LineString, g::ArchGDAL.AbstractGeometry, i) = ArchGDAL.getpoint(g, i-1)
### end of GeoInterface implementation for ArchGDAL

# try out the functions we just defined
GeoInterfaceRFC.geomtype(g)  # -> GeoInterfaceRFC.LineString()
GeoInterfaceRFC.ncoord(g)  # -> 2
GeoInterfaceRFC.ngeom(g)  # -> 4
GeoInterfaceRFC.getgeom(g, 2)  # -> (103.0, 1.0, 0.0)
# (for the point, since ncoord is 2, probably a 2-tuple has to be returned)

### Meshes could add GeoInterface based constructors for its geometries
function Meshes.Chain(::GeoInterfaceRFC.LineString, g)
    npoint = GeoInterfaceRFC.ngeom(g)
    points = [GeoInterfaceRFC.getgeom(g, i) for i in 1:npoint]
    Chain([p[1:2] for p in points])  # ignores 3rd component for now
end

# Now Meshes can create Chains from any geometry that implements the GeoInterfaceRFC
# LineString interface, using only GeoInterfaceRFC as a dependency
Chain(GeoInterfaceRFC.geomtype(g), g)
# 4-chain{2,Float64}
#   └─Point(102.0, 0.0)
#   └─Point(103.0, 1.0)
#   └─Point(104.0, 0.0)
#   └─Point(105.0, 1.0)

I think this is preferred over taking Meshes as a dependency, because it generalizes to any geometry type package that is willing to implement the GeoInterface, they would all be able to convert between each other.

juliohm commented 3 years ago

As I imagined, this interface doesn't work for me because it requires adding (very unnatural) constructors to existing geometry types. At this point I don't think that having a generic GeoInterface to instantiate geometries will help users interested in moving away from the opaque GDAL experience.

Also, I would reassess the need for a GeoInterface in general. I know some of you like this idea a lot, but for me this goal of supporting multiple implementations of geometry types in Julia doesn't seem very productive. In practice, algorithms will be developed for specific, well-tested geometry types with specific data structures and constructors, and we can hardly say what this interface looks like today. A more productive path in my opinion is to load the geometries from disk with GDAL, convert them lazily to Meshes.jl geometries row-by-row as I mentioned, and move on.

I will proceed with the plan and will write a simple package that loads the geometries from disk in a Tables.jl table using ArchGDA.jl/GDAL.jl but where the geometry column has Meshes.jl geometries built on the fly. I will also implement the Meshes.Data trait for this table so that it can be automatically used as geospatial data in GeoStats.jl.

Thank you all for the clarifications.

visr commented 3 years ago

As I imagined, this interface doesn't work for me because it requires adding (very unnatural) constructors to existing geometry types.

In the example I decided to add a constructor for Meshes.Chain, but how you wish to call this is really up to you, and not part of the interface. If you think chain2meshes is a better name, you can go for that as well. Though it looks like you had already made up your mind.

evetion commented 3 years ago

As I imagined, this interface doesn't work for me because it requires adding (very unnatural) constructors to existing geometry types.

I think the example by @visr for the Meshes.Chain(::GeoInterfaceRFC.LineString, g) function is actually quite elegant. It even works for non GDAL geometry and proves that traits are a possible way of doing what you want, answering your questions. Whether traits result in unnatural constructors is another topic, it does feel like black magic sometimes.

Also, I would reassess the need for a GeoInterface in general. I know some of you like this idea a lot, but for me this goal of supporting multiple implementations of geometry types in Julia doesn't seem very productive.

I can agree with your point of productiveness, but I think we also all have to agree that there already are (1) multiple types of geometry in Julia (just like many types of DataFrame), whether we like it or not and (2) that Meshes.jl can't replace them all. Not only because we're competing with many man-years already spent in the mature GDAL ecosystem, but also because the needs are so different that there couldn't be one blessed type. That realization is why GeoInterface.jl exists in the first place.

In practice, algorithms will be developed for specific, well-tested geometry types with specific data structures and constructors, and we can hardly say what this interface looks like today. A more productive path in my opinion is to load the geometries from disk with GDAL, convert them lazily to Meshes.jl geometries row-by-row as I mentioned, and move on. I will proceed with the plan and will write a simple package that loads the geometries from disk in a Tables.jl table using ArchGDA.jl/GDAL.jl but where the geometry column has Meshes.jl geometries built on the fly. I will also implement the Meshes.Data trait for this table so that it can be automatically used as geospatial data in GeoStats.jl.

In order to be productive in the context of the larger ecosystem, I'd advise to not quickly reject existing ideas and proposals. Since we do seem to agree on a traits interface, an interface package like MeshTraits.jl would be easier to support and implement than a large dependency like Meshes.jl itself.

juliohm commented 3 years ago

Sorry Marteen I have to disagree with you in multiple dimensions.

I think the example by @visr for the Meshes.Chain(::GeoInterfaceRFC.LineString, g) function is actually quite elegant. It even works for non GDAL geometry and proves that traits are a possible way of doing what you want, answering your questions. Whether traits result in unnatural constructors is another topic, it does feel like black magic sometimes.

This is not elegant. Carrying GIS jargon like LineString, MultiLineString forward is far from an elegant treatment of computational geometry. I know people are used to these terms but they are extremely confusing to anyone else outside of GIS.

I can agree with your point of productiveness, but I think we also all have to agree that there already are (1) multiple types of geometry in Julia (just like many types of DataFrame), whether we like it or not and (2) that Meshes.jl can't replace them all

I disagree 100% here. What we have currently are multiple half-ready, half-buggy geometry types that are not maintained. The only actively maintained and decently organized codebase for geometries that I am aware of is Meshes.jl, and it is not because I am leading this development. No one working seriously with computational geometry can develop anything moderately complex with GeometryTypes.jl, GeometryBasics.jl and related efforts. It simply doesn't work. The design is terrible with everything behaving like big vectors where multiple dispatch doesn't help you at all in navigating this complexity. Sorry if I am being harsh here, I have to put it out there because my impression is that you still have some hope for GeometryBasics.jl.

Not only because we're competing with many man-years already spent in the mature GDAL ecosystem, but also because the needs are so different that there couldn't be one blessed type. That realization is why GeoInterface.jl exists in the first place.

It is fine if people prefer to use GDAL. That is like using SymPy from Julia, or using ScikitLearn.jl or any other mature system. I am not this user, and I do want to develop new methods from computational geometry that GDAL does not support because it is old and opaque for Julia users. That is why I am creating a separate package that uses the geometries I care about. People interested in GDAL will continue to use GDAL.jl and ArchGDAL.jl geometries and tables.

In order to be productive in the context of the larger ecosystem, I'd advise to not quickly reject existing ideas and proposals. Since we do seem to agree on a traits interface, an interface package like MeshTraits.jl would be easier to support and implement than a large dependency like Meshes.jl itself.

Interfaces are great when the ecosystem has at least one robust implementation as a proof of concept. Other Julia organizations attempted to create interfaces before the existence of full implementations and failed. See JuliaML for an example. There we have packages that are dead and completely isolated from MLJ.jl, Flux.jl and other major efforts because, well... Flux.jl and MLJ.jl do not follow the interface that other people did put together before an actual implementation was ready. I had to refactor LearnBase.jl and LossFunctions.jl entirely just to start my own research and the API is still limited there compared to what people have come up with in Flux.jl and MLJ.jl.

Don't get me wrong, this GeoInterface proposal seems nice, but it is not improving the situation in practice. I feel it is just delaying the development of geometries in a central hub. Instead of discussing the man power that went into GDAL, we should be putting man power into a package like Meshes.jl to alleviate this issue. Again, this is my personal opinion. I only wish more people from JuliaGeo had the same vision.

yeesian commented 3 years ago

Thanks for the discussions! I think it's run its course, and have opened #181 and #182 as feature requests to cover existing feature gaps that were discussed in this thread.