rafaqz / Rasters.jl

Raster manipulation for the Julia language
MIT License
209 stars 36 forks source link

more consistent geometries handling #671

Closed tiemvanderdeure closed 2 months ago

tiemvanderdeure commented 4 months ago

This PR aims to make handling of geometries more consistent.

The goal is that all functions that take geometries as input (e.g. extract, rasterize, boolmask, zonal) will behave consistently.

Progress:

Do you want to give your two cents on these decisions @rafaqz?

rafaqz commented 4 months ago

Thanks for starting this.

I think both geometrycolumn and allowing point tables makes sense to do everywhere. Point tables are very common in datasets not focussed on geometries, e.g. things like GBIF.

There is also mask and missingmask.

For the docs, we should use shared docs constants to keep it consistent.

tiemvanderdeure commented 4 months ago

@nicolasmerino41 pointed out that _extent on a point table was totally broken (it had a call to Extensts). So if we choose to implement them we might take the opportunity to add tests ;).

rafaqz commented 4 months ago

Well sharing the code should help that too

tiemvanderdeure commented 4 months ago

Tests fail because rasterizing a Raster no longer works. I can't really think of a reason why that should work, though. Is there one?

rafaqz commented 4 months ago

I guess it's like mask where that's a fairly sensible thing to do. Maybe less so for rasterize

tiemvanderdeure commented 4 months ago

After looking into the test fails, I think we should not allowing the data argument in rasterize to be a Raster anymore.

Two things are tested: that rasterizing a Raster can return itself, and that rasterizing a Raster with a fill argument can return a RasterStack or Raster with fill values. The first is redundant, and there are much more intuitive ways to accomplish the second. Also, this behaviour is undocumented.

I think that for the most part mask, boolmask, and missingmask just pass their geoms to burn_geometry!.

I'll also add a geometrycolumn keyword to all the functions and allow for point tables. I agree it makes sense to allow point tables, there are many cases where you otherwise have to go out of your way to make a geometry column.

rafaqz commented 4 months ago

Ok let's delete it.

tiemvanderdeure commented 4 months ago

I've now implemented point tables. They will only be used if geometrycolumn is specified as a Tuple of Symbols, which I think is the way to go. Automatically detecting point column would open a whole can of worms and I don't think we want to go there.

I've made the docstrings more consistent throughout, but there might still be room for improvement to make them clearer. Same goes for error messages.

Also want to flag that functions such as boolmask now call _get_geometries twice, first because _init_bools calls _extent and then again in _burn_geometry!. I think this is a pretty minor problem and fixing it is not trivial.

I also still need to look at adding more tests.

rafaqz commented 4 months ago

Automatically detecting point column would open a whole can of worms and I don't think we want to go there.

Absolutely, users should always have to specify the columns.

Also want to flag that functions such as boolmask now call _get_geometries twice

Can we call _get_geometries at the top level before we do anything else? I know I'm a little pedantic with these things but they do add up, in both performance and in our mental models of "what happens in the algorithm".

boolmask is used in other methods too, so it should be one of the most optimised and simple things in the package.

tiemvanderdeure commented 4 months ago

Can we call _get_geometries at the top level before we do anything else?

That would be the way to go, I suppose. I just spent an hour or two on trying to make this work, without too much to show for my efforts so far.

If we do this, we need to have dedicated dispatches for all methods at lower levels on objects returned from _get_geometries, similar to how they dispatch on GeoInterface.Features, and so we need to add a bunch of dispatches.

Right now a function like extent just calls GI.trait, but that will return nothing for objects returned by _get_geometries, and so there is no way to discern those from all kinds of other objects that we want to call _get_geometries on.

tiemvanderdeure commented 4 months ago

Thinking about this a bit more, I guess an alternative approach could be to just add a dispatch for _get_geometries on AbstractVector and just make it return the object passed (maybe with a check_geometries keyword that defaults to false).

In that case some function calls would still have a bunch multiple calls to _get_geometries, but most of them have basically no cost, and we wouldn't need to add all those dispatches.

rafaqz commented 4 months ago

Don't we have to deal with user supplied vectors of geoms anyway?

tiemvanderdeure commented 4 months ago

Yes, but there's not a dispatch on it - _get_geometries just assume that if something is not a table it must be an iterable, and then check if all elements in the iterable are actually geometires.

I'm not sure if the compiler always figures this out - if it does, we can call _get_geometries as many times as we like (as long as we pass a vector of geometries) and this doesn't matter. If not I think we'll need some dedicated dispatch of _get_geometries.

rafaqz commented 4 months ago

Ah right. Yeah I wish GeoInterface traits worked on types then this could just compile away (from vector eltype)

rafaqz commented 4 months ago

We can also merge as is and I can tweak it later.

tiemvanderdeure commented 3 months ago

I think it makes sense to call _get_geometries at the top level and pass the resulting object. As long as _get_geometries returns doesn't Vector{Any}, any further calls of _get_geometries on it won't have any cost. This is also pretty easy to implement. I think that in practice the case with Vector{Any} will be quite rare.

tiemvanderdeure commented 3 months ago

Okay I have one more question: some methods (e.g. rasterize) have dispatches on GeoInterface traits (such as FeatureCollection) whereas others do not. I'm not so familiar with these - what objects could these be (other than those this PR covers already) and should _get_geometries work on these as well or not?

rafaqz commented 3 months ago

Probably they should all handle Feature/Feature collection.

For our purpose here Feature is a wrapped geometry (unwrap with GI.geometry) and FeatureCollection holds something like a vector of Feature (you iterate over it with GI.getfeature)

So yes _get_geometry should handle that.

But dispatch on FeatureCollection and Feature in rasterize may be needed to extract named fill values? It's like a table but may not actually be a Tables.jl table.

tiemvanderdeure commented 2 months ago

It's again failing on this

[3017] signal (6.-6): Aborted
in expression starting at /home/runner/work/Rasters.jl/Rasters.jl/test/resample.jl:9
ERROR: LoadError: Package Rasters errored during testing (received signal: 11)

which I don't get on my machine. So not sure where to start to fix this.

rafaqz commented 2 months ago

Damn that's just GDAL I'm not sure what's causing it, and no time to look until JuliaCon probably.

You could move the resample tests to the end? Then we'll know this passed.

tiemvanderdeure commented 2 months ago

Something I did in the latest commit fixed it, so checks pass now! As far as I'm concerned this one is good to go, but maybe you could give it a look whenever you have time.

rafaqz commented 2 months ago

Can you rebase main?