mkborregaard / VerySimpleRasters.jl

WIP - very simple raster format for juia
MIT License
5 stars 0 forks source link

Usability of this package and the JuliaGeo ecosystem #1

Open mkborregaard opened 5 years ago

mkborregaard commented 5 years ago

Ported from Slack. Me:

Over the last week or two I’ve been working on very simple raster implementation in pure julia. The reason for doing that is that when I had a workshop earlier this month, with 30 very interested biogeographers at a conference, I asked them what it would take for them to move to Julia. And they almost all said “convenient raster support”. And - it is my impression that that support is not quite there yet in Julia. Of course you can do amazing things with ArchGDAL, but in truth almost all of my work with rasters involve getting data out of them and using that for analyses, and some plotting.

So, I designed an analysis in R: https://www.dropbox.com/s/n8790r0s63d56hm/R%2Bworked%2Braster%2Bexample.html?dl=0 and set out to create a package that could replicate it in Julia. The result is https://github.com/mkborregaard/VerySimpleRasters.jl and the replicated analysis is here: https://www.dropbox.com/s/d5mc52corz5hm7s/VerySimpleRasters%2Bexample.html?dl=0

All operations happen on disk, out of memory, and the script is very fast even for the relatively big raster in the example (except the predictable waiting time on plots).

There’s still lots to do - so far it uses R’s native raster format as it’s own native format, and supports importing ESRI Ascii grids to that format. Other formats could possibly be provided by GDAL in some way like the old RasterIO package?

I’d really love inputs - how to make this consistent with the Geointerface, is it worth continuing in this direction or have I overlooked some obvious existing capabilities etc.

For anyone wanting to play with the package, notebooks and data, I’ve collected everything in a nice folder here: https://www.dropbox.com/sh/4shiomz5gj4ggx5/AAASQv_K5p2OadhwL-CdRuEja?dl=0

cc @yeesian @visr @c42f

mkborregaard commented 5 years ago

Then @yeesian: Yeesian Ng [5:18 AM] I think the best way forward is to create the package you want to be using for your daily needs, and to just start using it. And feel free to evangelize it to people if it’s something you feel like maintaining and supporting. @juliohm has been developing and maintaining GeoStats.jl successfully for instance. It could be that parts of the JuliaGeo ecosystem may mature and make things easier in the future, but there’s no timeline for it (since there’s no consistent developmental effort that we can count on). So IMO people shouldn’t refrain from working on their own packages for processing rasters/etc in the meantime. I am interested in the questions you’re bringing up too, e.g. when you write: “Here are a number of issues. First of all, VerySimpleRasters doesn’t know about Shapefiles. It just knows about polygons, and that they are a vector of Tuples. Maybe it would be useful for Shapefiles to define a polygon in accordance with GeometryTypes? Maybe GeometryTypes could satisfy the Geointerface, or the other way around? So, first I have to extract the points as a Tuple. Then, when I plot it, I see there is a problem: There are actually two polygons here, but they are treated as one. Since this breaks my masking code (but not R’s), I have to define a function to split the Polygons whenever a line returns to the starting point.” Ideally GeoInterface should have a sufficiently granular interface for others to use. The most common comprehensive one that I know of is https://www.opengeospatial.org/standards/sfa. But it will be hard to provide and maintain implementations for the full spec upfront, and it’ll be more reasonable for us to build up to it as we go (according to demonstrated need).

mkborregaard commented 5 years ago

@yeesian: In https://github.com/yeesian/ArchGDAL.jl/pull/76 , it is now:

julia> using ArchGDAL
ArchGDAL

julia> dataset = ArchGDAL.read("temperature/bio1.bil")
GDAL Dataset (Driver: MEM/In Memory Raster)
File(s):

Dataset (width x height): 2160 x 900 (pixels)
Number of raster bands: 1
  [GA_Update] Band 1 (Undefined): 2160 x 900 (Int16)

julia> ArchGDAL.read(dataset, 1) # reads the first (and only band)
2160×900 Array{Int16,2}:
 -9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
     ⋮                                  ⋮  ⋱             ⋮
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
 -9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999

so the main thing it saves you is not having to write the driver code for parsing the file format alternatively, you don’t have to check out that PR, and can do the same (more efficiently) with

using ArchGDAL

raster = ArchGDAL.read("temperature/bio1.bil") do dataset
    ArchGDAL.read(dataset, 1)
end

but I don’t think it’ll be as fast as mmap in https://github.com/mkborregaard/VerySimpleRasters.jl/blob/3fc20f476aebd12a74fe19ab583bcf8ee3c5a99b/src/grd_file.jl#L66-L82

mkborregaard commented 5 years ago

Me: So could these things be unified? I mean, could a solution be to have GDAL open a mmapped on-disk matrix with metadata (essentially what a VerySimpleRaster is). That would allow us to close the file again (making the operation safe and avoiding the function syntax), and also to treat the raster as a completely normal Julia Matrix (giving access to e.g. the entire Images stack for analysis). And if it is necessary to use GDAL to work directly on the on-disk representation, the VerySimpleRaster format keeps path to the input file, so functions on VerySimpleRaster could open the file, do the operation with GDAL, then close it again. Creating a seamless on-disk experience.

mkborregaard commented 5 years ago

@visr : Thing with gdal and mmap is that gdal supports many formats, only a few of them have binary blocks that can be fully memory mapped. Some formats are ascii, or have compression, tiling etc, so their disk representation is not the same as in memory representation. This is why the ArchGDAL example shows (Driver: MEM/In Memory Raster)

It's the reason we need GDAL as an abstraction layer if we want to support so many formats That of course leaves room for packages like VerySimpleRasters that only support some formats, for memory mapping reasons.

mkborregaard commented 5 years ago

Great! And yes agreed, I’d be perfectly content to support the mmappable ones natively and provide a conversion for the others (via GDAL most likely)

yeesian commented 5 years ago

GDAL allows you to bring-your-own format-specific drivers, but we (JuliaGeo and/or OSGeo) are not at the point where we can allow you to easily swap in a native julian implementation of a driver into GDAL. Nonetheless, if you're interested, here is their tutorial. Their code is X/MIT-licensed, so you can see how they currently do it. E.g. here's open() for ehdr.

Is it really so slow that you need to be implementing your own driver though? You can always file an issue upstream in the GDAL repository if so: it might benefit GDAL users in R/Python/Rust/etc too.

mkborregaard commented 5 years ago

No it's not that it's slow - it's that I want to avoid the clunky interface that GDAL gives you. Julia's AbstractMatrix stack is so strong - I think much can be said for having a direct julia representation of rasters, with the option of calling GDAL on them once you need something advanced. I would never in my practical work need all the capabilities of GDAL; in fact not any, other than opening different raster formats and possibly reprojection, so for me it's overkill to have to registerdrivers() do etc. for simple things.

yeesian commented 5 years ago

No it's not that it's slow - it's that I want to avoid the clunky interface that GDAL gives you. [...] so for me it's overkill to have to registerdrivers() do etc. for simple things.

Just to be clear: that's just a limitation of an existing julia wrapper, which I'm actively working to address in https://github.com/yeesian/ArchGDAL.jl/pull/76, and not a fundamental limitation of working with GDAL though.

You can still easily read in the raster data (see the examples in https://github.com/mkborregaard/VerySimpleRasters.jl/issues/1#issuecomment-458441010) and have access to all the other metadata and functionality for free via GDAL.

yeesian commented 5 years ago

No it's not that it's slow

The only reason to be parsing the grd file yourself, like in https://github.com/mkborregaard/VerySimpleRasters.jl/blob/3fc20f476aebd12a74fe19ab583bcf8ee3c5a99b/src/grd_file.jl#L66-L82 is if you deem the corresponding GDAL driver "too slow", which doesn't seem to be the case for you.

mkborregaard commented 5 years ago

Nice. So just to understand, do you not think there isn't scope for the pure-julia mmap approach once you're further with ArchGDAL?

By the way,

The only reason to be parsing the grd file yourself

another reason for doing this is then this package does not need a dependency on GDAL. The GDAL integration could live somewhere else.

mkborregaard commented 5 years ago

I thought you wrote that GDAL always loads it into memory? The thing is I want a handle on the raster contents, on-disk, that I can access directly (e.g. via getindex) - I don't care about which driver.

yeesian commented 5 years ago

I thought you wrote that GDAL always loads it into memory?

That's just my own implementation of it in https://github.com/yeesian/ArchGDAL.jl/blob/d5ee1fd0c7b6b3932178d1696ac3a6284fab48dd/src/dataset.jl#L259-L271.

GDAL doesn't load data into memory by default. But you might run into nasty memory bugs later, e.g. like in https://github.com/JuliaGeo/LibGEOS.jl/issues/58.

mkborregaard commented 5 years ago

So just to understand what you suggest exactly - scope for separate packages? Integration? Abandon this as soon ArchGDAL is finished with developments in the process of being merged?

yeesian commented 5 years ago

There is currently GeoJSON.jl and Shapefile.jl which natively reads into geojson and shapefile formats too. They are vector data formats, so we borrowed an "informal" interface in python for https://github.com/JuliaGeo/GeoInterface.jl for some degree of unification across those packages.

There is a discussion (which you're involved in) on a similar kind of interface for rasters. JuliaGeo has NetCDF.jl too, so it could help e.g. unify plotting for rasters across NetCDF and "R Raster and EHdr" (which this package seems to be oriented around. See https://www.gdal.org/frmt_various.html and the references therein for details.) and Arch/GDAL. But I don't work enough with rasters, or know enough about them to have an informed opinion about the direction it (the raster interface) might take.

rafaqz commented 5 years ago

I found this while thinking about writing something similar myself. I agree entirely that better raster handling is pretty fundamental to doing biogeography in Julia!

I feel that we need a RasterBase package that defines AbstractRaster and methods for working with coordinates and projections. I'm writing a bunch of packages for spatial simulations and this should be a basic type for all of them... It would also be great if my simulation packages kept spatial metadata bundled with matrices, so we can remove the need for passing in spatial parameters, and just plot the results with a recipe.

VerySimpleRaster seems like a good demonstration of this, but it would be great to abstract the interface, then provide integrations for NetCDF, ArchGDAL and similar packages.

I've read some of the other threads talking about something like this, is there a plan for something similar already in progress, or should I just go ahead and implement something?

mkborregaard commented 5 years ago

I'd happily depend on and adhere to an Abstract interface. I think it's just a question of doing it. In addition to @yeesian it'd probably be useful to speak to @evetion who has GeoRasters.

An alternative (possibly better) would be to put the abstract interface here, with the concrete implementation here as a fallback for the lightest case? This package has essentially 0 dependencies (well except RecipesBase but that's essentially a stdlib), to make it possible for more advanced implementations to depend on it at 0 cost. Happy to change this package to accomodate.

btw there's also some discussion here https://github.com/JuliaGeo/GeoInterface.jl/issues/16 as any abstract interface should be compatible with the geointerface imho.

rafaqz commented 5 years ago

Yeah I was also thinking the interface should include a concrete component for the most basic case.

I'll read through those threads too. Unfortunately I'm not that experienced working with rasters and spatial data, I've mostly written models and frameworks that handle matrices, and let others do the actual raster manipulation in R! But that's a short term solution that needs attention soon. I'll put a small base package together in a few weeks (after thesis hand in!) but feedback will be really useful.

Isn't there a HDF5 and Mmap dep in this package? I was thinking it should be much more lightweight than that.

mkborregaard commented 5 years ago

Mmap is part of the Base Julia distribution, so not a dep, just like Dates. HDF5 isn't used - it's just in the project.toml because I was experimenting with it. Before you start building an alternative package, do consider if we can do it in the context of this one. Happy to give you push rights if you're interested in collaborating.

rafaqz commented 5 years ago

Sure! if you're open to that I can work here.

I noticed you mentioned the problem of time as well in the other thread. It's a common problem for me too. I also have to deal with vertical dimension of layered environmental data. So it would be good to define common ways of accessing additional temporal and spatial dimensions. I was going to write a MicroclimateBase package specifically for this, but it would be good if that could depend on a more general base package and just add methods for the specific environmental variables instead of the whole framework for data access.

mkborregaard commented 5 years ago

Great, you should have push rights now. And yes goot point on the time dimension.

juliohm commented 5 years ago

Hi Rafael,

Most of GeoStats.jl is about generalizing spatial models to work on any spatial domain type. We currently have a great amount of abstraction and plot recipes for plotting solutions over domains, spatial data, etc.

At some point I will look again into the JuliaGeo packages to see if some integration is missing. The ClimateTools.jl package is already trying some integration with GeoStats.jl to let users run models on NetCDF seamlessly.

Let me know what kinds of models you have in mind and I can tell you if it fits the current implementation.

On Mon, May 27, 2019, 04:04 Rafael Schouten notifications@github.com wrote:

Sure! if you're open to that I can work here.

I noticed you mentioned the problem of time as well in the other thread. It's a common problem for me too. I also have to deal with vertical dimension of layered environmental data. So it would be good to define common ways of accessing additional temporal and spatial dimensions. I was going to write a MicroclimateBase package specifically for this, but it would be good if that could depend on a more general base package and just add methods for the specific environmental variables instead of the whole framework for data access.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mkborregaard/VerySimpleRasters.jl/issues/1?email_source=notifications&email_token=AAZQW3J4WW45YHH2VP45DLTPXOBY5A5CNFSM4GS55GS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWI65QQ#issuecomment-496103106, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQW3MS27ZEJQTRLIJDOIDPXOBY5ANCNFSM4GS55GSQ .

rafaqz commented 5 years ago

Great! NetCDF integration is something I need too.

The packages I'm developing that are specifically related are Dispersal.jl and Cellular.jl for general Cellular automata style models, and dispersal modelling with heterogeneous spatial data and various dispersal and growth processes. GrowthRates.jl provides simple growth rate grids for dispersal modelling from raster climate data (from HDF5), and Microclimate.jl for providing common microclimate data for more complex species distribution modelling, from NetCDF files. They are all in various stages of development, and all need more generalised interfaces, and further abstraction of the NetCDF and HDF5 data sources. Some kind of AbstractRaster seems to be a key component, although its taken me a while to realise that.

Honesty, I don't really understand what GeoStats and ClimateTools do! there's a bit of a mismatch in the applications and language we use in ecology/biogeography. Browsing the code it seems like AbstractSpatialData in GeoStatsBase could be what we are looking for? although the T<:Real limit would be a problem for me (Unitful).

I see ClimateTools.jl deals with 3d and 4d data in which something I specifically need for working with microclimates and I'm doing in ad-hoc ways currently. The ClimGrid type also seems to have a lot of what we need for biogeography, but also a lot more that we don't need! But inheriting from some shared abstract type would be good.

mkborregaard commented 5 years ago

@rafaqz if you're working on Species Distribution Models you may also want to check out @tpoisot 's work here: https://github.com/EcoJulia/SimpleSDMLayers.jl (just started last week) and here: https://gitlab.com/tpoisot/BioClim

mkborregaard commented 5 years ago

After looking at your repos, in general you might actually be interested in the EcoJulia organisation - the mission is to create a framework for various aspects of community ecology and geographical ecology in Julia.

rafaqz commented 5 years ago

I've been thinking about EcoJulia for a while! Was waiting until I had a few things to contribute :)

I mostly develop mechanistic SDMs, dispersal models and biophysical modelling tools (also see my other works in progress Photosynthesis.jl and DynamicEnergyBudgets.jl)

But this means my data requirements are often a lot more fine grained spatially and temporally than worldclim style data ins SDMLayers. But both of those packages seem like they would good candidates for incorporation into a more abstract approach to raster data?

juliohm commented 5 years ago

The AbstractSpatialData type in GeoStatsBase.jl covers all spatial data types in the docs: https://juliohm.github.io/GeoStats.jl/stable/spatialdata/

I did put the T<:Real restriction there without thinking about Units. But we surely can extend the type to any T, and handle units more gracefully. I have to check out the Units packages in Julia first. I am not using them yet.

rafaqz commented 5 years ago

Ok great, that would probably be a good type to inherit from then, although I'll have to get my head around your package more.

(I like to add units wherever load NetCDF with known units to memory, so that everything is automatically correct after that)

mkborregaard commented 5 years ago

IMHO the priority should be to be compatible with / inherit from the JuliaGEO types. Your types are just compatible among your own packages @juliohm , right?

rafaqz commented 5 years ago

One issue for me is the "Stats" part of GeoStatsBase!

I rarely do statistical modelling, and tools for stats do complicate the package considerably in terms of contributing to it, and orienting the ecosystem of (non-statistical) tools I would be building around it. I was imagining something more like GeoBase, with no deps at all, just an abstract interface for basic spatial types.

rafaqz commented 5 years ago

Ok I still don't have my head around how JuliaGEO and all these packages relate!

Should we be aiming to dproduce some common basis for all (or as many as possible) of these use cases?

juliohm commented 5 years ago

@mkborregaard yes, most of the types there are only used by my own packages. I am moving forward with them because so far I couldn't find a solid base package that attends my needs. The data and domain types in GeoStats.jl however should be pretty general, and they don't have any "Stats" bits attached @rafaqz . Most of the stats-related types and funcionality is in GeoStatsDevTools.jl

I know that GeometryBase.jl is a place where some basic definitions are being addressed, but I couldn't find time to go through it carefully. Basically there is a lot of machinery that I still need to implement in GeoStats.jl. Only after that I can think of better integration with the ecosystem.

GeoStatsDevTools.jl for example already provides concepts such as neighborhoods, paths, partitions, etc. which are tricky to implement efficiently. I need types that support these types of operations.

evetion commented 5 years ago

I'm up for a dev package which defines a basic Raster type/interface. At the moment, my (renamed) https://github.com/evetion/GeoArrays.jl package is defined as follows:

mutable struct GeoArray{T<:Union{Real, Union{Missing, Real}}} <: AbstractArray{T, 3}
    A::AbstractArray{T, 3}
    f::AffineMap
    crs::AbstractString
end

This is the smallest type I could come up with (merging the nodata into missing, and cellsizes into the AffineMap.

In my opinion, an AffineMap is essential, as this can handle complex projections (rotations), whereas simple stepping cannot. It also maps directly to the GDAL ecosystem.

Regarding the projection string, I'm hoping we will have an PROJ6 package, that is Julia only, providing the database of projections. @visr is also interested.

mkborregaard commented 5 years ago

I see the usefulness of an AffineMap, but it is also an implementation aspect - what does it mean in terms of interface? The reason I don't have missing but use a sentinel value approach instead, is I want to avoid copying on-disk rasters. That should also be an implementation detail, the interface should not require anything but missing values being handled gracefully.

rafaqz commented 5 years ago

So to define that basic case as an abstract interface, we would need something like:

abstract AbstractGeoArray{T,N} <: AbstactArray{T,N} end

function projectiontype end

function projectionmap end

function coords end

function indexmissing end

etc

??

I'm not sure if we should agree on what N means? There are a lot of possible interpretations of what 3/4/5 dimensional data represents.

rafaqz commented 5 years ago

I agree on leaving Missing up to implementations, sometimes I specifically disallow missing in modelling applications as it can be much slower. (Edit: and you specifically can't use missing in dispersal simulations unless you want to output whole grid of missings. But I still want spatial metadata to pass through to the output array...)

mkborregaard commented 5 years ago

Yeah sounds good, and also essentially many of the functions I define here are part of a minimal interface I'd think, eg. getindex, extract, crop, mask, bbox and in. Maybe even aggregate though that's not as minimal?

rafaqz commented 5 years ago

It might be good to add some kind of abstract type hierarchy to deal with additional spatial and temporal dimensions, clarifying what N (in AbstractGeoArray{N,T}) means for any particular dataset in a standard way. Handling 3d and 4d datasets. I'm mostly using 4d microclimates so this is particularly important to me, sometimes 5d...

ClimateTools.jl deals with 3 or 4 dimensions like this, so N is 3 or 4. Time is always the last dimension, with the first two dimensions being long lat, and the third dimension is elevation for 4d:

  if ndims(data) == 3
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:time}(timeV))
  elseif ndims(data) == 4 # this imply a 3D field (height component)
    # Get level vector
    plev = ds["plev"][:]#NetCDF.ncread(file, "plev")
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:plev}(plev), Axis{:time}(timeV))
  else
    throw(error("load takes only 3D and 4D variables for the moment"))
end
mkborregaard commented 5 years ago

@juliohm just to be explicit I'm 100% in favour of integration across the julia ecosystem, and I think the ecosystem you're singlehandedly building is impressive. I'd just prefer this happened through shared compatibility with the JuliaGeo org. Does this make sense to you?

juliohm commented 5 years ago

Hi Michael, I fully agree. What about we set an online meeting with everyone to go over some of the base packages containing spatial types to see what are the most productive paths of development? I'm currently doing my research single-handed basically because I don't understand what is out there. Maybe it is an issue of documentation? I should also say that I'm not coming from a geography background but from a geostats background and that makes things a little bit more challenging for me to grasp.

I'll try to go over the JuliaGeo packages once again to find out the overlaps. It would be really interesting if we could gather people online to present slides or notebooks illustrating the concepts. Or maybe a general presentation containing all the infrastructure planned for the future.

On Mon, May 27, 2019, 12:27 Michael Krabbe Borregaard < notifications@github.com> wrote:

@juliohm https://github.com/juliohm just to be explicit I'm 100% in favour of integration across the julia ecosystem, and I think the ecosystem you're singlehandedly building is impressive. I'd just prefer this happened through shared compatibility with the JuliaGeo org. Does this make sense to you?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mkborregaard/VerySimpleRasters.jl/issues/1?email_source=notifications&email_token=AAZQW3OGJIJKOMFE2SAX4FTPXP4UNA5CNFSM4GS55GS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWKCCMY#issuecomment-496247091, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQW3IENISCRGTJAE2CMLLPXP4UNANCNFSM4GS55GSQ .

tpoisot commented 5 years ago

Would love to attend the meeting - as @mkborregaard said, we are working on SimpleSDMLayers for rapid development of bioclim-type models, but there are a lot of niches to fill here. We can decide to work with compatible types, or decide on a common interface to implement for all types, or both. These are not trivial decisions, and the more people express their opinions, the better.

rafaqz commented 5 years ago

@juliohm I'm keen to attend such a meeting. It would be good to delay for 2 weeks so that there is time to learn enough about everyones packages, and write a simple API package to have as a discussion point? (I also need to finish my thesis this week)

@mkborregaard The number of use cases here makes me think we need a very simple package without an implementation, just for clarity. So starting a fresh package might be better.

rafaqz commented 5 years ago

Key inks from this thread to avoid scrolling around to find them:

Related discussion: https://github.com/JuliaGeo/GeoInterface.jl/issues/16 https://juliohm.github.io/GeoStats.jl/stable/spatialdata/

Related packages from this thread:

Biogeography/ecology https://github.com/mkborregaard/VerySimpleRasters.jl https://github.com/EcoJulia/SpatialEcology.jl https://github.com/rafaqz/Dispersal.j https://github.com/rafaqz/GrowthRates.jl (poorly named! its all loading/generating raster data) https://github.com/rafaqz/Microclimate.jl https://github.com/EcoJulia/SimpleSDMLayers.jl https://gitlab.com/tpoisot/BioClim

Climate https://github.com/Balinus/ClimateTools.jl/

Geo https://github.com/JuliaGeo/GeoInterface.jl https://github.com/JuliaGeo https://github.com/juliohm/GeoStatsBase.jl https://github.com/juliohm/GeoStats.jl https://github.com/evetion/GeoArrays.jl https://github.com/yeesian/ArchGDAL.jl https://github.com/Alexander-Barth/NCDatasets.jl

EDIT more related packages not mentioned here: https://github.com/JuliaGeo/Geodesy.jl/ (has datums) https://github.com/FugroRoames/CoordinateTransformations.jl https://github.com/FugroRoames/SpatialGrids.jl https://github.com/EcoJulia/EcoBase.jl https://github.com/JuliaGeo/Proj4.jl https://github.com/anowacki/BritishNationalGrid.jl

mkborregaard commented 5 years ago

So starting a fresh package might be better.

Sure, if you think so, we can always add stuff in from existing packages later

rafaqz commented 5 years ago

Heres a prototype base package for discussion:

https://github.com/rafaqz/GeoArrayBase.jl/blob/master/src/GeoArrayBase.jl

It might need to be called GeoBase as I incorporated non-array geo data types as in GeoStatsBase etc that should use the same methods.

As well as basic methods it has some definitions of array dimensions, crs projection formats and calendars that don't seem to be standardised anywhere. They probably belong in other Base packages.

juliohm commented 5 years ago

@rafaqz, the T<:Real restriction in GeoStatsBase.jl is gone now, so in theory one should be able to use Unitful.jl coordinates with the spatial objects defined therein.

Thanks for putting together a GeoArrayBase.jl proposal. I believe that we also need to concentrate on other spatial configurations in this GeoBase.jl package we are discussing. For example, structured grids, point collections, images, etc. These types are kind of what I have in GeoStatsBase.jl + GeoStatsDevTools.jl, and they serve my current needs. However, they lack the concept of projections, for example. Also, I did not have a chance to include the time dimension in any of those. I don't know if space and time should be mixed actually.

I did a major rewrite of the interface in GeoStatsBase.jl to facilitate some code reuse (e.g. plot recipes) and other planned features. In the spatialobject.jl file you can find a general interface for anything spatial (that fits in the concept of point-based measurements). From this abstract type, we can define at least two sub-types, which are still abstract: spatial domain, and spatial data.

The domain type is pretty much the same as a general spatial object type. It only materializes the idea of a collection of points with coordinates. It is basically a trick to use the spatial object interface, and still have a new type on which to dispatch.

The spatial data type is a domain plus some data. For example, for each point of the underlying domain, we can save properties. Most implementations in GeoStatsDevTools.jl save these properties in a dictionary mapping symbols to flat vectors of values. Note, however, that because we inherit from spatial object, and because we have domain types already implemented, all we need to do is wrap the domain in a new type containing the domain and an extra dictionary. Everything else we get for free regarding spatial queries.

This interface is working nicely so far within GeoStats.jl, and I'be happy to go over it in a call if you guys feel it is appropriate. My hope is that at some point these domain types that I am writing can be reviewed and extended to become a separate GeoBase.jl or GeoInterface.jl package.

As a final comment, notice that all these types that I wrote above are point-based. I don't have much experience with polygon-based or area-based spatial objects. This GeoInterface.jl package that we are discussing could encompass both classes of spatial data.

rafaqz commented 5 years ago

I have an update on this. I ended up putting together DimensionalData.jl as an abstract xarray-like indexing system that can be used to manipulate and plot tagged dimensional data, that suits this problem better than AxisArrays.

GeoData.jl inherits from that, but adds spatial metadata, and hopefully integrates with spatial data as you are describing @juliohm, Maybe we could start an issue there to discuss this further if this direction seems useful to you? I've looked at AbstractSpatialObject etc but there is still a lot I don't understand the reasoning for - its a big package.

There are some examples, one of which is this (g is a 3d GeoArray):

# plot the mean sea surface temperature around Australia from August to December 2002
dimranges = Time((DateTime360Day(2002, 08, 1), DateTime360Day(2002, 012, 30))), Lat(-45:0.5), Lon(110:160)
select(g, dimranges) |> x->mean(x; dims=Time) |> plot

Which plots ~correctly with axis labels and metadata even after select() and mean(). It would be great if select could accept a polygon as well.

@mkborregaard I'm not sure how much this covers what you wanted with VerySimpleRasters. The main missing piece is to integrate with NCDatasets, GDAL (or wrapper packages) for specific load/save/conversion. Which will probably require a lot more changes to GeoDims as well... There are lots of other small pieces unfinished as well, and feedback would be very useful.

@evetion Although this doesn't look like it will work with affine transforms, it will... But they are optional as a lot of the time we don't need them. AffineDims will just wrap any subset of the dimensions and hold the AffineMap. So a concrete type doesn't need an affine field, just dims and any combination will just work. That also means normal indexing just uses dimensions, but select() will use the affine transform when it exists.

juliohm commented 5 years ago

Thank you again @rafaqz for your efforts! I am still trying to understand to what extent the types proposed can be reused directly within GeoStats.jl. Could you please confirm that the types support non-regularly sampled points? For example, something that looks like this or simple point sets without array-like layouts.

Besides this non-regular types, I need to extend the existing types in GeoStatsBase.jl to introduce the notion of volume element. Some geostatistical methods can handle spatial measurements over different support. For example, it is common to have measurements of variable A in one resolution, and measurements of another variable B in a different resolution. In order to handle this scale difference explicitly in statistical models, we need to encode the volume of the measurement.

I am starting to think that we don't need types, but a comprehensive traits interface. It will be hard to conciliate all the proposed packages since everyone in JuliaGeo has different use cases.

mkborregaard commented 5 years ago

Hi I'm really sorry for not engaging more with this. I'm currently in Ecuador and will be travelling various places until mid-September - at which time I will absolutely love to engage fully with this again.

visr commented 5 years ago

@rafaqz awesome that you're exploring this space. Haven't had time to look into everything, but just briefly checked out DimensionalData. Have you considered using/working with the folks at https://github.com/invenia/NamedDims.jl and https://github.com/invenia/IndexedDims.jl? They have similar goals, and aim to be the next AxisArray (and are inspired by xarray).

rafaqz commented 5 years ago

@juliohm non-regular grids are doable. Dims could hold the matrices and handle dispatch so it would work on regular data types:

dims = Lat(latmatrix), Lon(lonmatrix)
a = GeoArray(data, dims)
select(a, Lat((1, 7)), Lon((1, 7)))

There would just need a few methods that dispatch on AbstractDimension{Matrix} to retrieve the indices with a splat.

But the dims with indexible integer lat/long also isn't the usual case... Maybe a wrapper type on the matrix could make thar clearer, Lat(Curvilinear(latmatrix)).

Otherwise the main change this approach would mean for GeoStatsBase is either adding/loosening a dims field to structs and using it to store dimension order and bounds, or just generating dims in the dims() method using your current fields and fixed dimension order. Also adding D for AbstractSpatialObject{T,N,D} <: AbstractGeoDataset{N,D} for dispatch.

The other thing is a rebuild() method so that dims can be updated after subsetting with select() or reduction of dimensions. That's how the above example still plots through the reduction and crop.

I'm not sure if you will see these changes as worthwhile! but separating out dims seems pretty powerful for interop to me - number and order of dimensions, grid spacing, coordinates and type (like DateTime etc), bounds, units, affine transforms, and possibly other metadata are all available from one method/field without changing anything else.

You can also just add a dims() method (without inheritance) and half of DimensionalData still works, but it would mean more code in the end.

@mkborregaard if I was travelling in Ecuador I wouldn't be engaging either :) But try the example when you get back the plotting is pretty smooth.