wkearn / RasterIO.jl

Simple Raster Formats for Julia
Other
18 stars 5 forks source link

WIP: use Raster type more #16

Closed visr closed 9 years ago

visr commented 9 years ago

A lot of the functions under the gdal folder work on GDALDatasetH. Since a Raster is essentially tied to this type, shouldn't we be using it more? At the same time I think we can get rid of some of the functions starting with _.

This is just a start with the getrasterband function, but if you like it I will go on with the rest.

cc @yeesian

yeesian commented 9 years ago

The functions in the gdal folder should be self-contained (relying only on clang folder) and not depend on the types defined in the src/raster_* files though. The idea is that the gdal folder might be pulled out of this package someday into a different package (maybe callled GDAL.jl), for packages like RasterIO.jl and possibly others (e.g. OGR.jl) to depend on.

yeesian commented 9 years ago

But your point is well-taken: I had punted on introducing a RasterBand type, for cases where the user might need to work with the individual GDALRasterBandH directly. Am waiting for people to use the package to see what use cases arise.

visr commented 9 years ago

Hmm I see your point. The only dependency in the src/raster_* files is the Raster type though. I suppose a possible GDAL.jl would define the Raster type instead?

Shall I leave this be for the time being then?

yeesian commented 9 years ago

I have also been thinking about defining types like

immutable Dataset
   ptr:: GDALDatasetH
end

immutable RasterBand
   ptr:: GDALRasterBandH
end

in src/gdal/types, because they're currently aliases for Ptr{Void} when they really ought to be distinct.

But the GDAL types have their own class hierarchy, and functions that accept GDALMajorObjectH should also accept pointers to subclasses, i.e. we might have to do something like

abstract MajorObject

type Dataset <: MajorObject
   ptr:: GDALDatasetH
end

type RasterBand <: MajorObject
   ptr:: GDALRasterBandH
end
yeesian commented 9 years ago

At the same time I think we can get rid of some of the functions starting with _.

They are the only functions which are well-documented (based on the gdal c/c++ api) at the moment though. I haven't yet documented the convenience functions.

visr commented 9 years ago

They are the only functions which are well-documented (based on the gdal c api) at the moment though.

But in most cases (like in the current commit), this documentation can just be transferred to the remaining function right, and adapted if need be?

Regarding the types: I agree it would be good to replace the Ptr{Void}. And I guess it could be helpful to mimick the GDAL class hierarchy. But I agree it's not a straightforward topic. Since types are so important in Julia, it's probably helpful to think this through properly (or experiment). It could be for instance that under gdal/ we replicate the GDAL class hierarchy directly, and under JuliaGeo in general we use our own types like Raster and Feature and Projection etc.

yeesian commented 9 years ago

It could be for instance that under gdal/ we replicate the GDAL class hierarchy directly, and under JuliaGeo in general we use our own types like Raster and Feature and Projection etc.

Indeed. I'm not sure it's worth the complexity at this point, and think we'll get more mileage out of working on the rest of the TODO list for now.

It might be worth spending some time to understand why mapbox/rasterio and fiona seem to be more widely used than their osgeo counterparts.

visr commented 9 years ago

It might be worth spending some time to understand why mapbox/rasterio and fiona seem to be more widely used than their osgeo counterparts.

I believe it's because they go further than thin wrapping GDAL, and focus on providing a convenient, pythonic API. We should probably do the same, and separate a thin wrap of all of GDAL into a separate package. I think the TODO's are mostly still in the thin wrap category. Once this is in place, we can use it to build a convenient julian package.

yeesian commented 9 years ago

Yup, agreed.

meggart commented 9 years ago

So what would you suggest right now? Would it be feasible to separate a GDAL and a RASTER package early, before they start to get mangled? If we plan to do so anyway, why not now?

Regarding API, I was trying to use the package for real work last week. First thing I thought was that GDAL can read very distinct types of Rasters:

  1. Images of some kind, whose bands have a color interpretation
  2. data rasters of a physical quantity, single band, not representing a color (like albedo maps...)
  3. data stacks of variables, with several bands where each band represents e.g. a time step or altitude

I think all of those would deserve their own Raster type in an API, at least it would be good to automatically detect and separate (1) from (2) and (3).

yeesian commented 9 years ago

Would it be feasible to separate a GDAL and a RASTER package early, before they start to get mangled? If we plan to do so anyway, why not now?

Sure, it was low(er) priority for me, until someone actually rolls up their sleeve to work on OGR.jl, at which point the separation makes sense. Just needs someone to do the separation. I was thinking something like

GDAL.jl (light weight wrapper that does the building of GDAL, and update/release/tag whenever the underlying GDAL library has a new release or sth)
RasterIO.jl (requires GDAL.jl)
OGR.jl (requires GDAL.jl)

That way you can create geometries in OGR.jl/etc that still play nice with RasterIO.jl (e.g. "burning"/rasterizing OGR geometries into rasters)

meggart commented 9 years ago

Ok, I understand. I think I can't work on OGR.jl right now, so let's keep stuff together right now and work on the TODOs for the thin wrapper.

yeesian commented 9 years ago

First thing I thought was that GDAL can read very distinct types of Rasters:

  1. Images of some kind, whose bands have a color interpretation
  2. data rasters of a physical quantity, single band, not representing a color (like albedo maps...)
  3. data stacks of variables, with several bands where each band represents e.g. a time step or altitude

Agreed. I don't know what the corresponding raster types should be though. Currently we just treat them as arrays of values, and do the interpretation in Images.jl (see the section on "Overlays" in https://github.com/wkearn/RasterIO.jl/blob/master/doc/RasterioGuide.ipynb) It seems to work okay (e.g. when processing satellite images, like in https://www.mapbox.com/guides/processing-satellite-imagery/). My mental model is "RasterIO = GDAL, Images = ImageMagick".

If there's a list of common specifications of rasters that people work with, we can provide types/methods for them.

visr commented 9 years ago

Ok I'm experimenting a bit with the GDAL.jl you mentioned, see this: https://github.com/visr/GDAL.jl/blob/master/gen/wrap_gdal.jl

This wraps GDAL 2.0.1, currently only gdal.h and ogr_api.h, but will add whatever we need. Interestingly they will also be adding a utils API in the future 2.1 version, such that there is no need to shell out to use these functionalities.

@meggart Does the wrap_gdal.jl look similar to what you used? Perhaps we can use the rewrite(Expr) callback from Clang.jl to rewrite some things for us. (As done here for instance.)

yeesian commented 9 years ago

Nice work @visr! I have seen the RFC for the utils API too, which is great.