Closed Datseris closed 1 month ago
It's not really supported yet, I actually commented it out: https://github.com/rafaqz/DimensionalData.jl/blob/master/src/grid.jl#L338-L356
Edit: and that was for datasets still in matrix style, different to what you describe, sorry I misunderstood you a little here. But we can probably support that type of data too eventually - I called this DimensionalData instead of DimensionalArrays for a reason ;)
The rest probably isn't relevent but I'll leave it here...
But it needs to be supported for this to be widely used for geospatial data. And I haven't figured out Near
etc either...
You can use an affine transform: https://github.com/rafaqz/DimensionalData.jl/blob/master/src/grid.jl#L327
I have written it a little differently to what you describe - to keep the one axis one dim symmetry of the package (things will break otherwise). The idea is Lat and Lon hold the same matrix (one could actually be empty but it seems odd) and we just combine dims2indices
for them - you always have to use both.
Nearly all the code to do this is written - affine transforms need a very similar setup, and they are both <: DependenGrid
. It's just a matter of finishing it and testing it. Some of the recursive primitive methods are pretty wild...
https://github.com/rafaqz/DimensionalData.jl/blob/master/src/primitives.jl#L79-L97
Hm, I didn't understand most of the things you said. :( Especially this part:
I have written it a little differently to what you describe - to keep the one axis one dim symmetry of the package (things will break otherwise). The idea is Lat and Lon hold the same matrix (one could actually be empty but it seems odd) and we just combine dims2indices for them - you always have to use both.
Here is how I see the functionality I would like to have and I could actually put it in a PR:
One can define a dimension Coordinates
. This dimension takes as value a Vector{SVector}
.
This dimension works for naming and dimension specification as before, i.e. one can choose slices by the dimension name, one can use it in mean as mean(A; dims = Coordinates)
, etc.
Then, one can easily implement Near
and At
. In fact, I think there should be no difference besides taking care that the code measures distances in a generic way from Distances.jl instead of doing something like abs(a - b)
.
For Between
it is also easy: between takes both bounds to be a static array Between(SVector(1,1), SVector(2,2))
and matches all coordinates with first entry between the first entries of these two vectors, and second entry between the second entries of these coordinates.
I don't know but for some reason I feel that this sounds like something very straight forward both implementation-wise but also to explain to people. I didn't get the part about the affine transforms that you mentioned either (in fact I don't know what is an affine transform). I also don't understand why "transforms" would be at all mentioned in this dicussion. Why would you do any transformation on the coordinate of the points? This doesn't feel to me like something that should happen in a package about dimensional data...? But maybe I totally misunderstood what you mean.
However what I propose over here would work with any unstructured grid. I am not sure what kind of grid you have in mind, but people typically use triagnulations as well, e.g. :
What do you think w.r.t. what I say above? How does it fit with what you said ?
TransformedGrid
is for when the array grid is rotated (or some other affine transformation) from a more standard grid like (Lat, Lon) - so instead of using a lookup index, you apply a coordinate transform function to both the Lat(Near(x))
and Lon(Near(x))
values that gives you the rotated index values. The thing in common with what you were saying is that you need two dim values to index, not one. In the commons spatial use-case each point needs to be specified with a (Lat, Lon)
pair.
But in that case the data is at least 2 dimensional. So it has to have 2 dims for consistency, instead of the combined Cordinate
dim you discuss (I suggest Coords
or Co
for brevity).
But your use case is also interesting. It's 1 dimensional data, so it can only have 1 dim anyway. That works. The selectors should also work, which is nice. But it's also slightly breaking a common pattern of this package. The order or the selector dimensions are no longer arbitrary, so you have to know that lat is first, lon second in the SVector
(or whatever they are). That could be fixed if Coordinates
itself having a dims field with empty dims just for order, or by having that on the grid.
Then you could do Between(X(1, 2), Y(1, 2))
if you had to. But maybe it's overkill, I'm not totally sure of the use case.
Put a PR together anyway, I think it makes sense.
Okay, I finally understand. I am still not entirely convinced whether these coordiante transforms really belong here instead of GeoData.jl, but I guess that is minor. If these transform things are documented well I guess its okay (this is the first time I hear about this btw).
Cool I'll do a PR soon then. I have started using this in my project already, seems like it works so far. I'll wait for advances in abstract dim supertypes to be merged first though.
Ok sounds great. would it be possible to add a plot recipe for them?
I'd be very happy if you could did this instead. I have never used Plots.jl and use exclusively PyPlot.jl. I don't know anything about recipes... When I do the PR I can tell you how I plot these fields: I do a scatter plot.
Doing some kind of surface plot would be possible but only if one keeps the bounds of the grid points. At least I don't do this because they are so expensive... 4x times more data to save.
Ok I can write it, but I have no idea how you usually plot them. Maybe post some images or a scatter and surface plot here.
And why do you need the bounds for every point? Why not bounds for the whole area?
Also will this work if you also have a time dimension?
This is how it looks like with PyPlot. I don't know if Plots have a Geo-projections backend. PyPlot and Makie have one though. If you want to turn this into a surface plot you need vertices for all polygons that each point in the plot actually corresponds to.
Also will this work if you also have a time dimension?
Well you can't plot three-dimensional space like this, but you can animate it I guess? My data already have a time dimension but when I am plotting I either do time average, or other processes that collapse the time dimension, or make video versus time. But I use Makie for that, everything else is too slow in comparison.
Maybe it is worth doing a mental separation between providing plotting functionality and providing numeric functionality. Whether some functionality gets in here should in principle be detached from wether it can be plotted. After all it is my impression that the main goal of this package is numeric functionality, and not plotting functionality (which seems to be getting heavy investment, even though not everyone uses Plots - Makie is another alternative, which I honestly believe can/will replace Plots entirely in the near future).
My question about the time dim is mostly numeric. Im just wondering how this fits in the package and mixes with other use cases.
I've thought about this a bit more. It's a good idea, but we should try to make it as general as possible.
The Coord
dim can hold an arbitrary number of coords (alowing 3d coordinates), and have it's own dims field so the coordinate number and order is explicit.
This will mean we can convert from any other DimensionalArray to a coord vector but keep all the information. It will also useful for other things like selecting points with a polygon.
This would give flexiblilty to subset with a polygon and covert e.g. X and Y dimensions of a 3d Array to coords, but leave the time dimension unaltered as a separate dimension. So end up with a 2d Array with (Coord, Ti)
dims. Or whatever.
We have kind of had this for years not, in MergedLookup
.
I will close this and future issues can be related to problems and additions to what we already have. Like plotting MergedLookup
probably doesn't work.
Hi,
I remember in some issue discussion that in general dimensions could be subtype of
AbstractArray
instead ofAbstractVector
to support non-uniform grids.I am trying to get my head around how to use this here or even more so how to use
Near
for such grids.I now have an unstructured grid (equal-area Gaussian) and my lons and lats are actually just a very long vector while my "data" is also a very long vector. I.e. at every point at lon lat
lon[i], lat[i]
I have the valueA[i]
.If using such datasets is supported here, w should somehow document this somewhere with some example of how to best use them and how to best define the dimensional array with this format. Is the best option for me to make my
A
have two dimensions:LonLat, Time
, where theLonLat
dimension is actually a Nx2 matrix? Then I guess usingNear
could be exlained easily?