Open asinghvi17 opened 3 weeks ago
With tiled batches we can just keep filling the same small ector of points that we pass to Proj to tranform. We can swap over resample
at the same time. Its just historical that GDAL was already available as a hard dep so I used it. But its not anymore, and we should switch to Proj.
(also MWE or at least the size of ras2
would help your benchmark)
Added MWE code. General tiled iteration would also be nice to have, maybe from DD - this way you can iterate over the most optimal pattern for access as well, which could potentially play well with Dagger. I'm sure DD already does this but it would be nice to have it more exposed.
Actually we just let DiskArrays handle that currently
The reason I wrote it so that every point is transformed 4 times is that cell boundaries don't necessarily match. We can write a more optimized version but we need to keep something as stupid as the current implementation around for those cases.
Ah right we need a version for Irregular and Regular
Ah right we need a version for Irregular and Regular
Basically, though technically it could be possible to have overlap or gaps even if a dimension is Regular, right?
No regular is by definition gapless, there is no way to represent gaps. Actually only Explicit lookups have gaps and not Irregular, Irregular cant represent gaps either
Switching ArchGDAL out with Proj and using tuples instead of vectors already is a bit of an improvement:
julia> @be Rasters.cellarea(Spherical(), ras2) seconds=30
Benchmark: 4 samples with 1 evaluation
8.519 s (49 allocs: 42.481 MiB)
8.876 s (61 allocs: 42.481 MiB, 0.26% gc time)
9.016 s (49 allocs: 42.481 MiB)
9.396 s (56 allocs: 42.481 MiB, 0.20% gc time)
Profview shows that roughly 50% of the time is spent on the transformation and 50% on converting lonlat to cartesian coordinates.
I also found out that proj can convert directly to cartesian coordinates, though it seems to be much slower than converting to lat-lon and then to cartesian in Julia. https://proj.org/en/9.3/operations/conversions/cart.html
Proj will likely be less accurate than Julia for that transformation, FYI.
It was also slower, so the way it was before was probably fine.
I think switching from GDAL to Proj is easy enough. It's silly to transform every point 4 times but can't think of an elegant end generic solution to do the tiled iteration right now. I guess both of you have implemented that kind of thing a hundred times?
Also, this is not the kind of function that would end up inside an inner loop. I'm not sure how necessary it is to spend a lot of effort on making it fast.
Its probably already the fastest that exists by an order of magnitude, so I'm happy to call it off wherever. But generally switching from GDAL to Proj for these things is better for a bunch of reasons, not just speed.
If we're breaking, then we may as well break everything, and switch both reproject and cellarea to Proj instead of ArchGDAL? Since none of them need gdalwarp. Maybe also mapped crs things.
A generic tiled iterator would be a bit tough. I think you'd indicate an approximate size, and the iterator could partition into tiles that fit the chunk pattern most optimally. But that's a bigger change...I think even switching to Proj will give us enough speedup for my purposes.
~Ah crap, converting cellarea to Proj means we have to implement some form of GFT conversion in Proj as well...~
Edit: Proj has both GFT conversions and an inbuilt geographic check. So we could just switch.
https://github.com/meggart/DiskArrayEngine.jl/tree/main kind of implements tiled iteration, but it's also huge. I don't think it offers multithreading by default either, and it pulls in a crazy number of dependencies.
We could talk to Fabian about maybe splitting parts of it out into a DiskArrayEngineBase.jl package? But it's a bit complicated. Probably easier to write a lightweight wrapper on our own and maybe take some concepts / struct from DiskArrayEngine...
I just realized that Proj allows you to return data in radians (which is what it uses internally). We could potentially switch to that and avoid cosd and sind. Might speed up projection speed by a hair as well.
The switch is +units=r
in a Proj-string, but if it's safer we can always use a const RADIAN_WGS84 = GFT.ESRIWellKnownText(...)
.
Good idea, minimise conversions where possible
768 already does a lot! But there are still a few low hanging fruit.
For reference,