Test generic raster support w/ Spire

lossyrob commented 7 years ago

@fosskers this is your jam now ;)

fosskers commented 7 years ago

Yeeeaaaaahhhhhhhhhhh :sunglasses: (music from CSI)

echeipesh commented 7 years ago

Related and additional discussion here: https://github.com/locationtech/geotrellis/issues/1789

metasim commented 7 years ago

Awwww snap, the 🐉 rises!

fosskers commented 7 years ago

The first thing to do would be to use JMH to benchmark our IntArrayTile subtype against a Tile[Int]. Doing so on Scala 2.12 might even be best. We don't yet publish 2.12 artifacts because of Spark compat, but raster on its own should be able to be publishLocal'd with 2.12.

The Tile[A] should be marked @specialized for the usual number types and use whatever help spire can offer.
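A minimal sketch of what such a specialized Tile[A] might look like, using only the standard library. `GenericTile` and its members are hypothetical names for illustration, not the GeoTrellis API, and spire typeclasses (not shown) would presumably come in wherever arithmetic on `A` is needed.

```scala
import scala.reflect.ClassTag

// Hypothetical generic tile; not GeoTrellis's actual Tile hierarchy.
// @specialized asks scalac to emit primitive variants for Int and Double.
final class GenericTile[@specialized(Int, Double) A](
  val cols: Int,
  val rows: Int,
  val cells: Array[A]
) {
  require(cells.length == cols * rows, "cell count must equal cols * rows")

  // Row-major lookup.
  def get(col: Int, row: Int): A = cells(row * cols + col)

  // A single map stands in for the map/mapDouble pair.
  // The ClassTag is needed to allocate an Array[B].
  def map[@specialized(Int, Double) B: ClassTag](f: A => B): GenericTile[B] = {
    val out = new Array[B](cells.length)
    var i = 0
    while (i < cells.length) {
      out(i) = f(cells(i))
      i += 1
    }
    new GenericTile[B](cols, rows, out)
  }
}
```

Usage would be along the lines of `new GenericTile[Int](2, 2, Array(1, 2, 3, 4)).map(_ * 0.5)`. Whether the specialized variants are actually selected at every call site is exactly what the benchmark would need to confirm.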

Assuming Tile[A] would gain us something (the dismantling of the Tile hierarchy, no more map/mapDouble, etc), @lossyrob @echeipesh is there any percentage of slowdown that would be acceptable as the cost of that abstraction? In a perfect world all this @specialized and newtyping will avoid boxing altogether, but who knows what'll actually happen. 5%? Unlikely to be that good. 50%? Way too slow, not worth it. 10%? Nice, still probably not likely. 25%? Users might be sad. 15%? Just right?

metasim commented 6 years ago

I'd also be interested in understanding how much of a performance benefit arose from the macro-generated aspects of GeoTrellis, and whether this is or isn't a requirement for optimal performance.

lossyrob commented 6 years ago

The thing the macros get around is the fact that FunctionN where N > 2 is not specialized. The macro generated methods prevent boxing while allowing an API that still allows for lambdas over things like map and mapDouble.
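To illustrate the problem (not the macros themselves), here is a rough sketch contrasting a plain Function3 with a hand-written trait whose apply takes primitives. `IntCellFunction` and `CellMapping` are made-up names for this example, not the GeoTrellis macro API.

```scala
// Hypothetical names throughout; this is not the GeoTrellis macro API.
// A trait with a primitive signature, so calls never go through Object.
trait IntCellFunction {
  def apply(col: Int, row: Int, z: Int): Int
}

object CellMapping {
  // Function3 is not specialized, so f is invoked through the generic
  // apply(Object, Object, Object): Object and arguments may be boxed.
  def mapBoxed(cells: Array[Int], cols: Int)(f: (Int, Int, Int) => Int): Array[Int] = {
    val out = new Array[Int](cells.length)
    var i = 0
    while (i < cells.length) {
      out(i) = f(i % cols, i / cols, cells(i))
      i += 1
    }
    out
  }

  // Same loop, but the callback has a primitive (Int, Int, Int) => Int
  // signature at the bytecode level.
  def mapPrimitive(cells: Array[Int], cols: Int)(f: IntCellFunction): Array[Int] = {
    val out = new Array[Int](cells.length)
    var i = 0
    while (i < cells.length) {
      out(i) = f(i % cols, i / cols, cells(i))
      i += 1
    }
    out
  }
}
```

Both variants compute the same result; only the call path differs. The JIT can sometimes eliminate the boxing in the first variant, so the real cost has to be measured.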

There was a lot of benchmarking to make sure this was the case.

lossyrob commented 6 years ago

@fosskers because we're already hamstrung with being on the slow JVM, and we are a performance oriented library, we've from the beginning done a lot of work to eke out the most performance possible. So to me, 15% slowdown is unacceptable. Even 5% would hurt. I spent hours upon hours microbenchmarking and tweaking focal operations when I refactored them to make sure there wasn't any slowdown at all from a previous version. Tile performance remains to me a very core concern of GeoTrellis.

The thing is here, if you end up boxing, you're not going to see some small percentage difference - you are going to see a ton of slowdown. So I think it's an all or nothing thing - if you figure out how to do it and completely avoid boxing, I don't see why you would have to slow things down at all.
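As a rough illustration of where boxing creeps in, compare an unspecialized generic sum with its primitive counterpart. `BoxingSketch` is a made-up name and this is only a sketch; the actual slowdown would have to be measured with a proper benchmark.

```scala
object BoxingSketch {
  // Generic path: A is erased, so reading cells(i) and calling num.plus
  // both go through references, boxing each Int as java.lang.Integer.
  def sumGeneric[A](cells: Array[A])(implicit num: Numeric[A]): A = {
    var acc = num.zero
    var i = 0
    while (i < cells.length) {
      acc = num.plus(acc, cells(i))
      i += 1
    }
    acc
  }

  // Primitive path: the same loop over Array[Int] stays on raw ints.
  def sumInt(cells: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < cells.length) {
      acc += cells(i)
      i += 1
    }
    acc
  }
}
```

The two functions return the same value for an `Array[Int]`; the difference is entirely in the allocation and indirection on the generic path.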

fosskers commented 6 years ago

"Tile performance remains to me a very core concern of GeoTrellis."

Gotcha, thanks for being open about those priorities.

fosskers commented 6 years ago

I can't seem to find the benchmarks in question. The geotrellis-benchmark project looks mostly empty now since many of the benchmarks stopped compiling.

lossyrob commented 6 years ago

Which benchmarks? The macro ones, I'm not sure they survived. Happy to have people double check and make some with more longevity if someone is up for it. But if the macros don't speed things up in a .map { (col, row, z) => ??? } case, I will eat all the hats :)

The focal benchmarking also didn't survive. But I found it here: https://github.com/locationtech/geotrellis/blob/_old/v0.8.0/benchmark/src/main/scala/geotrellis/benchmark/FocalBenchmarks.scala

While perusing benchmarks found this: https://github.com/geotrellis/geotrellis-benchmark/blob/master/geotrellis-0.10/src/test/scala/geotrellis/raster/GenericRaster.scala which is a good indicator of boxing slowdown.

fosskers commented 6 years ago

Awesome, we should revive those into the bench subpackage here.

Oh cool, and GRaster is probably a good thing to test @specialized against.

metasim commented 6 years ago

For the reference bin: https://github.com/alexknvl/newtypes

metasim commented 6 years ago

Feel free to create an issue and assign it to me if you want specific benchmarks ported (just not all of them at once).

pomadchin commented 6 years ago

@metasim cool link, hope we'll see Spark on 2.12 at the end of this year.

metasim commented 6 years ago

For the crazy idea bin:

What if Tiles were encoded as ND4J matrices, and Map Algebra ops were rewritten using its native operations, which can be executed in CPUs or GPUs, via the BLAS/LAPACK backend of your choice.

See also:

fosskers commented 6 years ago

What's the difference between the last two lines?

[Screenshot: 2017-10-13-115525_1918x1078_scrot]

lossyrob commented 6 years ago

@metasim the ND4J idea is a good one. Some benchmarks around that would be really interesting. I'm often asked if GeoTrellis can take advantage of GPUs; using LAPACK has always been in the back of my mind for that usage. If it turns out that ND4J beats our ArrayTiles hands down, and we can make a deployment story that isn't too painful, I'd say we should start thinking about putting a migration on the roadmap.

metasim commented 6 years ago

Alternative to https://github.com/alexknvl/newtypes with better ergonomics (IMHO):

https://github.com/fthomas/refined

fosskers commented 6 years ago

Cool, that's not one I've tried yet. I'll give it a spin next Monday.

fosskers commented 6 years ago

From refined:

Using refined's macros for compile-time refinement has zero runtime overhead for reference types and only causes boxing for value types.

fosskers commented 6 years ago

https://github.com/locationtech/geotrellis/pull/2520

^ This PR achieves "Tile[T] with caveats", as explained in that PR.

locationtech / geotrellis

Test generic raster support w/ Spire #39