Closed fosskers closed 4 years ago
There's a code example there for imports which didn't seem to survive the conversion from .org. Here is what I intended:
/* To the point. Python/Cats/Circe/Haskell-like. Says "gimme it all, I trust you." */
import geotrellis._
or
/* True to the package name. */
import geotrellis.core._
I raise my hand for geotrellis-core over geotrellis-types, as it leaves the cognitive door open for content that isn't just ADTs or commonly used pure functions.
All typeclass instances for those types should also live in geotrellis-core. In GT's case this mainly means the JSON codecs.
Crap, speaking of which, I forgot to mention JSON. Maybe Circe migration should be 3.0?
@metasim I agree that core meaningfully implies more than types. I'm least enthused about common, which doesn't really have much semantic value other than "I'm a grab-bag package".
Avro codecs too?
Boundable Bounds EmptyBounds KeyBounds LayerId Metadata SpatialKey SpaceTimeKey TemporalKey TileLayerMetadata LayoutDefinition and friends
All this has to do with tiling and IO. It seems more like this should be broken out into a core io module, or something to do with tiling.
All CRSs? (from proj4)
We already have a subproject that has all the CRSs: proj4. There's also an issue to handle other projection libraries (https://github.com/locationtech/geotrellis/issues/1294). I was imagining the refactor here producing a geotrellis.crs library that can detect whether another library is present and use it (e.g. proj4j, proj.4 bindings, or Apache SIS).
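A minimal sketch of what that detection might look like, assuming reflective classpath probing. The ProjectionBackend object, the preference order, and the fallback name are all hypothetical, not real GeoTrellis API:

```scala
// Hypothetical sketch of optional-backend detection for a geotrellis.crs
// module. The object and preference order are assumptions; the class names
// are the usual entry points of each library.
object ProjectionBackend {
  private def onClasspath(className: String): Boolean =
    try { Class.forName(className); true }
    catch { case _: ClassNotFoundException => false }

  // Prefer an optional backend if present, otherwise fall back to the
  // built-in proj4 port.
  def detect(): String =
    if (onClasspath("org.apache.sis.referencing.CRS")) "sis"
    else if (onClasspath("org.osgeo.proj4j.CoordinateReferenceSystem")) "proj4j"
    else "builtin"
}
```

With neither optional library on the classpath, `detect()` simply reports the built-in backend.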
All KeyIndex types and indexing functionality
When we move to supporting SFCurve, this will carry an additional dependency and be worth a project in its own right (e.g. geotrellis-indexing)
Extent
This is a vector type, I think it belongs in geotrellis-vector.
If we carry through all the JSON and Avro, we're adding a lot of dependencies to things that don't need them (e.g. raster doesn't need Avro, and people use raster on its own).
I think some sort of different breakdown is definitely needed. I'm not sure this core vs non-core representation is it.
Seems more like this should be broken out into a core io module
A quote from @echeipesh from the original May conversation:
Ok, I think there is a set of functionality, working with tiled layouts. That is a common pattern for what we use, but it doesn’t actually have a lot to do with raster.
I agree that those types have a lot to do with tiling, but I'm not seeing the connection to IO. Surely those things are used by our IO backends (s3, etc.), but the types themselves are GIS concepts that can be used in pure code that doesn't involve Spark or the idea of data passing through a program. Relatedly, it has always struck me as odd that our GeoJSON typeclass instances live in vector.io. Converting between JSON and data types in and of itself has nothing to do with IO, unless we're not sharing the same definition of IO?
We already have a subproject that has all the CRS's - proj4.
lazy val proj4 = project
.settings(commonSettings)
.settings(javacOptions ++= Seq("-encoding", "UTF-8"))
Cool, looks like proj4 is at the bottom of the dep graph too. Might not be a problem for it to stay that way, which means that geotrellis-core might even be able to depend on it.
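If so, the sbt wiring could be as simple as the following sketch (the `core` module name is an assumption; only proj4 and commonSettings exist in the current build):

```scala
// Hypothetical: geotrellis-core sitting directly above proj4 in the graph.
lazy val core = project
  .dependsOn(proj4)
  .settings(commonSettings)
```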
When we move to supporting SFCurve, this will carry an additional dependency, and be worth a project in it's own right (e.g. geotrellis-indexing)
:+1:
[Extent] is a vector type, I think it belongs in geotrellis-vector.
@echeipesh said the same thing, but it's damn central to tiling work. I realize that there's a close relationship between Extent and Polygon, but Extent by itself isn't unique to working with geometries. I'd say it's as fundamental to any GeoTrellis work as SpatialKey is.
Furthermore, if Extent doesn't live in core then neither can LayoutDefinition, and then we quickly see the point of a core unravelling if half the things that should be in there can't be.
Overall, I agree with the need to simplify the import burden of using GeoTrellis.
-core vs -tiling breakout

Otherwise, the main drivers for the class shuffle have been the need to use the Collections API without a Spark dependency. In general this leads down the road of working with tiled rasters somewhere other than geotrellis-spark. This includes:
All of these things can live pretty happily in geotrellis-raster, but that does scope their re-use somewhat. In fact there is a lot of functionality that has to do with laying out either pixels or tiles over a map surface, and a lot of the classes @fosskers mentioned are at the bottom of that class hierarchy. With the exception of LayerId, none of them have anything specifically to do with IO; that's just the place where they are most prominently used. The gut feeling is that it makes sense to home those in geotrellis-core rather than trying to find a very specific package name.
Another pain point is the lack of a good home for Avro/JSON codecs. The pain compounds when you consider library choices. Migrating to circe would be nice, but cutting people loose from spray-json could also be problematic, as it would force all downstream applications to deal with swapping in circe, which, as we saw with Raster Foundry, comes with its own gotchas. A similar story goes for Avro if you consider that ProtoBuf is at least one more valid choice.
The answer, hopefully, is to separate the GeoTrellis interfaces from the specific library, kind of like it's done with Component (an abstraction over lenses), and then break out the Avro and spray-json codecs into their own packages.
Right, duh, the Collections API is a much stronger motivation for geotrellis-core than my SpatialKey woes. ;p
Re: where should json instances / avro codecs go?
We implement typeclasses in Scala with implicits. These are sometimes implicit class wrappings, and sometimes implicit vals which provide instances of classes needed by some external library's functionality (like Encoder from Circe). If these implicits are defined anywhere but the companion object of the type they're tied to, they become "orphans".
Said another way, using Encoder as an example: if it is possible to import a type A while not importing its Encoder instance, then that instance is an orphan.
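A self-contained sketch of the distinction, using a toy Encoder in place of Circe's (every name here is a stand-in, not real GeoTrellis or Circe code):

```scala
// Toy typeclass standing in for circe's Encoder.
trait Encoder[A] { def encode(a: A): String }

final case class SpatialKey(col: Int, row: Int)

object SpatialKey {
  // Non-orphan: lives in the companion, so implicit search finds it
  // whenever SpatialKey itself is visible -- no extra import needed.
  implicit val encoder: Encoder[SpatialKey] =
    (k: SpatialKey) => s"""{"col":${k.col},"row":${k.row}}"""
}

def encode[A](a: A)(implicit e: Encoder[A]): String = e.encode(a)

// Works with no imports beyond SpatialKey itself.
val json = encode(SpatialKey(1, 2))
```

Had the same `implicit val` been defined in, say, a package object elsewhere, it would be an orphan: the `encode` call above would fail to compile until the user discovered and added that package's import.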
Claim: Orphans increase user cognitive overhead, specifically regarding imports.
Example from our FAQ regarding .toGeoJson.
Defining orphans leads to compiler warnings in languages with first-class typeclasses, so in Scala it's just something we have to be proactive about.
That said, does anyone foresee problems with migrating json / avro codecs to the companion objects of their types? This is how Circe instance auto-generation works - the instances are written into companion objects. The library breakout approach suggested by @echeipesh in his Codecs section above has value to be sure, although the fact remains that the instances themselves would be orphans.
As an alternative to stuffing would-be orphans into companions, we could have spray and circe alternative imports and stuff all required codecs into a single import object. Barring that, maybe we could also wrap the spray encoders/decoders we have with circe (did this a bit in RFML) to give nice interop within a circe application.
I just hit the orphan instance issue again. This won't compile:
import geotrellis.raster._
import geotrellis.spark._
import geotrellis.spark.io.s3._
import org.apache.spark._
// --- //
object Pipeline extends App {
override def main(args: Array[String]): Unit = {
implicit val sc: SparkContext = new SparkContext(
new SparkConf().setMaster("local[*]").setAppName("pipeline-test")
)
val reader = S3LayerReader("geotrellis-test", "colingw/pipeline")
val layer = LayerId("mask", 14)
val rdd = reader.read[SpatialKey, Tile, TileLayerMetadata[SpatialKey]](layer)
println(s"COUNT: ${rdd.count}")
sc.stop()
println("done")
}
}
saying
could not find implicit value for evidence parameter of type spray.json.JsonFormat[geotrellis.spark.SpatialKey]
So having SpatialKey in scope is not enough to read a layer that uses them. Trial and error reveals this:
import geotrellis.spark.io._
to be the necessary extra import.
An idea from @lossyrob yesterday, for some future version of GeoTrellis. Paraphrased:
We shouldn't wrap the JTS geometry types ourselves. All our functions should work on JTS geoms directly, with extra functionality added by injected methods only when necessary.
^ Although it occurs to me now that this would force the user to import (and depend on!) JTS themselves when they want to reference the Geom types in their code's type signatures.
Workaround until Scala supports re-exports?

import blah.blah.jts

package object vector {
  type Point = jts.Point
  // ... same for the other geom types
}
@fosskers From where I stand I agree wholeheartedly with @lossyrob and really like the idea of using type aliases to simplify imports. Sounds like a winning approach to me.
Recent work with spin-off libraries, Raster Foundry, and GeoTrellis itself has made us aware that we're nearing a "major version precipice". We've been holding back a few PRs, knowing that they're necessary yet API breaking. And we've found a few quirks with the project structure, the canonical one being:
So, if we need to tighten our overalls, hit the keyboard, and break the API, what's the best way to go about it? How can we get the most bang for our buck if we're going to be incrementing our major version? This doc serves as a launching point for that discussion and subsequent work.
1 geotrellis-core
@echeipesh and I had a conversation in early May when he was trying to find a home for a sum type which didn't trivially belong to either vector, raster, or spark. We thought it might be time to revive the idea of a separate geotrellis-core (or likewise named) package which would always be at the bottom of any dependency graph involving GeoTrellis.

More recently, while working on a PR involving MapKeyTransform, certain intuitive-seeming transformation methods weren't possible to write given the way that the various GT libraries depend on each other, for example:
A contributor @metasim has chimed in with his support for this idea.
A contributor, @metasim, has chimed in with his support for this idea.

In general, a geotrellis-core package would be a place for the most central types to live, regardless of whether the user is interested in vectors, rasters, vectortiles, or Spark.

1.1 Naming
Suggestions:

geotrellis-core (Cats/ScalaZ/Circe/Monocle convention)
geotrellis-types (Haskell convention)
geotrellis-common (Java convention)

1.2 Importing
1.3 Contents
I did a quick scan of the Scaladocs during my initial conversation with Eugene. I suggested the following be moved to a central package:
I'm likely missing some. If you notice any, please let me know and I'll update this list. Anything that can realistically be used regardless of vector/raster/spark could be here.
1.4 Migration
If we adopt import geotrellis._, then the migration guide may amount to:

2 Discoverability
This is an effort to address one of our team goals for the year - improving "discoverability" in GeoTrellis.
We often get questions like the following from both within Azavea and without:
or more frustrating:
or the most frustrating:
We can take a lesson from a library whose functionality is hard to discover and has nightmarish imports: ScalaZ.
A suggested definition of discoverable:
Assumptions:
2.1 Reducing duplication
A number of our classes have heavily overloaded methods, where the inputs vary only slightly. Many of these overloads perform simple A -> B pure transformations before eventually calling down to the "real" form of the method. Here is ColorRamp.toColorMap as an example. Which is the "real" one?

Sets of overloads like these can make the call tree hard to follow for developers. Since they're also often undocumented, the question can arise in a user's mind, "what's the difference?", and they're forced to guess or ask us.
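To make the shape of the problem concrete, here is a toy reconstruction of the pattern; Ramp and ColorMap are stand-ins, not the real ColorRamp API. Everything funnels into one "real" method:

```scala
final case class ColorMap(stops: Vector[(Double, Int)])

object Ramp {
  // The "real" method: everything funnels down to here.
  def toColorMap(breaks: Vector[Double], colors: Vector[Int]): ColorMap =
    ColorMap(breaks.zip(colors))

  // Convenience overloads that only perform an A -> B conversion first.
  def toColorMap(breaks: Array[Double], colors: Array[Int]): ColorMap =
    toColorMap(breaks.toVector, colors.toVector)

  def toColorMap(breaks: Seq[Int], colors: Seq[Int]): ColorMap =
    toColorMap(breaks.map(_.toDouble).toVector, colors.toVector)
}
```

All three calls produce the same result; only the first signature carries the actual logic, which is exactly what a reader can't tell from the Scaladoc alone.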
If these assumptions are true, would it be prudent to reduce the number of such overloads, putting the burden of the A -> B transformation onto the user? Exceptions can be made of course, one coming to mind being one of the ways to construct a Polygon.

Having to prefix each point with a Point constructor would get verbose. Then again, how often do we manually create Polygons like this, outside of tests?

Here are a few candidates for reducing duplicates:
PngRenderMethods.renderPng
MapKeyTransform methods
ColorRamp.toColorMap
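Returning to the Polygon-construction exception mentioned above, the verbosity trade-off can be sketched with toy types (Point, Polygon, and the tuple overload here are illustrative, not geotrellis-vector itself):

```scala
final case class Point(x: Double, y: Double)

final case class Polygon(exterior: List[Point])

object Polygon {
  // The convenience overload under discussion: accepts raw tuples and
  // performs the (Double, Double) -> Point conversion for the caller.
  def apply(exterior: (Double, Double)*): Polygon =
    Polygon(exterior.map { case (x, y) => Point(x, y) }.toList)
}

// Without the overload, every vertex needs an explicit Point constructor.
val verbose = Polygon(List(Point(0, 0), Point(0, 5), Point(5, 5), Point(0, 0)))
// With it, the call site stays terse.
val terse = Polygon((0.0, 0.0), (0.0, 5.0), (5.0, 5.0), (0.0, 0.0))
```

Both expressions build the same value; the question is whether that call-site terseness is worth one more overload to document and maintain.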
2.2 Avoiding method injection where deemed unnecessary
We have a vast, undocumented mechanism for method injection involving 411 classes/traits. They follow the naming pattern
*Methods
andwith*Methods
. They can be considered a source of confusion, as they add a layer of indirection in determining the location of method definitions. This leads to confusion with imports, and slows down development.A number of these sets of methods (likely not all) could live directly in the class they operate on. This would reduce a huge swath of symbols that appear in Scala docs, and would likely decrease compile time.
Examples:
ColorMethods
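For readers unfamiliar with the mechanism, here is a minimal sketch of the injection pattern; Tile, ColorMethods, and invert are stand-ins, not the real GeoTrellis definitions:

```scala
// Toy sketch of the *Methods injection pattern.
final case class Tile(cells: Array[Int])

object ColorMethods {
  // The injected method only exists after `import ColorMethods._` --
  // exactly the import-discovery indirection discussed above.
  implicit class withColorMethods(self: Tile) {
    def invert: Tile = Tile(self.cells.map(-_))
  }
}

import ColorMethods._
val t = Tile(Array(1, -2)).invert
```

Nothing in Tile's own definition hints that `invert` exists, which is why these methods are hard to discover from the type alone.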
3 Easier SemVer iteration
If every symbol available in GeoTrellis is visible to the user through imports or documentation, then every symbol is "user facing", and every change has the potential to be API breaking. This makes it particularly hard to make sound calls regarding our versioning.
If we make a sweep of GT and draw a clear line in the sand between what is user-facing and what isn't, then whatever isn't can be hidden via private or stuffed in an internal subpackage (a Haskell convention that works well, and which I use in both VT and VP).

This should make it easier to make heavier changes without causing API breaks, which would allow us to increment versions more easily and keep a regular release cycle. It also has the benefit of reducing API surface area, since many symbols will be hidden from the Scaladocs.