locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io
Other
1.34k stars 360 forks source link

ValueReaders Should Return Option[K] #2732

Open jbouffard opened 6 years ago

jbouffard commented 6 years ago

Right now, whenever a ValueReader fails to read a given Tile it throws an error. However, this is different from what LayerReaders do which is just to return an empty layer if the given query returned nothing. We should change the return type of ValueReader.read to Option[K] so that it doesn't throw when it can't find the Tile.

metasim commented 6 years ago

FWIW, here are my thoughts about error handling for multi-stage parsing or processing to derive a desired value.

TL;DR: Consider Validation.

My current viewpoint:

1) Checked exceptions suck 2) Undeclared unchecked exceptions are just malicious 3) Option[T] is lossy 4) Either[T, Throwable] can be constraining (no error accumulation), and assumes non-parallel validation 4a) Either[T, Seq[Throwable]] or Either[T, Nel[Throwable]] provides accumulation, but you have to work out the compositional semantics yourself. 5) Either[Throwable, T] points out the problem with Either... I can't remember the convention of which side is the "good" side. 6) Try[T] is a bit better (can be transformed into an Option or Either), but frankly I find the default combinators hard to work with (e.g. recover vs recoverWith vs fold...). I still use it quite a lot though. 7) Validation or ValidationNel from scalaz/cats are probably solving the problem "correctly", but every time I start to use them I feel like they are over engineering the problem in the name of some abstract "purity". And they need to be used consistently to have real value.

What I do, vs what I should do:

In summary, if I were writing something new, or willing to commit to reworking all the code to be consistent, some form of Validation is probably the right answer. But consistency is key if you want to get the benefits of composition, otherwise you're increasing cognitive surface area with no benefit. Otherwise, I'd go with Try.

pomadchin commented 6 years ago

Thanks @metasim for your quite expanded response. I'm thinking more towards MonadError or smth like that, in order not to be locked on default combinators.

jbouffard commented 6 years ago

Thank you, @metasim you raised some good points. @pomadchin Would we want to implement something like MonadBlunder from this article? I'm not familiar enough with cats to know if they have something like this now, though. What are your thoughts @moradology @jpolchlo @jamesmcclain ?

moradology commented 6 years ago

I think there's another option so far not considered (save obliquely in the context of MonadError): cats.effect.IO. IO would allow us to completely defer the question of returning Either[Throwable, Result] vs throwing. For example:

val maybeExceptiony: IO[Int] = IO { myIntReader.read("abc123") }
val noExceptionsForMe: IO[Either[Throwable, Int]] = maybeExceptiony.attempt

In general, I think this solution is the right approach for code which depends heavily upon stuff happening outside of it (exception states will abound regardless of the stuff we write) and it strikes a nice compromise position that isn't too heavy-handed about how your call site is going to look.

The main issue I see with the use of Validation is that we don't have a lot of independently checkable steps so much as a chain of (possibly failing) behaviors that are strictly dependent upon prior results - we can't really take full advantage of the Applicative avoidance of the Monad's short-circuiting behavior

pomadchin commented 6 years ago

My vote would be for IO, looks like a great compromise, also would give an async API for free (if we'd like to make it async).

pomadchin commented 5 years ago

Reference to a good PR by @moradology that demonstrates how we can abstract over IO: https://github.com/locationtech/geotrellis/pull/2818

metasim commented 5 years ago

IMO, Either is too unopinionated for this use case. I think Validation is better because it's in-your-face about reporting errors. Something like Rust's Return is also a consideration, which is like Either except that there's no Left/Right ambiguity. For those writing the API it may not seem like a big deal to just get used to Either being left or right biased, but to library users it can cause a lot of cognitive friction and eye squinting.

metasim commented 5 years ago

PS: I'm assuming that the general GeoTrellis target user is less adept at type gymnastics than an API developer (i.e. users are in levels A1-3). If that's not the goal with GeoTrellis, I guess you should just:

2kh49q

pomadchin commented 5 years ago

@metasim we hope that the abstraction would allow users to choose what they want to use to handle effects, that can be IO, this can be Future / twitter Future / Either / Validated / whatever. Also, talking in particular exceptions handling in IO: https://github.com/typelevel/cats-effect/issues/67#issuecomment-410913527

Also don't forget that IO gives us abilities to parallelize for free.

But i agree that nothing extra complicated should be exposed into the API. It looks like the choose of exception handling mechanism should be as much as it's possible on the user side.

Mb you just need to try some more cats ;)

P.S. Validated is not a monad; so you can't flatMap it; wondering how to combine methods with such a return type.

metasim commented 5 years ago

Mb you just need to try some more cats ;)

For me I'm fine with cats (I do need to use it more). I just know that not all users (or at least everyone on my team) is comfortable with it, and I would prefer to be able to promote the use of GeoTrellis without a bunch of "but first you must learn this" caveats. So I have two threads of thoughts here: 1) how we manage errors internally in the API implementation (where I have less opinion as long as it's consistent), and 2) externally, at the user-facing surface area of the API, where I think we should strive for minimizing friction without losing flexibility. Cats may be the right answer in both cases, but IMO we just need to be thoughtful, deliberate and consistent about it. Even the term "monad" is still scary to a lot of devs, and seeing it as a part of an API can hamper adoption (I'm thinking of the large number of Java refugees coming to Spark/Scala).

echeipesh commented 5 years ago

@metasim I think you hit the nail on the head there, the result is going to be different tiers of API intended for different goals. The distinction somewhat exists right now but looks like we'll have to much more formal and deliberate about it in the future.

https://github.com/locationtech/geotrellis/pull/2818 is a pretty example of driving cause: There is a realization and a need that performing operations makes sense in at least three contexts:

Rewriting code for each case is tremendously expensive and hard to maintain so we find ourselves reaching for higher order abstractions. At the same time leaking complexity (even through types) is bad practices and when rubber hits the road you're trying to do concrete thing A to thing B.

I think as we push we're going to end up with three levels of API:

1) Java-like API around performance and IO edges

2) Async API around modularity and co-ordination

3) Notebook/integration API

IO[Either[Throwable, Int]] might make total sense if you're integrating with http4s, doobie or fs2 but would be really out of place in Spark API.

I hope we can start using cats in the core where it has low impact on users at first and check ourselves when it starts leaking out.

metasim commented 5 years ago

@echeipesh Great followup.

Java-like API around performance and IO edges

^---- Would also make Python bindings a heck of a lot easier.

metasim commented 5 years ago

PS: I hope I've not come across as anti-cats, fs2, IO or whatever.... I think all that performance heavy stuff should use whatever abstraction lends to the best 1) maintainability, and 2) performance. It's more, as Eugene said, me being fussy about also thinking about the same problem from the outside in, and what kind of type leakage to the end user is acceptable.