ValueReaders Should Return Option[K]

jbouffard commented 6 years ago

Right now, whenever a ValueReader fails to read a given Tile it throws an error. However, this is different from what LayerReaders do which is just to return an empty layer if the given query returned nothing. We should change the return type of ValueReader.read to Option[K] so that it doesn't throw when it can't find the Tile.

metasim commented 6 years ago

FWIW, here are my thoughts about error handling for multi-stage parsing or processing to derive a desired value.

TL;DR: Consider Validation.

My current viewpoint:

1) Checked exceptions suck 2) Undeclared unchecked exceptions are just malicious 3) Option[T] is lossy 4) Either[T, Throwable] can be constraining (no error accumulation), and assumes non-parallel validation 4a) Either[T, Seq[Throwable]] or Either[T, Nel[Throwable]] provides accumulation, but you have to work out the compositional semantics yourself. 5) Either[Throwable, T] points out the problem with Either... I can't remember the convention of which side is the "good" side. 6) Try[T] is a bit better (can be transformed into an Option or Either), but frankly I find the default combinators hard to work with (e.g. recover vs recoverWith vs fold...). I still use it quite a lot though. 7) Validation or ValidationNel from scalaz/cats are probably solving the problem "correctly", but every time I start to use them I feel like they are over engineering the problem in the name of some abstract "purity". And they need to be used consistently to have real value.

What I do, vs what I should do:

When I'm not writing API or application code (throwaway stuff), I do 2.
When I'm really lazy I do 3. It almost never sticks, and then I consider the next items.
I consider 4/5, start thinking about what the semantics should be, get bored, and move on to 6.
6 doesn't solve the error accumulation problem, but I like it better than Either for some reason. I use this the most despite always having to look up how I should do error recovery
I've only fully committed to 7 a couple of times in some small applications that never saw the light of day. The first thing I thrash over is the commitment to scalaz or cats, and then which versions, and then the continual leakage of other types that result. It feels like a big commitment and potentially a divisive one. Then I get frustrated that the Scala standard library doesn't have it built in, saving me from having to make such a value decision, then I consider writing my own that has no dependencies, but know that's just stupid, and then I either go back to 6; or I go with cats, because, well, the cats folk are just nicer, and don't require me to declare cultural alliance by choosing "zee" over "zed".... Then, after all that drama I start to think I should go back through all of my API and replace any usage of Option/Either/Try with Validation and overwhelm myself with the work involved, often resulting in going back to 6 because, well it's the devil I know.

In summary, if I were writing something new, or willing to commit to reworking all the code to be consistent, some form of Validation is probably the right answer. But consistency is key if you want to get the benefits of composition, otherwise you're increasing cognitive surface area with no benefit. Otherwise, I'd go with Try.

pomadchin commented 6 years ago

Thanks @metasim for your quite expanded response. I'm thinking more towards MonadError or smth like that, in order not to be locked on default combinators.

jbouffard commented 6 years ago

Thank you, @metasim you raised some good points. @pomadchin Would we want to implement something like MonadBlunder from this article? I'm not familiar enough with cats to know if they have something like this now, though. What are your thoughts @moradology @jpolchlo @jamesmcclain ?

moradology commented 6 years ago

I think there's another option so far not considered (save obliquely in the context of MonadError): cats.effect.IO. IO would allow us to completely defer the question of returning Either[Throwable, Result] vs throwing. For example:

val maybeExceptiony: IO[Int] = IO { myIntReader.read("abc123") }
val noExceptionsForMe: IO[Either[Throwable, Int]] = maybeExceptiony.attempt

In general, I think this solution is the right approach for code which depends heavily upon stuff happening outside of it (exception states will abound regardless of the stuff we write) and it strikes a nice compromise position that isn't too heavy-handed about how your call site is going to look.

The main issue I see with the use of Validation is that we don't have a lot of independently checkable steps so much as a chain of (possibly failing) behaviors that are strictly dependent upon prior results - we can't really take full advantage of the Applicative avoidance of the Monad's short-circuiting behavior

pomadchin commented 6 years ago

My vote would be for IO, looks like a great compromise, also would give an async API for free (if we'd like to make it async).

pomadchin commented 5 years ago

Reference to a good PR by @moradology that demonstrates how we can abstract over IO: https://github.com/locationtech/geotrellis/pull/2818

metasim commented 5 years ago

IMO, Either is too unopinionated for this use case. I think Validation is better because it's in-your-face about reporting errors. Something like Rust's Return is also a consideration, which is like Either except that there's no Left/Right ambiguity. For those writing the API it may not seem like a big deal to just get used to Either being left or right biased, but to library users it can cause a lot of cognitive friction and eye squinting.

metasim commented 5 years ago

PS: I'm assuming that the general GeoTrellis target user is less adept at type gymnastics than an API developer (i.e. users are in levels A1-3). If that's not the goal with GeoTrellis, I guess you should just:

2kh49q

pomadchin commented 5 years ago

@metasim we hope that the abstraction would allow users to choose what they want to use to handle effects, that can be IO, this can be Future / twitter Future / Either / Validated / whatever. Also, talking in particular exceptions handling in IO: https://github.com/typelevel/cats-effect/issues/67#issuecomment-410913527

Also don't forget that IO gives us abilities to parallelize for free.

But i agree that nothing extra complicated should be exposed into the API. It looks like the choose of exception handling mechanism should be as much as it's possible on the user side.

Mb you just need to try some more cats ;)

P.S. Validated is not a monad; so you can't flatMap it; wondering how to combine methods with such a return type.

metasim commented 5 years ago

Mb you just need to try some more cats ;)

For me I'm fine with cats (I do need to use it more). I just know that not all users (or at least everyone on my team) is comfortable with it, and I would prefer to be able to promote the use of GeoTrellis without a bunch of "but first you must learn this" caveats. So I have two threads of thoughts here: 1) how we manage errors internally in the API implementation (where I have less opinion as long as it's consistent), and 2) externally, at the user-facing surface area of the API, where I think we should strive for minimizing friction without losing flexibility. Cats may be the right answer in both cases, but IMO we just need to be thoughtful, deliberate and consistent about it. Even the term "monad" is still scary to a lot of devs, and seeing it as a part of an API can hamper adoption (I'm thinking of the large number of Java refugees coming to Spark/Scala).

echeipesh commented 5 years ago

@metasim I think you hit the nail on the head there, the result is going to be different tiers of API intended for different goals. The distinction somewhat exists right now but looks like we'll have to much more formal and deliberate about it in the future.

https://github.com/locationtech/geotrellis/pull/2818 is a pretty example of driving cause: There is a realization and a need that performing operations makes sense in at least three contexts:

local
local async
streaming async
batch (which may have async parts)

Rewriting code for each case is tremendously expensive and hard to maintain so we find ourselves reaching for higher order abstractions. At the same time leaking complexity (even through types) is bad practices and when rubber hits the road you're trying to do concrete thing A to thing B.

I think as we push we're going to end up with three levels of API:

1) Java-like API around performance and IO edges

IO code by default is performance centric and least type-full anyway.
Basic for most extensions through SPI or just use provided types

2) Async API around modularity and co-ordination

Capture general patterns of operation that can be re-used in multiple contexts

3) Notebook/integration API

Something that cobbles previous level for specific system (Spark vs FS2)
Probably something pretty specific about the types, or at least much constrained types.
Simple enough that you would be able to write it in a notebook.

IO[Either[Throwable, Int]] might make total sense if you're integrating with http4s, doobie or fs2 but would be really out of place in Spark API.

I hope we can start using cats in the core where it has low impact on users at first and check ourselves when it starts leaking out.

metasim commented 5 years ago

@echeipesh Great followup.

Java-like API around performance and IO edges

^---- Would also make Python bindings a heck of a lot easier.

metasim commented 5 years ago

PS: I hope I've not come across as anti-cats, fs2, IO or whatever.... I think all that performance heavy stuff should use whatever abstraction lends to the best 1) maintainability, and 2) performance. It's more, as Eugene said, me being fussy about also thinking about the same problem from the outside in, and what kind of type leakage to the end user is acceptable.

locationtech / geotrellis

ValueReaders Should Return Option[K] #2732