twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.49k stars 703 forks source link

Make addTrap deprecated #1818

Closed johnynek closed 5 years ago

johnynek commented 6 years ago

The semantics of addTrap are not great.

If a tuple causes an error before the next write barrier write it to an output but where the next write barrier is depends on the graph beneath it. so, you can't return a TypedPipe that has a trap and be sure about very much. I think it is better to do: pipe.map { input => /* something that may fail * / }: TypedPipe[Either[ErrorData, T]]] then .forceToDisk and write out the failure branch and filter the other in the rare case you have non-deterministic data jobs.

see gitter discussion: https://gitter.im/twitter/scalding?at=5a8c6996888332ee3aac833b

ianoc commented 6 years ago

Bullish, i like the concept of these since it avoids another step that an either can require and for a sufficiently rare output its kind of nice. But in practice this is spooky at a distance in scalding since it depends on how the actual graph is made into physical nodes.

(I think how i'd implement this if I had the time would be something like :

myTypedPipe.map { e =>
  MyTrapFunction("s3://my/location) {
  ... code that returns a Try
  }
}

Write to the file + flush (requires a global background static of course around the path for writing errors from each task...), vm shuts down the output it should be ~fine then i think. )

johnynek commented 5 years ago

@ttim @dieu can we merge this before publish 0.18?