Closed johnynek closed 5 years ago
Bullish, i like the concept of these since it avoids another step that an either can require and for a sufficiently rare output its kind of nice. But in practice this is spooky at a distance in scalding since it depends on how the actual graph is made into physical nodes.
(I think how i'd implement this if I had the time would be something like :
myTypedPipe.map { e =>
MyTrapFunction("s3://my/location) {
... code that returns a Try
}
}
Write to the file + flush (requires a global background static of course around the path for writing errors from each task...), vm shuts down the output it should be ~fine then i think. )
@ttim @dieu can we merge this before publish 0.18?
The semantics of addTrap are not great.
If a tuple causes an error before the next write barrier write it to an output but where the next write barrier is depends on the graph beneath it. so, you can't return a TypedPipe that has a trap and be sure about very much. I think it is better to do:
pipe.map { input => /* something that may fail * / }: TypedPipe[Either[ErrorData, T]]]
then .forceToDisk and write out the failure branch and filter the other in the rare case you have non-deterministic data jobs.see gitter discussion: https://gitter.im/twitter/scalding?at=5a8c6996888332ee3aac833b