twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Improve the memory backend usability and testing #1816

Closed: johnynek closed this 6 years ago

johnynek commented 6 years ago

This makes the memory sources and sinks more realistic: reads and writes go through a Future, so the memory backend now looks like something you could actually use in production.
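To illustrate the idea of a Future-based memory source and sink, here is a minimal, self-contained sketch. It is not Scalding's actual memory backend API: `InMemoryStore`, `write`, and `read` are hypothetical names chosen for illustration. It only shows the general shape of an in-memory store whose writes and reads both return `Future`s, so callers compose them asynchronously instead of blocking.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical in-memory store, NOT Scalding's real memory source/sink API.
// A thread-safe map stands in for "storage"; every operation returns a Future.
final class InMemoryStore[K, V] {
  private val data = new ConcurrentHashMap[K, Vector[V]]()

  // Asynchronously replace whatever was stored under `key`.
  def write(key: K, values: Iterable[V])(implicit ec: ExecutionContext): Future[Unit] =
    Future { data.put(key, values.toVector); () }

  // Asynchronously read the values for `key`; a missing key yields an empty Vector.
  def read(key: K)(implicit ec: ExecutionContext): Future[Vector[V]] =
    Future { Option(data.get(key)).getOrElse(Vector.empty) }
}

implicit val ec: ExecutionContext = ExecutionContext.global

val store = new InMemoryStore[String, Int]

// Chain the write and the read so the read only starts after the write completes.
val roundTrip: Future[Vector[Int]] =
  store.write("numbers", List(1, 2, 3)).flatMap(_ => store.read("numbers"))

// Block only at the edge of the program (e.g. in a test) to get the result.
val result: Vector[Int] = Await.result(roundTrip, 5.seconds)
println(result)
```

Sequencing with `flatMap` rather than blocking between steps is what lets such a backend fit into an asynchronous pipeline; `Await` appears only at the very end.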

Some poor-man's benchmarking shows that this is much faster than Cascading local mode.

Some example timings:

scalding: 1.24314 ms
cascading: 210.13103 ms

scalding: 0.497229 ms
cascading: 205.542811 ms

scalding: 1.300902 ms
cascading: 207.588402 ms

scalding: 9.135427 ms
cascading: 248.029985 ms

scalding: 0.4009 ms
cascading: 0.750325 ms

scalding: 0.276766 ms
cascading: 0.656262 ms

scalding: 6.968218 ms
cascading: 218.509877 ms

scalding: 3.855046 ms
cascading: 214.557583 ms

Of course, we can optimize further and make better use of parallelism.
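The PR does not say how the timings above were collected. As a hedged sketch of what "poor-man's benchmarking" typically means on the JVM, the helpers below (`timeMs` and `medianMs` are hypothetical names) measure wall-clock time with `System.nanoTime` and report milliseconds. Taking the median of several runs is one simple way to reduce noise from individual runs, such as JIT warm-up; it is not a substitute for a proper harness like JMH.

```scala
// Hypothetical timing helper: evaluates `body` once and returns its result
// together with the elapsed wall-clock time in milliseconds.
def timeMs[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  (result, elapsedMs)
}

// Hypothetical helper: runs `body` `runs` times and returns the median
// elapsed time in milliseconds, which is less sensitive to outliers than the mean.
def medianMs(runs: Int)(body: => Any): Double = {
  require(runs > 0, "runs must be positive")
  val samples = Vector.fill(runs)(timeMs(body)._2).sorted
  if (runs % 2 == 1) samples(runs / 2)
  else (samples(runs / 2 - 1) + samples(runs / 2)) / 2.0
}

val (sum, ms) = timeMs((1 to 1000000).map(_.toLong).sum)
println(f"sum=$sum took $ms%.3f ms")

val median = medianMs(5)((1 to 1000).sum)
println(f"median of 5 runs: $median%.3f ms")
```

Because `body` is a by-name parameter, `Vector.fill` re-evaluates the timed expression on every run instead of reusing a cached value.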

ianoc commented 6 years ago

Showing up red in CI.

But all the changes LGTM; merge when green.

johnynek commented 6 years ago

Yeah, this hit #1814 and #1804. Restarted.