rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust
Apache License 2.0

Local fs reader #23

Closed iduartgomez closed 4 years ago

iduartgomez commented 4 years ago

Preliminary PR for code review. The goal of this PR is to add a basic file reader to the RDD API. For now it only covers the local FS on the local host; with a couple of changes it should work in a distributed manner across different worker nodes.

The basic idea is:

1. Create an RDD of 'loaders' that will read from disk in parallel, built from a balanced RDD.
2. Pass a function that maps a dyn Reader to some output (basically the reading and parsing function).
3. This function, under the hood, uses the readers set up in (1).
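To make the three steps concrete, here is a minimal standalone sketch using only the Rust standard library. It is not the code in this PR and does not use the vega RDD API: `balance` and `read_parallel` are hypothetical names, plain threads stand in for RDD partitions, and `std::io::Read` stands in for the reader trait mentioned above.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::PathBuf;
use std::thread;

/// Hypothetical helper (step 1): spread `paths` round-robin over `n` partitions
/// so that each "loader" gets a similar number of files.
fn balance(paths: Vec<PathBuf>, n: usize) -> Vec<Vec<PathBuf>> {
    let n = n.max(1); // never create zero partitions
    let mut parts = vec![Vec::new(); n];
    for (i, p) in paths.into_iter().enumerate() {
        parts[i % n].push(p);
    }
    parts
}

/// Hypothetical helper (steps 2 and 3): every partition is read on its own
/// thread, and the user-supplied `parse` function turns each opened reader
/// into an output value. Results come back in partition order, which is not
/// necessarily the original path order.
fn read_parallel<T, F>(parts: Vec<Vec<PathBuf>>, parse: F) -> io::Result<Vec<T>>
where
    T: Send + 'static,
    F: Fn(&mut dyn Read) -> io::Result<T> + Clone + Send + 'static,
{
    let handles: Vec<_> = parts
        .into_iter()
        .map(|part| {
            let parse = parse.clone();
            thread::spawn(move || -> io::Result<Vec<T>> {
                let mut out = Vec::with_capacity(part.len());
                for path in part {
                    let mut reader = BufReader::new(File::open(path)?);
                    out.push(parse(&mut reader)?);
                }
                Ok(out)
            })
        })
        .collect();

    let mut results = Vec::new();
    for handle in handles {
        results.extend(handle.join().expect("loader thread panicked")?);
    }
    Ok(results)
}

fn main() -> io::Result<()> {
    // Create a few small files in a temporary directory to read back.
    let dir = std::env::temp_dir().join("local_fs_reader_sketch");
    std::fs::create_dir_all(&dir)?;
    let mut paths = Vec::new();
    for (i, text) in ["a", "bb", "ccc"].iter().enumerate() {
        let p = dir.join(format!("part-{}.txt", i));
        std::fs::write(&p, text)?;
        paths.push(p);
    }

    // The parsing function here simply counts the bytes in each file.
    let sizes = read_parallel(balance(paths, 2), |r: &mut dyn Read| {
        let mut buf = Vec::new();
        r.read_to_end(&mut buf)?;
        Ok(buf.len())
    })?;
    println!("{:?}", sizes);
    Ok(())
}
```

Because the parsing function only sees a `&mut dyn Read`, the same closure could in principle be reused by an adapter that opens something other than a local file.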

The traits are reusable for other adapters, although some minor changes will be necessary. It also still needs a proper test, which I must finish before the PR is merged.

iduartgomez commented 4 years ago

@rajasekarv I have added a few tests and polished/streamlined it. I think it is good (for now). The current approach is something I can expand upon and improve in the future.

Feel free to merge if you don't see any problems.

rajasekarv commented 4 years ago

This seems fine. Now we need to define a method for reading text files that uses this local reader. We can add other file systems later. Nice work, @iduartgomez.
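As a rough illustration of what such a text-file method could look like, here is a minimal sketch that layers line parsing on top of a generic reader function. The names `read_lines` and `text_file` are hypothetical, not part of vega, and the sketch reads a single local file with the standard library only.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};
use std::path::Path;

/// Hypothetical parsing function: decode any byte reader as UTF-8 lines.
/// Invalid UTF-8 surfaces as an `io::Error`.
fn read_lines(reader: &mut dyn Read) -> io::Result<Vec<String>> {
    BufReader::new(reader).lines().collect()
}

/// Hypothetical text-file entry point: open a local file and hand it to the
/// generic reader-based parsing function.
fn text_file(path: &Path) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    read_lines(&mut file)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("text_file_sketch.txt");
    std::fs::write(&path, "first line\nsecond line\n")?;
    for line in text_file(&path)? {
        println!("{}", line);
    }
    Ok(())
}
```

Keeping the line parsing separate from the file opening means the text-specific part would not need to change if another file system were added later.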