wangkuiyi / recordio

Apache License 2.0

Add FileList and FileListScanner for async reading #61

Closed wangkuiyi closed 5 years ago

wangkuiyi commented 5 years ago

Currently, Scanner.Scan reads data synchronously and we need to create a Scanner for each file.

This PR adds two new types, FileList and FileListScanner, which logically concatenate files into a single record stream and allow scanning a segment of that logical stream. More importantly, FileListScanner.Scan reads data asynchronously.
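
To illustrate the idea, here is a minimal sketch of asynchronous scanning over a logically concatenated stream. It is not the code in this PR: the type asyncScanner, its constructor, and the choice to model each file as a slice of strings are all assumptions made for illustration. A background goroutine reads ahead and hands records to the consumer over a buffered channel.

```go
package main

import "fmt"

// asyncScanner is a hypothetical stand-in for FileListScanner. A background
// goroutine reads records ahead of the consumer and passes them over a
// buffered channel.
type asyncScanner struct {
	records chan string
	cur     string
}

// newAsyncScanner treats files (each modeled here as a slice of records) as
// one logical stream and yields records [start, start+length) of that stream.
func newAsyncScanner(files [][]string, start, length, buffer int) *asyncScanner {
	s := &asyncScanner{records: make(chan string, buffer)}
	go func() {
		defer close(s.records) // closing the channel signals end of stream
		idx := 0               // index into the concatenated stream
		for _, f := range files {
			for _, r := range f {
				if idx >= start+length {
					return
				}
				if idx >= start {
					s.records <- r // blocks only when the buffer is full
				}
				idx++
			}
		}
	}()
	return s
}

// Scan advances to the next record and reports whether one was available.
func (s *asyncScanner) Scan() bool {
	r, ok := <-s.records
	s.cur = r
	return ok
}

// Record returns the record most recently read by Scan.
func (s *asyncScanner) Record() string { return s.cur }

func main() {
	files := [][]string{{"a0", "a1"}, {"b0", "b1", "b2"}}
	s := newAsyncScanner(files, 1, 3, 2) // records 1..3 of the logical stream
	for s.Scan() {
		fmt.Println(s.Record()) // prints a1, b0, b1 on separate lines
	}
}
```

In this sketch the producer goroutine stays blocked if the consumer stops calling Scan before the stream ends, so a real implementation would probably also need a way to cancel it.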

This PR also adds a benchmark that compares synchronous and asynchronous reading, assuming that consuming each record takes a small amount of time. The results are as follows:

yi@WangYis-iMac:/go/src/github.com/wangkuiyi/recordio (async_read)*$ go test -bench=SyncAndAsync
goos: darwin
goarch: amd64
pkg: github.com/wangkuiyi/recordio
BenchmarkSyncAndAsyncRead/Synch_reading-4                  2     795081294 ns/op
BenchmarkSyncAndAsyncRead/Async_reading-4                  3     490042952 ns/op
PASS
ok      github.com/wangkuiyi/recordio   9.004s

It seems that async reading is about 1.6 times as fast as sync reading (roughly 490 ms/op versus 795 ms/op).
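
The reason overlapping helps can be sketched with a toy comparison. This is not the benchmark in this PR: the delays readCost and consumeCost, and the functions syncRead and asyncRead, are made-up names and values that simulate a reader and a consumer with time.Sleep. In the sync loop each record pays for reading plus consuming; in the async loop a goroutine reads the next records while the current one is being consumed.

```go
package main

import (
	"fmt"
	"time"
)

// Both delays are arbitrary values chosen for illustration only.
const (
	readCost    = 200 * time.Microsecond
	consumeCost = 200 * time.Microsecond
)

// readRecord simulates reading one record from storage.
func readRecord(i int) int { time.Sleep(readCost); return i }

// consume simulates the small amount of work done per record.
func consume(int) { time.Sleep(consumeCost) }

// syncRead reads and consumes records one after another and returns the
// number of records consumed.
func syncRead(n int) int {
	count := 0
	for i := 0; i < n; i++ {
		consume(readRecord(i))
		count++
	}
	return count
}

// asyncRead reads in a background goroutine so that reading record i+1
// overlaps with consuming record i. It returns the number consumed.
func asyncRead(n, buffer int) int {
	ch := make(chan int, buffer)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- readRecord(i)
		}
	}()
	count := 0
	for r := range ch {
		consume(r)
		count++
	}
	return count
}

func main() {
	const n = 200
	t := time.Now()
	syncRead(n)
	fmt.Println("sync: ", time.Since(t))
	t = time.Now()
	asyncRead(n, 16)
	fmt.Println("async:", time.Since(t))
}
```

With equal read and consume costs, the best this kind of overlap can do is approach a 2x speedup; the actual gain depends on the ratio of the two costs and on channel and scheduling overhead.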