Currently, Scanner.Scan reads data synchronously, and we need to create a separate Scanner for each file.
This PR adds two new types, FileList and FileListScanner, which logically concatenate files into a single record stream and allow scanning a segment of that logical stream. More importantly, FileListScanner.Scan reads data asynchronously.
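To illustrate the general idea of asynchronous scanning, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `asyncScanner` type in which each "file" is just a slice of strings; a background goroutine walks the files in order and pushes records into a buffered channel, so reading overlaps with whatever the caller does with each record. The names `asyncScanner`, `newAsyncScanner`, and `Record` are made up for this example and may not match FileListScanner.

```go
package main

import "fmt"

// asyncScanner is a hypothetical illustration, not the FileListScanner
// added by this PR. It prefetches records in a background goroutine.
type asyncScanner struct {
	ch  chan string // records produced by the background reader
	cur string      // the record returned by the last successful Scan
}

// newAsyncScanner treats each []string as the records of one "file" and
// concatenates them into a single logical stream. buffer is the number of
// records that may be read ahead of the consumer.
func newAsyncScanner(files [][]string, buffer int) *asyncScanner {
	s := &asyncScanner{ch: make(chan string, buffer)}
	go func() {
		defer close(s.ch) // signals the end of the stream to Scan
		for _, f := range files {
			for _, r := range f {
				s.ch <- r // blocks only when the buffer is full
			}
		}
	}()
	return s
}

// Scan advances to the next record and returns false at the end of the stream.
func (s *asyncScanner) Scan() bool {
	r, ok := <-s.ch
	s.cur = r
	return ok
}

// Record returns the record read by the most recent successful Scan.
func (s *asyncScanner) Record() string { return s.cur }

func main() {
	files := [][]string{{"a", "b"}, {"c"}}
	s := newAsyncScanner(files, 2)
	for s.Scan() {
		fmt.Println(s.Record())
	}
}
```

In this sketch the caller must drain the stream; if it stops early, the background goroutine stays blocked on the channel send. A real implementation would likely need a way to cancel the reader.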
This PR also adds a benchmark that compares synchronous and asynchronous reading, assuming that each record takes a small amount of time to "consume". The results are as follows:
yi@WangYis-iMac:/go/src/github.com/wangkuiyi/recordio (async_read)*$ go test -bench=SyncAndAsync
goos: darwin
goarch: amd64
pkg: github.com/wangkuiyi/recordio
BenchmarkSyncAndAsyncRead/Synch_reading-4 2 795081294 ns/op
BenchmarkSyncAndAsyncRead/Async_reading-4 3 490042952 ns/op
PASS
ok github.com/wangkuiyi/recordio 9.004s
It seems that async reading is about 1.6 times as fast as sync reading (roughly 490 ms/op versus 795 ms/op).