Closed dvdbng closed 8 years ago
I like where this is going, looks like a good way of addressing #298 :+1:
So, I've been testing this and it's working pretty fine, I'm quite happy with it -- awesome work, @Youwotma ! :)
I have only a couple comments, but they're really small things:
1) It's assuming ZLibDecompressor by default, which is good because of backwards compatibility, but nothing is being shown in the log to indicate that. It would be nice to have some message in the logs to help users trying to use it with uncompressed files -- currently one just gets a stacktrace with:
ERROR:export-pipeline: -- EXPORTMANAGER -- Error -3 while decompressing: incorrect header check
Logging No decompressor configured, using default ZLibDecompressor
is a good start.
Catching that exception and appending a message like (use NoDecompressor if you're using uncompressed input)
is even better
2) The same applies for deserializers, which the assumed default is JSON lines (albeit probably a lesser problem, since).
This introduces the concept of a "Stream" based reader, where the underlying storage backend operates in files/bytes (like S3) as opposed to structured objects (like hubstorage).
Having binary/stream based readers allows us to have pluggable decompressors and deserializers, instead of hardcoding them in the readers.
This reuses the same
get_read_streams
method that the stream bypass usesAlso adds a CSV desserializer.
Improvements planned: