bbotella closed this issue 8 years ago
If I understand correctly, the current implementation only handles gzipped JSON lines files. What about dropping the FSReader name in favor of something like JsonLinesReader, JsonReader, CSVReader, XMLReader, etc.?

I'm not sure about compression. Should it be detected automatically? Or indicated in options? Like "options": {"compression": "gzip"}
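To make the compression question concrete, here is a minimal sketch that supports both ideas: an explicit "compression" option wins, and otherwise the format is guessed from the file's leading magic bytes. The names `detect_compression`, `open_decompressed`, and `MAGIC_PREFIXES` are hypothetical and not part of the existing readers; only the `"options": {"compression": "gzip"}` shape comes from the comment above.

```python
import bz2
import gzip
import io

# Hypothetical lookup table: leading "magic" bytes of a few compressed formats.
MAGIC_PREFIXES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
}


def detect_compression(head):
    """Guess the compression from the first bytes; 'none' if nothing matches."""
    for prefix, name in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return name
    return "none"


def open_decompressed(raw, options=None):
    """Return a binary file-like object over the decompressed bytes.

    An explicit options["compression"] takes precedence over detection.
    """
    compression = (options or {}).get("compression") or detect_compression(raw[:8])
    if compression == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    if compression == "bz2":
        return io.BytesIO(bz2.decompress(raw))
    if compression == "none":
        return io.BytesIO(raw)
    raise ValueError("Unsupported compression: %r" % compression)
```

With this shape, a reader configured without a compression option would still handle gzipped input, while an explicit option remains available for cases where detection is not wanted.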
@eliasdorneles Thoughts?
Hm, the thing is, other file-based readers (e.g. S3Reader) are also assuming the input is JSON lines.
I believe a better approach would be to extract from FSReader and S3Reader the parts that assume the files are JSON lines into a new abstraction (e.g. JsonLinesImporter, JsonImporter, CSVImporter, ...) and use it in all file-based readers (currently only FSReader and S3Reader; in the future we'd have SftpReader, DropboxReader, etc.).
Essentially, the idea would be to do for the readers the same as we did for the writers: the file-based writers support writing to different formats (XML, CSV, JSON) through a formatter.
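A rough sketch of that separation, assuming the format-specific part only needs a text stream and does not care where the stream came from. The function names, the `DESERIALIZERS` registry, and the format keys ("jl", "json", "csv") are hypothetical illustrations, not the project's actual API.

```python
import csv
import io
import json


def iter_jsonlines(stream):
    """Yield one item per non-empty line of a JSON lines text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_json(stream):
    """Yield items from a stream holding a single JSON array."""
    for item in json.load(stream):
        yield item


def iter_csv(stream):
    """Yield one dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(stream):
        yield dict(row)


# Hypothetical registry mapping a format name to its deserializer.
DESERIALIZERS = {
    "jl": iter_jsonlines,
    "json": iter_json,
    "csv": iter_csv,
}


def read_items(stream, file_format="jl"):
    """Decode items from any text stream, whatever its source (FS, S3, ...)."""
    try:
        deserializer = DESERIALIZERS[file_format]
    except KeyError:
        raise ValueError("Unsupported file format: %r" % file_format)
    return deserializer(stream)
```

A file-based reader would then only be responsible for producing the stream, for example `list(read_items(io.StringIO('{"a": 1}\n'), "jl"))`, and every source would get every format without extra work.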
Yup. Totally agreed with @eliasdorneles. The point is to make "file format" and "file compression" independent of where the file is read from, just like we do with writers at this point. I like the idea of having a FileBasedReader that handles this. In the future, we could even try "automatic format detectors".
Added in #316
I think that adding support for different formats in file-based readers is a must. That covers both file formats (csv, xml...) and compression formats (zip, gz, tar...).