peterdemartini closed this 3 years ago.
Also, we should add more JSDoc, especially for configuration values that exist for a reason that may not be obvious. For example, `line_delimeter` should say "Useful for overriding the line delimiter for making Windows-compatible files with carriage returns".
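A minimal sketch of what that JSDoc could look like. The field name is spelled as in the comment above; the `ExporterOptions` interface, the `"\n"` default and the `joinLines` helper are hypothetical and only illustrate where the doc comment would live and what the option changes.

```typescript
/**
 * Hypothetical subset of the exporter options; only the field under
 * discussion is shown.
 */
interface ExporterOptions {
    /**
     * Useful for overriding the line delimiter for making Windows-compatible
     * files with carriage returns (e.g. "\r\n" instead of "\n").
     */
    line_delimeter: string;
}

// Assumed default: Unix-style newlines.
const defaults: ExporterOptions = { line_delimeter: '\n' };

// Windows-compatible output: carriage return + line feed.
const windows: ExporterOptions = { ...defaults, line_delimeter: '\r\n' };

// Hypothetical helper: join serialized records with the configured delimiter.
function joinLines(lines: string[], opts: ExporterOptions): string {
    return lines.join(opts.line_delimeter);
}
```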
The `S3Reader` should be named `S3Fetcher`, since it doesn't do both fetching and slicing.
Also, the file names in `./packages/file-asset-apis/src/s3-api` should not have the `-api` suffix, since it is used inconsistently and is unnecessary.
All readers export a function to create slicers; currently the HDFS reader does not, but it will shortly.
But that seems like a helper function for creating something else that does the slicing, which seems kind of inconsistent.
Also, regarding the multipart upload: we need to make sure the implementation doesn't create a single gigantic string or buffer and then split it up, since that would still hit the maximum string or buffer size limit of 1GB. The data should be split up before serializing.
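One way to split before serializing, sketched under assumptions: records arrive as an iterable, each record is serialized as one JSON line, and the generator flushes a part as soon as adding the next line would push it past `maxPartBytes`. The name `serializeInParts` and its parameters are hypothetical, not the package's API. Only one part's worth of lines is ever held in memory, so no single string approaches the size limit.

```typescript
/**
 * Hypothetical helper: yield serialized parts of at most `maxPartBytes`
 * each (a single record larger than the limit still gets its own part),
 * without ever building one string for the whole data set.
 */
function* serializeInParts(
    records: Iterable<unknown>,
    maxPartBytes: number,
    delimiter = '\n'
): Generator<Buffer> {
    let lines: string[] = [];
    let size = 0;

    for (const record of records) {
        const line = JSON.stringify(record) + delimiter;
        const lineBytes = Buffer.byteLength(line);

        // Flush the current part before this line would exceed the limit.
        if (size > 0 && size + lineBytes > maxPartBytes) {
            yield Buffer.from(lines.join(''));
            lines = [];
            size = 0;
        }

        lines.push(line);
        size += lineBytes;
    }

    // Emit whatever is left as the final (possibly smaller) part.
    if (size > 0) {
        yield Buffer.from(lines.join(''));
    }
}
```

For S3 specifically, every part of a multipart upload except the last must be at least 5 MiB, so `maxPartBytes` would need to be at least that large in practice.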
IMPORTANT: `file_per_slice: false` is not respected with `ldjson`; it generates files such as `foobar/0.0.ldson` and `foobar/0.0ldjson`.
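A sketch of the expected naming behavior, assuming (from the `0.0` in the generated names) that the object name is `<path>/<workerId>` plus a `.<sliceCount>` suffix only when `file_per_slice` is true. The `NameOptions` type and the `objectName` function are hypothetical; the point is that with `file_per_slice: false` the slice count should not appear in the name, and the extension should always be joined with a dot.

```typescript
// Hypothetical subset of the exporter options that affect naming.
interface NameOptions {
    path: string;            // e.g. "foobar"
    extension: string;       // e.g. ".ldjson", including the leading dot
    file_per_slice: boolean;
}

/**
 * Hypothetical naming scheme: with file_per_slice false, every slice from a
 * worker maps to the same name; with it true, the slice count is appended
 * so each slice gets its own file.
 */
function objectName(opts: NameOptions, workerId: string, sliceCount: number): string {
    const base = `${opts.path}/${workerId}`;
    const name = opts.file_per_slice ? `${base}.${sliceCount}` : base;
    return `${name}${opts.extension}`;
}
```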
USABILITY: the `ldson` extension should be `ldjson` (`.ldjson.gz` or `.ldjson.lz4` when compressed).
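A minimal sketch of deriving the default extension from the format and compression settings, so the format name is never hand-typed and the compression suffix always follows it. The `Compression` values and the `defaultExtension` function are assumptions for illustration, not the package's actual option names.

```typescript
// Assumed compression options; adjust to whatever the exporter really accepts.
type Compression = 'none' | 'gzip' | 'lz4';

const compressionSuffix: Record<Compression, string> = {
    none: '',
    gzip: '.gz',
    lz4: '.lz4',
};

// Hypothetical helper: ".<format>" followed by the compression suffix, if any.
function defaultExtension(format: string, compression: Compression): string {
    return `.${format}${compressionSuffix[compression]}`;
}
```

Usage: `defaultExtension('ldjson', 'gzip')` gives `.ldjson.gz`, and `defaultExtension('ldjson', 'none')` gives `.ldjson`.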