terascope / file-assets

Teraslice processors for working with data stored in files on disk, S3 or HDFS.
MIT License

File asset usability improvements #541

Closed peterdemartini closed 3 years ago

peterdemartini commented 3 years ago

IMPORTANT:

USABILITY:

peterdemartini commented 3 years ago

Also, we should add more JSDoc, especially for configuration values that exist for a specific reason that may not be obvious. For example, line_delimiter should have "Useful for overriding the line delimiter for making Windows compatible files with carriage returns".
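As a rough sketch of what that kind of JSDoc could look like: the interface name ExampleFileConfig and the helper joinLines below are hypothetical and only illustrate the comment style. They are not the asset's actual types, and the "\n" default is an assumption.

```typescript
/** Hypothetical config shape, for illustration only. */
interface ExampleFileConfig {
    /**
     * Delimiter written between records.
     *
     * Useful for overriding the line delimiter for making Windows
     * compatible files with carriage returns, for example "\r\n".
     *
     * Assumed default for this sketch: "\n"
     */
    line_delimiter?: string;
}

/** Hypothetical helper: joins already-serialized lines with the configured delimiter. */
function joinLines(lines: string[], config: ExampleFileConfig = {}): string {
    const delimiter = config.line_delimiter ?? '\n';
    return lines.join(delimiter);
}
```

With the reason stated on the field itself, it shows up in editor tooltips wherever the config is used.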

peterdemartini commented 3 years ago

The S3Reader should be named S3Fetcher since it doesn't do both fetching and slicing.

Also, the file names in ./packages/file-asset-apis/src/s3-api should not have the -api suffix, since it is used inconsistently and is unnecessary.

jsnoble commented 3 years ago

All readers export a function to create slicers. Currently the HDFS reader does not, but it will shortly.

peterdemartini commented 3 years ago

But that seems like a helper function to create something else that does the slicing. Seems kind of inconsistent.

peterdemartini commented 3 years ago

Also, regarding the multi-part upload, we need to make sure the implementation doesn't create a single gigantic string or buffer and then split it up, since that would still make us hit the max string or buffer limit of 1GB. So it should be split up before serializing.
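A minimal sketch of the "split before serializing" idea, assuming Node.js and newline-delimited JSON output. The function name serializeInChunks and the maxBytes parameter are hypothetical, and this is not the asset's actual implementation. Records are serialized one at a time and grouped into bounded buffers, so no single string or buffer ever holds the whole payload.

```typescript
/**
 * Hypothetical helper: serializes records one at a time and yields
 * buffers of at most maxBytes each. A single record larger than
 * maxBytes is still emitted, in a chunk of its own.
 */
function* serializeInChunks(
    records: Iterable<Record<string, unknown>>,
    maxBytes: number,
    delimiter = '\n'
): Generator<Buffer> {
    let parts: Buffer[] = [];
    let size = 0;

    for (const record of records) {
        // Only one record is serialized at a time
        const line = Buffer.from(JSON.stringify(record) + delimiter, 'utf8');

        // Flush the current chunk if adding this line would exceed the cap
        if (size > 0 && size + line.length > maxBytes) {
            yield Buffer.concat(parts, size);
            parts = [];
            size = 0;
        }

        parts.push(line);
        size += line.length;
    }

    // Emit whatever is left over as the final chunk
    if (size > 0) {
        yield Buffer.concat(parts, size);
    }
}
```

Each yielded buffer could then be sent as one part of a multi-part upload. Because the function is a generator, the caller decides how many chunks are held in memory at once.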