dolsysmith closed 3 years ago
I tested loading files from SFM by date, editing the code to load by size, and loading files by date with the filenames not matching the pattern (got the appropriate error logged and raised).
loader.docker-compose.yml needs the volumes lines commented back out so that it works on prod, where the code is not available to be linked in.
With that, looks good to merge.
Concatenates gzipped JSON extracts by the date in the filename. If this produces too many files, this enhancement also supports concatenating the files up to a specified maximum file size (e.g., 2G), using the environment variable provided for the Spark extracts. (At this point, the choice between the two modes is hard-coded, but in the future it could be specified as a command-line parameter.)
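For reviewers, here is a minimal sketch of the two grouping modes described above. It is not the code in this PR. The date pattern (YYYY-MM-DD), the function names, and the size parser are all assumptions made for illustration. The real filename pattern and the name of the environment variable are not shown here. The sketch relies on the fact that gzip members can be appended byte for byte and the result is still a valid gzip file.

```python
import gzip
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical pattern: a YYYY-MM-DD date somewhere in the file name.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_size(text):
    """Convert a size such as '2G' or '500M' into bytes (binary units)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def group_by_date(paths):
    """Map each date found in a file name to the sorted list of matching paths."""
    groups = defaultdict(list)
    for path in sorted(paths):
        match = DATE_RE.search(path.name)
        if match is None:
            # Fail loudly when a file name does not match the pattern.
            raise ValueError(f"no date found in file name: {path.name}")
        groups[match.group(0)].append(path)
    return dict(groups)


def group_by_size(paths, max_bytes):
    """Pack paths into batches whose combined on-disk size stays under max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for path in sorted(paths):
        size = path.stat().st_size
        if current and total + size > max_bytes:
            batches.append(current)
            current, total = [], 0
        current.append(path)
        total += size
    if current:
        batches.append(current)
    return batches


def concatenate(paths, target):
    """Append the raw bytes of each gzip file to target, without recompressing."""
    with open(target, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

With helpers like these, the by-date mode would call concatenate once per entry returned by group_by_date, and the by-size mode once per batch returned by group_by_size, with max_bytes taken from parse_size applied to the environment variable's value.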
Testing
Load dataset and ensure that JSON files are visible, downloadable, and unzip properly.
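The "unzip properly" part of this check could be scripted along the following lines. This is only a sketch: the function name is hypothetical, and it verifies gzip integrity only, not the JSON content or the download path through the UI.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1024 * 1024):
    """Return True if every gzip member in the file decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end so that all concatenated members are checked.
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Covers invalid headers, truncated files, and corrupt compressed data.
        return False
    return True
```

Reading the whole stream matters here because a concatenated file holds several gzip members, and a problem in a later member would not show up from the header alone.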