Some of the model-output files in the old Flusight hub that we're converting to Hubverse format contain spaces in their corresponding directory names and/or spaces in the filename.
Preliminary testing with these files and directories unearthed two issues:
The model-output transform class does not encode incoming URI strings, so file/folder names with a space throw pyarrow-related URI errors
I didn't see an S3 event trigger being emitted from AWS when manually uploading those files
Work required
For the first item, let's encode the incoming S3 key information, because we should do that anyway. The second item may require some additional digging to understand what is happening, but we're almost certainly not the only people who want to see S3 putObject triggers for filenames that contain spaces.
Definition of done
[ ] The model-output transform encodes spaces for both its input and output URIs that are used by PyArrow to read and write data to S3.
[ ] We understand why some of the test files aren't triggering the transform lambda and create a follow-up issue if needed.
Background
Some of the model-output files in the old Flusight hub that we're converting to Hubverse format contain spaces in their corresponding directory names and/or spaces in the filename.
Preliminary testing with these files and directories unearthed two issues:
Work required
For the first item, let's encode the incoming S3 key information, because we should do that anyway. The second item may require some additional digging to understand what is happening, but we're almost certainly not the only people who want to see S3
putObject
triggers for filenames that contain spaces.Definition of done