hubverse-org / hubverse-transform

Data transform functions for hubverse model-output files
MIT License
1 stars 0 forks source link

Cloud-based model-output transforms should handle spaces in directory names and filenames #9

Closed bsweger closed 4 months ago

bsweger commented 4 months ago

Background

Some of the model-output files in the old Flusight hub that we're converting to Hubverse format contain spaces in their corresponding directory names and/or spaces in the filename.

Preliminary testing with these files and directories unearthed two issues:

  1. The model-output transform class does not encode incoming URI strings, so file/folder names with a space throw pyarrow-related URI errors
  2. I didn't see an S3 event trigger being emitted from AWS when manually uploading those files

Work required

For the first item, let's encode the incoming S3 key information, because we should do that anyway. The second item may require some additional digging to understand what is happening, but we're almost certainly not the only people who want to see S3 putObject triggers for filenames that contain spaces.

Definition of done