shrinerb / shrine

File Attachment toolkit for Ruby applications
https://shrinerb.com
MIT License
3.18k stars 275 forks source link

Allow encoding to be specified with S3 storage class #585

Closed pond closed 2 years ago

pond commented 2 years ago

When using local file storage with Shrine, a call to #read will accept encoding: ... in the same way as one might expect for a regular IO object. This is quite important when dealing with input data for things like CSV which might come from all manner of sources, including correctly-encoded UTF-8 with or without BOM, ISO8859-1 or (quite often) Windows CP1252.

In our application, we used this to figure out the encoding and decode the CSV accordingly, but while it works for local development where file storage is convenient for most engineers, it didn't work in deployment since the S3 storage class didn't understand that option and just passed it through to the AWS SDK, which understandably complained.

This PR fixes that in a backwards-compatible way by accepting a with-nil-default encoding option and passing it to the Down::ChunkedIO constructor. The updated tests pass with the patch present, else fail with:

  1) Failure:
Shrine::Storage::S3::#open#test_0007_accepts :encoding option [...shrine/test/storage/s3_test.rb:334]:
Expected: #<Encoding:UTF-8>
  Actual: #<Encoding:ASCII-8BIT>

That's not the end of the story, unfortunately! When Shrine is actually (out of tests) driving the Down::ChunkedIO object, it calls #read on the instance and passes a buffer length. For some reason, if and only if a buffer length is given, Down currently ignores the encoding that it's been given. This is addressed by https://github.com/janko/down/pull/74, which will hopefully be merged soon.

In the mean time, this PR for Shrine is harmless and beneficial if either using a monkey-patched Down gem (as we currently are, as well as monkey patched Shrine!) or using a future Down gem release with the above PR merged.

pond commented 2 years ago

(Bump) Any interest in this? It's quite important to us (various encoding issues discovered down the chain - see referenced Down gem PR and subsequently linked issue https://github.com/janko/down/issues/73). Thanks!

benedikt commented 2 years ago

We're running into the exact same issue and passing the encoding to Down::ChunkedIO seems to fix things for us as well. Would appreciate if this was merged :)

janko commented 2 years ago

Sorry for the wait, I just haven't been finding the energy for Shrine. This change looks good and useful, I'm merging 👍🏻

benedikt commented 2 years ago

@janko Thanks a lot for merging this and for all the time and energy you've put into Shrine. It's very much appreciated!

pond commented 2 years ago

Thanks for merging this 😄

Is it possible to do a Shrine release to wrap up recent fixes? At present, the most up-to-date public release on RubyGems is 3.4.0 from June 14, 2021.