Sanitize JSON - Githubissues

nsilva-ta commented 2 years ago

Brief Description

Plugin remove_invalid can be used to delete server files when a JSON representation is upload with a malicious id

Expected behavior

Exploration of file storage should not get out of file storage folder root

Actual behavior

Using '../../../../../file' in id will point to other files in the server

Simplest self-contained example code to demonstrate issue

# config/initializers/shrine.rb
require 'shrine'
require 'shrine/storage/file_system'

# Storage
Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new(Rails.application.credentials.storage.presence, prefix: 'cache'),
  store: Shrine::Storage::FileSystem.new(Rails.application.credentials.storage.presence)
}

# Active Record integration to the attachment interface
Shrine.plugin :activerecord

# File validation
Shrine.plugin :determine_mime_type, log_subscriber: nil
Shrine.plugin :restore_cached_data
Shrine.plugin :remove_invalid

# Retain the cached file across form redisplays
Shrine.plugin :cached_attachment_data

# app/uploaders/photo_uploader

require 'image_processing/mini_magick'

class PhotoUploader < ApplicationUploader
  MAX_SIZE = 2.megabytes
  VALID_EXTENSIONS = %w[jpg jpeg png webp].freeze
  VALID_MIME_TYPES = %w[image/jpeg image/png image/webp].freeze

  plugin :store_dimensions, log_subscriber: nil

  Attacher.validate do
    validate_min_size 10.kilobytes
    validate_max_size MAX_SIZE
    validate_extension VALID_EXTENSIONS
    validate_max_dimensions [1500, 1500] if validate_mime_type VALID_MIME_TYPES
  end
end

Having in your rails server a file named /tmp/test.txt, If a user replaces the input file field by a text file field or uses the file input hidden field reserved for cached data, used for a PhotoUploader field, and change its value to the following malicious string '{"id":"../../../../../../tmp/test.txt", "storage": "cache"}', the file is deleted.

System configuration

Ruby version: 2.6.7

Shrine version: 3.4.0

janko commented 2 years ago

Thanks for reporting, I agree this is a security issue. My first thought is that we should disable IDs for filesystem storage from referencing files outside of the storage directory.

nsilva-ta commented 2 years ago

And the complete id should be filtered allowing only word characters and a few others. But I down know the real impact of that filtering in other kinds of file storage.

bb commented 7 months ago

It's a pity to see this security issue open and unhandled for so long. I want to suggest a different approach which I implemented but in a project but didn't extract to a plugin yet. It's working in my scenario, but I'm afraid there's lots of work needed to make it work as a generic Shrine plugin.

As an alternative to filtering, I'm signing my cached data and verify it on the server (but not on the client). This way, a client can no longer tamper with the data in the hidden fields nor can they access random files on the filesystem any more.

I'm using Rails MessageVerifier, so I don't know how this would work with Roda etc. and it needs message serializer :json or :json_allow_marshal to easily decode on the client.

These are my changes, it would be great if someone with better overview about shrine internals and plugins could transfer them to e.g. plugins like :signed_cached_attachment_data and :restore_signed_cached_data or similar.

routes.rb

Note that the url generation replaces the usual configuration via the upload endpoint

  # old: mount ImageUploader.upload_endpoint(:cache) => "/images/upload"
  mount ImageUploader.upload_endpoint(:cache, rack_response: ->(uploaded_file, request) { [200, { "Content-Type" => Shrine::UploadEndpoint::CONTENT_TYPE_JSON }, [{signed_data: Rails.application.message_verifier(:shrine).generate(uploaded_file), url: uploaded_file.derivation_url(:thumbnail, 224, 126)}.to_json]] }) => "/images/upload"

uppy.ts

Note that you can still use the response.uploadURL here if you want.

  uppy.on("upload-success", (_file, response) => {
    const hiddenField = document.createElement("input");
    const data = response.body.signed_data
      ? JSON.parse(atob(response.body.signed_data.split("--")[0]))
      : response.body.data;
    hiddenField.type = "hidden";
    hiddenField.name = `post[images_attributes][${data.id.replace(/\D/g, "")}][image]`;
    hiddenField.value = response.body.signed_data;
...

controller

    images_attributes = post_params[:images_attributes].to_h
    images_attributes.each do |k,v|
      if v[:image].present?
        v[:image] = Rails.application.message_verifier(:shrine).verified(v[:image])
      end
    end

form.html.haml

       -# old: = p.hidden_field :image, value: p.object.cached_image_data
       = p.hidden_field :image, value: Rails.application.message_verifier(:shrine).generate(p.object.cached_image_data)

adam12 commented 7 months ago

Your solution is interesting. Perhaps using OpenSSL::HMAC we could sign with a key that only Shrine knows, simulating the message verifier logic. This would make it cross-framework compatible.

bb commented 7 months ago

Not sure if we're running into a misunderstanding here. Let's try to make sure we're on the same page. Rails MessageVerifier also uses OpenSSL::HMAC itself. The "magic" ActiveSupport adds on top is

different serialization strategies (json, marshal, ...) -- we should use json for ease of use on the client
key generation based on the app secret + salt -- we could use whatever the base framework provides or have the developer configure one
configurable digest algorithms -- I guess we should just stick with configured framework defaults (or default/fallback to SHA256)
rotation -- not sure if we really need it as cached data is (relatively) shortlived
caching of implementation -- could be stored as instance variable in the uploader itself

So the key seems to be quite easy to get, we just use Rails defaults (or other known frameworks defaults) if available or require it to be configured by the developer otherwise. I guess with "key that only Shrine knows" means a key which is only available on the server but not on the client, right?

Questions remaining from my perspective:

does this in any way interact with the presign endpoint or do we never need to think about it here?
can we get rid of re-extracting metadata because by signing it's already made tamper-proof?
(how) can we put the verification into the accessor, so it doesn't need to be done by the developer in the controller?
how can we best hook into the uploader endpoint to keep existing functionality (like providing a derivation URL via config)?
should it be a separate plugin, e.g. signed_cached_attachment_data or be an option (or the default) of the existing plugin?
which parts of the documentation are influenced by that and need updates?
should we take a few steps back and not sign the full uploaded_file json but only add a signature field next to the ID which only signs the ID (and verify it on the server) and thus make only this tamper-proof and keep re-extracting metadata etc. -- keeping big parts identical to the previous state?

adam12 commented 7 months ago

I think we're on the same page. I was only musing that OpenSSL::HMAC could be a reasonably framework-agnostic approach, but I haven't given any thought to the internals (nor do I have much familiarity with them).

So the key seems to be quite easy to get, we just use Rails defaults (or other known frameworks defaults) if available or require it to be configured by the developer otherwise. I guess with "key that only Shrine knows" means a key which is only available on the server but not on the client, right?

Yes. Either pass in the key from Rails if detected, or just prompt the user to set one, similar to how it's done for Rack::Session in many non-Rails apps.

Shrine.verified_messages secret: "some-secret"

or something.

I don't think we need to encrypt the entire message, since there's nothing sensitive in it (that I'm aware of?). Perhaps something simple can be done, such as returning the signature as a key in the JSON. The HMAC can be generated with a salt, secret, and then the alphabetical key/value pairs serialized from the JSON with the signature excluded. Again, just musing.

bb commented 7 months ago

to encrypt the entire message

nothing is encrypted (i.e. not readable by the client), we're only talking about signing (i.e. not modifiable) here.

If we sign everything, we don't need to re-extract metadata. If we sign only e.g. the ID, then we of course still continue doing the re-extraction.

Another approach might be to have both data and signed_data objects. While the data would allow easy reading on the client, the signed_data would allow restoring untampered data on the server. Of course the client would still need a change to send back the signed data.

janko commented 7 months ago

How would signing work with direct uploads to a service like S3? There the upload doesn't touch the backend, except for the presign request.

bb commented 7 months ago

I never used that, so I don't know. In my changes outlined above, I only touched the upload endpoint. That's why I also asked this:

does this in any way interact with the presign endpoint or do we never need to think about it here?

Looking at the documentation of the presign endpoint, i think if the client would tamper any data there, the upload to S3 just wouldn't work. I guess the answer how this interacts with signing might come with the question how and where does metadata extraction happen in the direct upload / S3 case.

adam12 commented 7 months ago

Pre-signing is going to be a challenge. Is it common to pair the remove_invalid plugin with a remote store?

It would probably be a large shift, but perhaps we'd need to provide a secondary field for remotely provided data/identifiers. If it's present, we use the identifier from the secondary field to acquire our own data from the storage backend and build a JSON object with relevant metadata and then some sort of signature as discussed above.

I don't know if this solves the original issue of path validation, so perhaps it's two separate issues.

Paths must be contained within the bounds of their storage prefix.
JSON-formatted form params should be signed to prevent tampering.

shrinerb / shrine

Sanitize JSON #573

Brief Description

Expected behavior

Actual behavior

Simplest self-contained example code to demonstrate issue

System configuration

routes.rb

uppy.ts

controller

form.html.haml