shrinerb / shrine

File Attachment toolkit for Ruby applications
https://shrinerb.com
MIT License
3.18k stars 275 forks source link

Basic Shrine usage creates invalid HTML - duplicate names and IDs are present #567

Closed pond closed 2 years ago

pond commented 2 years ago

Brief Description

HTML prohibits more than one element having the same ID within an entire document. Shrine's most basic use documented in https://shrinerb.com/docs/getting-started immediately violates this:

form_for @photo do |f|
  f.hidden_field :image, value: @photo.cached_image_data
  f.file_field :image
  f.submit
end

...leading to a hidden element and a file element both with name photo[image] and ID photo_image. This is invalid and was something we noticed when switching to Cuprite over Selenium/Webdrivers, which has a different idea for certain finders of what 'findable' means by default and caused test failures via Capybara::Ambiguous exceptions.

Worse yet, Shrine's documentation then goes on to show that you're relying on broken HTML and form handling behaviour working in a very exact way:

Note that the file field needs to go after the hidden field, so that selecting a new file can always override the cached file in the hidden field.

Whether or not you might subjectively agree that this is good or bad advice, the presence of duplicate IDs in one document is objectively illegal according to the HTML specification. The HTML that Shrine's recommendations generate is invalid and the behaviour it asserts is only true if it gets the exact undocumented HTML form handling behaviour in browsers that it expects. [LATER EDIT: My assertion of undoc'd behaviour is incorrect as far as the form submission goes; see discussion later; resolution of this turns out to be a simple docs-only change]

Expected behavior

Shrine should be usable without creating invalid HTML documents.

Actual behavior

(As described)

Simplest self-contained example code to demonstrate issue

(As described - sample in Getting Started guide is sufficient)

System configuration

Ruby version: 3.1

Shrine version: I'm on 3.4, but any 3.x series would I presume require (or at least, recommend) the same bad HTML markup

benkoshy commented 2 years ago

Hi there,

(I'm not the maintainer, but an interested party):

surely not e.g. a Base64 or similar inline representation of the entire binary object written into the HTML output stream...?) seems rather high risk

Would you please be able to elaborate on this point here?

jrochkind commented 2 years ago

I think avoiding the HTML duplicate ID issue is just a matter of providing the right arguments to the rails methods to give one of them a different ID, and the example should otherwise keep working just fine.

I don't think there is any problem with having duplicate name attributes, I think this is allowed by HTML just fine; is a somewhat standard/common thing to do; and as you note is intentional here, "so that selecting a new file can always override the cached file in the hidden field." But I think this is legal, and I've seen it used in other platforms. But changing that would require changing some other things in the example, re-architecting it, it's true.

That's just about the actual duplicate ID/name issue in the title of the issue -- it can be easily resolved without changing the overall architecture/design.

But then the rest of the issue goes into some concerns about the architecture/design, not just the duplicate ID issue.

I'm not really following those either. I think this actually is a fairly standard way to do things? I don't understand the "high risk" part either.

But you definitely can do things on the front-end however you want; you are free to provide Javascript etc like that. I don't think shrine should get into the business of providing/maintaining it's own front-end javascript though, needs there can just be so contextual, and there be monsters.

Note that shrine does provide supported examples of using third-party javascript libraries to support "direct to s3" uploads. I personally think this is the way you want to do it a lot of times, and does not (usually?) involve this hidden field with file data design.

This was just one example of one way of doing things -- although I agree that shrine should provide an example that basically works and is adequate/sufficient, like I agree we should fix the duplicate ID thing. If there other aspects of this example that aren't really adequate at all (taht I'm not totally following yet), we should definitely improve it.

But it's always just going to be one basic getting started example, which different apps will do differently. That's part of the challenge here too -- since there are so many ways an app can be set up under shrine (for instance, direct uploads vs not; also not necessarly even rails!), it's hard to provide an example that works with all of them.

That aspect of that example is meant to "retain the uploaded file in case of validation errors". If you didn't need to support that use case/in that way, the hidden field wouldn't need to be there at all. I am curious to see how ActiveStorage handles that use case, if at all -- you submit a file, there are validation errors, how is the file "kept around" to be submitted again? How does ActiveStorage handle this, anyone know?

This is making me think though -- shrine already has the file on cache storage, that's part of the architecture of shrine. Shouldn't there be a way to take advantage of this to indeed avoid having to embed the file as a hidden field, but just reference it's location in cache instead? That does seem to be perhaps the "better" way to do it... not sure the best way to do that though.

This stuff gets really confusing to talk about, sorry for so many words!

pond commented 2 years ago

@benkoshy

surely not e.g. a Base64 or similar inline representation of the entire binary object written into the HTML output stream...?) seems rather high risk

Would you please be able to elaborate on this point here?

User uploads (say) some big image or PDF or whatever. A few megabytes. That's a heavyweight thing. Next, due to a form validation error or later, because they subsequently edit the object, the implication might have been be that this binary data is included in the form in which case there'd be this truly huge Base64 string being sent back.

In the validation case, I dug deeper when it wasn't late on Friday anymore :joy: and saw that in fact (and much more reasonably) Shrine just puts in some metadata there which refers to a file in the Rails cache (for validation, with presumably different but equally informative metadata for editing an object with an already-persisted file, wherever that might be).

@jrochkind

I am curious to see how ActiveStorage handles that use case, if at all -- you submit a file, there are validation errors, how is the file "kept around" to be submitted again? How does ActiveStorage handle this, anyone know?

Given the metadata seen - that'll be how. For validation errors, the form upload leads to a temporary file in the Rails cache AFAICS and this is referenced since no database row was created and, one would hope (!) no ActiveStorage (e.g. S3) object would be created either. Again, for the case of editing a previously persisted record, I imagine that the metadata refers to that object in some way.

You're quite right, in any case, that the ID can be overridden and yes, since multiple names in a form are allowed and just stacked into an array, I'm incorrect that undocumented behaviour is relied upon since although the duplicate ids are being ignored by the browser, the name handling is part of normal processing.

In that case, this starts to look like nothing more than a documentation change, e.g. id: "#{f.field_id(:image)}_shrine_cache" - I'd assumed more significance to the id with Shrine, but it doesn't care. So, I can just submit a PR for that.

jrochkind commented 2 years ago

Awesome, thank you!

To be clear:

which refers to a file in the Rails cache

The rails cache (Rails.cache) is not involved. Rather it's the shrine "cache" storage location. Shrine has a two-storage-location design, where files first go to a temporary "cache" location, then only later are "promoted" to the permanent "store" location, only after they've been validated, and actually had their metadata persisted to your persistent data object (ie ActiveRecord). (The files in the "cache" location have not necessarily been recorded in the db yet (although in some configurations some of them may be, temporarily!), the ones in the "store" location, unless there's a bug somewhere, are supposed to be guaranteed to be).

Both the "cache" and "store" names are IMO pretty easily confused/confusing and not ideal, but naming things is hard! ("Cache" because easily confused with other kinds of caches; "store" because it's a name for the role of one configured store/storage location, but that word also sounds like the generic name for the class of things!)