ring-clojure / ring-codec

Utility library for encoding and decoding data
MIT License
63 stars 30 forks source link

form-decode should accept an InputStream #6

Closed ToBeReplaced closed 10 years ago

ToBeReplaced commented 10 years ago

The :body of a ring request is an InputStream. Right now, to decode a urlencoded form, you must redundantly specify the encoding. I would like to remove that requirement and decode the message body directly.

The below examples assume you have populated the :character-encoding on the request. ring-servlet/build-request-map does this for the most common case.

;;; How I use form-decode today
(let [encoding (:character-encoding request)]
  (form-decode (slurp (:body request) :encoding encoding) encoding))

;;; How I would like to use form-decode
(form-decode (:body request) (:character-encoding request))
weavejester commented 10 years ago

Is there a reason you can't use the normal Ring params middleware for this?

ToBeReplaced commented 10 years ago

No blockers, but here's why I'm interested in using form-decode directly: a) It will throw if the body isn't urlencoded instead of checking for safety and passing forward as with wrap-params or assoc-form-params. b) I want the form data back at the point where I assert I am receiving a urlencoded form; I have no desire to put them into the request to pull them back out again. c) I don't want to bring in all of ring-core and its dependencies; Although I'm sure they are all harmless, I don't want to have to do a security assessment on the whole codebase.

Ultimately, I don't like the middleware approach for web programming in the large. I'm trying to help make sure web programming in clojure isn't restricted to that approach.

weavejester commented 10 years ago

The form-decode function doesn't throw an exception if it can't parse a string, it just returns nil.

I'm currently on the fence about this change... I think it should be a separate function, rather than overloading form-decode, but I'm not sure what to call it. form-decode-stream would be the obvious choice, but we have form-decode-str where the str refers to the type of the return value rather than the argument, so there's a consistency problem.

Incidentally, if you're going to be avoiding middleware, you'll find yourself reinventing the wheel a lot.

ToBeReplaced commented 10 years ago

Ah, you're right, it doesn't throw an exception. That doesn't seem right; Do you think that belongs here? It looks like it should really be in the middleware where it's appropriate to enforce control flow.

Why do you think it should be a separate function? I would have thought the natural way would be to move form-decode to a protocol.

What sparks your comment about wheel reinvention? The concerns I have are more that there is a lot of useful work that is currently coupled to middleware that could be exposed separately. Consider all of the work the pedestal folks required in order to build interceptors out of existing middleware. I'd like to see more libraries like ring-codec sitting outside of the middleware paradigm for better reuse.

weavejester commented 10 years ago

I couldn't think of any scenario where throwing an exception would be useful. It would also have been more work, as I'd need to write my own parsing exceptions for form-decode. If I wanted good exceptions, I'd probably have to pull in a parser, rather than using clojure.string/split. Returning a nil value seemed simpler and relevant to the use-case I wanted at the time.

As for why I'd want it to be a separate function, it's because reading from a stream is side-effectful, while reading from a string is not. Having a polymorphic function that's pure for some types, and side-effectful for others, seems incorrect.

Regarding my comment about reinventing the wheel, I simply meant that a lot of existing functionality is written as middleware, so you'd need to rewrite it if you didn't want to use it. That said, I don't think it's a bad idea to pull functionality out of middleware if it could be useful elsewhere.

However... Middleware has relatively few disadvantages, and is a relatively good fit for a request/response protocol like HTTP. If you're trying to go the full async route you might run into problems, but I'm not convinced that's worth the trouble, not for HTTP anyway.

ToBeReplaced commented 10 years ago

Good point about the side effects, and I agree with your conclusion; overloading is incorrect.

I'm indifferent on my original request now. I'm also not sure that the two encodings mean the same thing. The charset of the request body dictates the encoding for the slurp, but then the decode encoding is the choice of the server / strongly recommended to be utf-8?

I think a CSP based approach to HTTP may be interesting. I agree, there are frictions to get there today, but I'd like to see more people experiment.

weavejester commented 10 years ago

The encodings are indeed different. The stream encoding just needs to be a superset of ASCII (such as UTF-8 or ISO-8859-1), because percent-encoding only uses ASCII characters.

The encoding of the actual form can be anything, and because it's an old standard, there's no mechanism (in general use) for explicitly specifying the character encoding. Usually it's the same as the page submitting the form, so UTF-8 is a good default to use.

ToBeReplaced commented 10 years ago

It seems then that my initial request doesn't make sense because you shouldn't use the request's character encoding for both operations.

Happy to close, thanks for the time and thoughts! I should have spent more time understanding why the decode operation required an encoding in the first place.

On that note, maybe the decode encoding and the slurp encoding should be parametrized separately here: https://github.com/ring-clojure/ring/blob/master/ring-core/src/ring/middleware/params.clj#L27 .

weavejester commented 10 years ago

Yeah, that looks wrong. I'll mark it down as to be fixed. Thanks for pointing it out!