haskell-servant / servant

Servat is a Haskell DSL for describing, serving, querying, mocking, documenting web applications and more!
https://docs.servant.dev/
1.83k stars 414 forks source link

Cannot consume body of `Request` twice: not enough input #1120

Open chshersh opened 5 years ago

chshersh commented 5 years ago

I'm trying to implement custom authorization schemes. Looks like servant-auth doesn't support this feature at the moment (though, this probably will be implemented during GSoC 2019):

So I'm using Servant.API.Experimental.Auth. The problem with this approach is that it's not possible to consume body of Request twice (for calculating its hash-sum or just printing for debugging purposes) since it's an IO action which allows to consume the body only once. Consider the following minimal example:

#! /usr/bin/env cabal
{- cabal:
build-depends:
  , aeson
  , base ^>= 4.12
  , bytestring
  , servant ^>= 0.15
  , servant-server ^>= 0.15
  , wai >= 3.2.2
  , warp
-}

{-# LANGUAGE DataKinds                  #-}
{-# LANGUAGE DeriveGeneric              #-}
{-# LANGUAGE DerivingStrategies         #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeApplications           #-}
{-# LANGUAGE TypeFamilies               #-}
{-# LANGUAGE TypeOperators              #-}

import Control.Monad.IO.Class (MonadIO (liftIO))
import Data.Aeson (FromJSON, ToJSON)
import Data.Proxy (Proxy (..))
import GHC.Generics (Generic)
import Network.Wai (Request)
import Network.Wai.Handler.Warp (run)
import Network.Wai.Internal (getRequestBodyChunk)
import Servant ((:>), Application, JSON, Post, ReqBody, Server, serveWithContext)
import Servant (Context ((:.), EmptyContext))
import Servant.API (AuthProtect)
import Servant.Server (Handler)
import Servant.Server.Experimental.Auth (AuthHandler, AuthServerData, mkAuthHandler)

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8

newtype TheAnswer = TheAnswer Int
    deriving stock   (Show, Generic)
    deriving newtype (FromJSON, ToJSON)

type API =
       AuthAPI
    :> ReqBody '[JSON] TheAnswer
    :> Post '[JSON] TheAnswer

type AuthAPI = AuthProtect "TheAnswer"
type instance AuthServerData (AuthProtect "TheAnswer") = ()

server42 :: Server API
server42 = \() theAnswer -> pure theAnswer

authHandler :: AuthHandler Request ()
authHandler = mkAuthHandler handler
  where
    handler :: Request -> Handler ()
    handler req = liftIO $ do
        print req
        getWaiRequestBody req >>= BS8.putStrLn

getWaiRequestBody :: Request -> IO BS.ByteString
getWaiRequestBody request = BS.concat <$> getChunks
  where
    getChunks :: IO [BS.ByteString]
    getChunks = getRequestBodyChunk request >>= \chunk ->
        if chunk == BS.empty
        then pure []
        else (chunk:) <$> getChunks

app42 :: Application
app42 = serveWithContext
    (Proxy @API)
    (authHandler :. EmptyContext)
    server42

main :: IO ()
main = run 8080 app42

If I run this server and try to query it like this:

curl -H "Content-Type: application/json" -d '42' http://localhost:8080

I see

not enough input

However, if I comment the following line:

getWaiRequestBody req >>= BS8.putStrLn

Everything works without problems.

I wonder, whether it's possible to implement some workaround to make body of the request consumable more than once?

phadej commented 5 years ago

Consuming body twice is a bad idea. Every option I can think of are huge hacks

alpmestan commented 5 years ago

Indeed, we used to memoize the request body, but that causes all sorts of problems (you always retain it, you can't really do proper streaming request bodies). I second @phadej's suggestion. Perhaps a middleware could help?

chshersh commented 5 years ago

I understand that streaming is a useful feature. But a lot of services sign whole body of the request. This means that it's not possible to implement servant applications that can verify signatures. Showing the body of the request just for debugging purpose is also a useful feature. So in some cases it's okay to sacrifice streaming if you want something different and if you're ready to take full responsibility for this.

@alpmestan By mentioning middleware you mean this type of Middleware?

alpmestan commented 5 years ago

Yes. I'm not quite sure what would happen if you do something with the request body through the middleware and then try to use it in your servant app, like you normally would. But perhaps it's worth giving it a shot?

erewok commented 5 years ago

I have been thinking about this a bit because lately I have been dealing with many different auth schemas. For instance, with Oauth1, the protocol requires (paraphrasing) taking the request, sorting the url-form-encoded parameters, assembling it into a string, adding the secret token, percent-encoding the string, and then HMAC-SHA1 signing the whole thing. In other words, in order to implement or verify Oauth1, one needs access to the entire request.

The Oauth1 spec is pretty old, so doesn't mention JSON, but there are various API servers that offer a take on oauth1 that works along the lines of 1) taking the request body as a string, 2) appending a secret token to it (or prefixing it), and 3) signing the whole thing using SHA256 or something else. Similar to Oauth1, for a server that would like to first validate a request this variation requires consuming the request body entirely, and for a client, it would require adding a signature after the request has been otherwise assembled.

Lastly, Oauth2 doesn't require signing requests in this way, because it uses the request-grant flow, where clients request access from servers for protected resources and they receive temporary tokens to access those resources.

Specific to Haskell, for servers implementing either Oauth1 or the variation I mentioned, it seems like it should be possible to define a wai Middleware, for instance like this one in wai-extra that takes some authorizing function and consumes the request body in order to check a header signature matches. As mentioned above, you would have to consume the entire request body in order to validate and then possibly add a header to the request, finally packaging it up so that it can be consumed again by your serving Application?

The documentation for wai, says that to retrieve the requestBody, we should use getRequestBodyChunk, which "Returns 'B.empty' when the body is fully consumed." I think this is related to the fact that requests coming in are untrusted and may have huge files attached to them, so you could bog down servers that greedily consume all input in order to validate requests by sending them really huge files. As a result, wai seems to be offering a way to limit the resources consumed by a request-handling server.

Anyway, it seems like it should work as wai Middleware, but implementing it in Servant seems like it would bind servant-server even more tightly to wai?

Sidenote: for Servant clients, I don't really know how to sign requests before they're issued?

  1. https://tools.ietf.org/html/rfc5849#section-3.4.1.3
  2. https://developer.twitter.com/en/docs/basics/authentication/guides/creating-a-signature.html
  3. https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2
phadej commented 5 years ago

I'd recommend using a Middleware which mangles the wai's request in the way, so servant can deal with something simpler.

Client part can be done similarly, in reverse.


An alternative is to write a variant of ReqBody combinator which acts as both ReqBody and auth. As the concerns are mixed, it's not unsurprising that servant cannot decouple them cleanly either. Yet, I don't think this combinator belongs to servant (is there common open service using an auth scheme @erewok described?)

chshersh commented 5 years ago

@erewok

Sidenote: for Servant clients, I don't really know how to sign requests before they're issued?

While working on servant-hmac-auth I've implemented custom client that signs all outgoing requests. You can see the code here:

@phadej @alpmestan I've tried to use Middleware instead of Auth but now curl command hangs 😞 What am I doing wrong? I can't derive from the documentation how can I use MVar here.

debugMiddleware :: Middleware
debugMiddleware app req handle = do
    body <- getWaiRequestBody req
    BS8.putStrLn body
    let newReq = req { requestBody = pure body }
    app newReq handle

main :: IO ()
main = run 8080 $ debugMiddleware app42
alpmestan commented 5 years ago

You mean you want your debugMiddleware to use an MVar?

debugMiddleware :: MVar -> Middleware
debugMiddleware = ...

main :: IO ()
main = newMVar foo >>= \mvar -> run 8080 $ debugMiddleware mvar app42

?

(sorry if I misunderstood, I skimmed through quickly)

chshersh commented 5 years ago

@alpmestan It's more about this commend by @phadej

  • new fresh requestBody (One have to use MVar here, as getRequestBodyChunk relies on some hidden state: socket's one in the real implementation)
alpmestan commented 5 years ago

Ah, not sure what @phadej meant here, I'll let him comment :-)

elfeck commented 4 years ago

I have a tangentially related question:

I've been looking for a way to really make sure the request body size is not too large, not trusting the Content-Length header field. As such I would like to consume the request body in a middleware and stop if the request turns out to be too large. If not, then "restore" the body and pass the request down to the app.

naglalakk commented 4 years ago

Hi I'm dealing with similar things to @chshersh and I'm just wondering if I understand this correctly. Is the reason the requestBody is always empty in my logging Middleware because servant has already consumed it? is there any way I could log the requestBody inside the Middleware?

saurabhnanda commented 4 years ago

Sorry for jumping in, but could this basically be related to how wai itself works? I had hit https://github.com/yesodweb/wai/issues/636 while trying to implement a logger that logged the complete request body. It seems that even wai-logger (in development mode), consumes the request body first, and then puts it back into the req for the next middleware in the chain to consume.

It didn't seem a very efficient strategy at that time. And it still doesn't.

fisx commented 4 years ago

It didn't seem a very efficient strategy at that time. And it still doesn't.

Well, I don't think there is a more efficient strategy: if you want your web server to handle an input stream twice sequentially, there isn't any way around keeping it for the second consumer.

So yes, it's less efficient, but I would say it's about as efficient as the non-streaming approach, where the full request body is stored in memory before passed to application code and/or middleware. It may still be perfectly sufficient.

tylerjl commented 3 years ago

I'm the epitome of a "no idea what I'm doing" Haskell author, but I ran into this recently (in writing Servant logic to handle Stripe webhook signatures) and came up with a working solution that implements the suggestions about MVar here and wanted to share in case was useful for someone else.

In case it isn't obvious: this probably isn't a good idea to use everywhere. It makes assumptions that all clients reading the body are going to behave. Also, I sort of guessed at using a Seq here since I opted to turn the list of chunks into a circular sequence and pop values off the top and then shove them back onto the end.

import qualified Data.Sequence as S
import Data.Sequence (Seq((:<|), Empty), (|>))

freezeReqBody :: Middleware
freezeReqBody app req handle' = do
  body' <- newMVar =<< extractBody req
  let req' = req { requestBody = requestBody' body' }
  app req' handle'
  where
    requestBody' mvar = modifyMVar mvar \chunks -> do
      case chunks of
        (h :<| rest) -> return (rest |> h, h)
        Empty -> return (Empty, mempty)

extractBody :: MonadIO m => Request -> m (Seq ByteString)
extractBody req = do
  chunk <- S.singleton <$> liftIO (getRequestBodyChunk req)
  (flip . iterateUntilM) (null . r) chunk \chunks' -> do
    chunk' <- liftIO $ getRequestBodyChunk req
    return $ chunks' |> chunk'
  where r seq' = case S.viewr seq' of
          S.EmptyR -> mempty
          (_ S.:> e) -> e

Again, strongly suggest that anyone who uses this only apply the middleware if it meets a predicate. From my own code:

ifRequest fromStripe freezeReqBody $ ...

where fromStripe (lookup "Stripe-Signature" . requestHeaders -> Just _) = True
      fromStripe _ = False

I haven't shored up the code to be more cautious about reading in a limited number of bytes, which would probably be a good idea in production, otherwise this code will DoS itself with a fat body. This middleware is more of a convenience if you need to look at the body a few times and don't have control over whoever is reading the body and thus can't "reload" it when body reads occur.

elldritch commented 2 years ago

I'm running into this scenario as well. My use case is verifying Slack event requests. I understand that reading the whole body and retaining it has performance and DoS implications, but there really isn't any other way to integrate against an API like this.

I would also be happy if someone could point me to a well-maintained middleware that does this sort of HMAC checking for me.

raduom commented 1 year ago

@erewok

Sidenote: for Servant clients, I don't really know how to sign requests before they're issued?

While working on servant-hmac-auth I've implemented custom client that signs all outgoing requests. You can see the code here:

@phadej @alpmestan I've tried to use Middleware instead of Auth but now curl command hangs 😞 What am I doing wrong? I can't derive from the documentation how can I use MVar here.

debugMiddleware :: Middleware
debugMiddleware app req handle = do
    body <- getWaiRequestBody req
    BS8.putStrLn body
    let newReq = req { requestBody = pure body }
    app newReq handle

main :: IO ()
main = run 8080 $ debugMiddleware app42

@chshersh I stumbled into this while trying to do pretty much the same thing that you are (an HMAC signature of the payload) and I think I figured out why (a) your application hangs and (b) why you need an MVar.

The application does not actually hang, but it gets stuck in reading and re-reading the body that you provided. If you check the documentation, you will see that the contract for the requestBody field is that after the body has been fully consumed it should return empty. However, your implementation never returns empty and as such it keeps getting called in an infinite cycle.

You can observe the behavior by using this requestBody: requestBody = putStrLn "reading" >> pure body. You should get a lot of "reading" as output.

Because you must ensure that empty gets returned after the contents have been consumed you need to save the state of whether the contents have been consumed or not, which is why you need to use an MVar.

A solution to the problem might be to implement a 'chunked' reading of the body, while using the update functions provided by cryptonite to update an HMAC value with new data. However, this is also a bit problematic as you don't actually know if the handler will perform any side-effects before you get a chance to consume the full body and verify the HMAC signature (it might work if the ReqBody argument is used in your handler and if servant fully reads the request body to provide that argument before calling your handler). This solution should keep the stream, streaming though.

The alternative is what I suggested previously, which is reading everything into memory, setup an MVar to check if you already consumed the body and if you did return empty.