hugapi / hug

Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.
MIT License
6.86k stars 388 forks source link

Falcon Request Object #501

Open ulmentflam opened 7 years ago

ulmentflam commented 7 years ago

So, I ran into an interesting problem with the Hug API regarding the request body passed into the parameters of a hug post request. I was implementing a web hook for Stripe on my API and for the purpose of validating the request I needed to check the signature of Stripe's request to my api using a secret key and HMAC SHA256. In order to correctly check the signature I needed the raw request body from Falcon's Request object. The reason I could not use the body passed through the request parameters by the 'application/json' input formatter is because it is retuned as a list due to the json loads preformed on the json body pulled from Falcon's Request object. The Falcon Request object has some interesting properties that I am going to discuss in brief on this issue. The json body of the request is stored in the request object and is retrieved by calling request.stream.read(). The request body is streamed as bytes from the request object and this can only be done once. Once the Falcon Request object has been read from the stream, it cannot be read again. Therefor this stream object is not seekable and you cannot return to the beginning of the byte stream (This is because it uses the input directly from wsgi). These attributes of the Falcon Request Object create an interesting problem with the hug api. The following code snippets show the problem:

@content_type('text/plain')
def text(body, charset='utf-8', **kwargs):
    """Takes plain text data"""
    return body.read().decode(charset)

In the above code body.read() is the one time accesses of Falcon's Request object. Once the body is read here it cannot be accessed again in the request object in the hug.post's input parameters. If stripe sent me a request with content_type 'text/plain' it would have been simple to take in the request body and use it to compare signatures, however the stripe api sends me a content type of 'application/json'.

@content_type('application/json')
def json(body, charset='utf-8', **kwargs):
    """Takes JSON formatted data, converting it into native Python objects"""
    return json_converter.loads(text(body, charset=charset))

With the content_type of 'application/json' the string result of the hug text function is passed into a json.loads and it is then processed in the gather parameters interface. When doing signature validation with the HMAC SHA256 algorithm a json.dumps of a json.loads does not produces the same signature as the json.dumps or the decode of the body as bytes. The body formatting is mainly implemented in the following function.

    def gather_parameters(self, request, response, api_version=None, **input_parameters):
        """Gathers and returns all parameters that will be used for this endpoint"""
        input_parameters.update(request.params)
        if self.parse_body and request.content_length:
            body = request.stream
            content_type, content_params = parse_content_type(request.content_type)
            body_formatter = body and self.api.http.input_format(content_type)
            if body_formatter:
                body = body_formatter(body, **content_params)
            if 'body' in self.parameters:
                input_parameters['body'] = body
            if isinstance(body, dict):
                input_parameters.update(body)
        elif 'body' in self.parameters:
            input_parameters['body'] = None

        if 'request' in self.parameters:
            input_parameters['request'] = request
        if 'response' in self.parameters:
            input_parameters['response'] = response
        if 'api_version' in self.parameters:
            input_parameters['api_version'] = api_version
        for parameter, directive in self.directives.items():
            arguments = (self.defaults[parameter], ) if parameter in self.defaults else ()
            input_parameters[parameter] = directive(*arguments, response=response, request=request,
                                                    api=self.api, api_version=api_version, interface=self)
        return input_parameters

So, If I was going to get the raw body I would need to somehow intercept the falcon request object replace the one time request.stream object with an object that is seekable and can be read to be pulled from the above request parameter and the text function connected to the @content_type('text/plain') decorator. One of the suggested methods of doing this was to implement a middleware class that preprocesses the request object before it enters these functions. The preprocessor would stream the request object and replace that object with a BytesIO object that is seekable. I implemented it as follows:

request.stream = io.BytesIO(request.stream.read())

This made it possible for the raw request data to be accessed in both the request object coming into my function and the request.stream and the body.read() called in the gather_parameters function. My dilemma with doing this is that I do not know the overall effect it would have on hug nor the effect it would have on my api, so I created a new solution. The solution I created was limited only to the requests that need to use the stripe signature validation. I created a function that is decorated by an authenticator that does the above process with the request object passed into the authenticator that occurs earlier in hug's request processing. The solution I created is perfectly acceptable as a temporary fix, however I would like to propose a more permeant solution.

If it is possiable I would like to implement a 'raw_body' parameter in the gather_parameters function that returns the streamed bytes. This will obviously effect the functions in input_format.py however it would allow the raw bytes from Falcon's Request object to be accessed in a hug function if others need to implement a hmac sha256 algorithm with a json request body. If anyone has any thoughts on my issue or better solution, let me know. Otherwise I think it may be nice to be able to retrieve the raw request body within the input parameters.

marcellodesales commented 6 years ago

@ulmentflam Implementing a Github Webhook handler and I need to same signature validation... Did you get the proposed solution somewhere? I'm on my first 1h with Hug and can't believe this is missing...

ulmentflam commented 6 years ago

@marcellodesales So the below is my hacky solution to this problem that works, but the prefered solution is the solution I proposed above. I implemented my authenticator for the webhook as follows:

@authenticator
def webhook_signature(request, response, verify_function, **kwargs):
    '''Custom authenticator for webhook signatures'''
    signature = request.get_header('SIGNATURE')
    if signature:
        # This code is fairly important, it makes the stream object a bytes object for this request
        # Then seeks zero to reset the object for hug's json processor
        request.stream = io.BytesIO(request.stream.read(request.content_length or 0))
        read_object = request.stream.read()
        request.stream.seek(0)
        verified_signature = verify_function(signature, read_object.decode())
        if verified_signature:
            return verified_signature
        else:
            return False
    return False
cmin764 commented 5 years ago

You can handle compression with something like this:

class Compression(object):

    """Handles (de)compression for I/O."""

    # Compression detection/set headers.
    ACCEPT_ENCODING = "Accept-Encoding"
    CONTENT_ENCODING = "Content-Encoding"

    # Recognized compressions traces.
    GZIP = "gzip"
    RE_GZIP = re.compile(fr"{GZIP}", re.IGNORECASE)
    DECODE_FUNCTIONS = None
    ENCODE_FUNCTIONS = None

    @staticmethod
    def _decompress(stream):
        """Replaces the current `stream` with a decompressed one."""
        return gzip.open(stream)

    @staticmethod
    def _compress(data):
        """Replaces the received `data` with a compressed form."""
        return gzip.compress(data)

    @classmethod
    def _decode_functions(cls):
        if not cls.DECODE_FUNCTIONS:
            cls.DECODE_FUNCTIONS = {
                cls.RE_GZIP: cls._decompress,
            }
        return cls.DECODE_FUNCTIONS

    @classmethod
    def _encode_functions(cls):
        if not cls.ENCODE_FUNCTIONS:
            cls.ENCODE_FUNCTIONS = {
                cls.RE_GZIP: cls._compress,
            }
        return cls.ENCODE_FUNCTIONS

    @classmethod
    def transform_request(cls, request):
        """Decodes input based on content encoding type and binds it to the
        `request` itself.
        """
        # Retrieve the appropriate content encoding if provided.
        content_encoding = (request.get_header(cls.CONTENT_ENCODING) or
                            "").strip()
        if not content_encoding:
            return

        # Find how to decompress it.
        for regex, decoder in cls._decode_functions().items():
            if regex.search(content_encoding):
                # Found the compression method.
                request.stream = decoder(request.bounded_stream)
                return

    @classmethod
    def transform_response(cls, data, request, response):
        """Compress and return provided `data` if the requester is accepting
        it and set appropriate response headers.
        """
        # Check if the client accepts compression.
        accept_encoding = (request.get_header(cls.ACCEPT_ENCODING) or
                           "").strip()
        if not accept_encoding:
            # Return data as it is and response untouched.
            return data

        # Try to compress it using the appropriate method.
        for regex, encoder in cls._encode_functions().items():
            if regex.search(accept_encoding):
                # Before compressing data, set related headers.
                response.set_header(cls.CONTENT_ENCODING, cls.GZIP)
                return encoder(data)

        # No suitable method for compression was found, so return the data as
        # received.
        return data

# Then set I/O formatters with suitable mechanisms.

@hug.request_middleware()
def decompress_request(request, _):
    """Decompress data before parsing if is compressed only."""
    Compression.transform_request(request)

@hug.default_output_format()
def compress_response(data, request, response):
    """Compress response data before returning if supported."""
    # Tell the client that we accept compression, no matter what.
    response.set_header(Compression.ACCEPT_ENCODING,
                        Compression.GZIP)
    # Make it JSON first.
    output_json = (hug.output_format.pretty_json if DEBUG else
                   hug.output_format.json)
    json_data = output_json(data)
    # Now compress it if the caller accepts it (and adjust content type
    # header).
    compressed_data = Compression.transform_response(
        json_data,
        request,
        response
    )
    return compressed_data
egibert commented 2 years ago

None of the above worked for me so here is my solution:

@hug.object.post('/webhooks')
def handle_webhook(self, request, bod):
        signature = request.headers["X-HUB-SIGNATURE"]

       # We need to tell JSON to not add spaces between the separators
        payload = bytes(json.dumps(body, separators=(',', ':')), 'utf-8')
        encoded_secret = secret.encode()

        # contruct hmac generator with our secret as key, and SHA-1 as the hashing function
        hmac_gen = hmac.new(encoded_secret, payload, hashlib.sha1)

        # create the hex digest and append prefix to match the GitHub request format
        digest = "sha1=" + hmac_gen.hexdigest()

        print(hmac.compare_digest(digest, signature))