leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

InferenceEndpoint Generator fails for AWS because of missing Authorization header #722

Open erikinfo opened 3 weeks ago

erikinfo commented 3 weeks ago

The Authorization request header is necessary to establish a connection to the endpoint hosted on AWS SageMaker.

The InferenceEndpoint class inside huggingface.py should therefore have a new field for the auth header sent with the request, similar to how API keys are handled via environment variables.
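As a rough illustration of the environment-variable pattern being requested (the variable name and helper below are hypothetical, not part of garak):

```python
import os

# Hypothetical environment variable name; garak would define its own.
AUTH_HEADER_ENV_VAR = "INFERENCE_ENDPOINT_AUTH_HEADER"

def build_headers() -> dict:
    """Build request headers, adding Authorization if the env var is set."""
    headers = {"Content-Type": "application/json"}
    auth_header = os.getenv(AUTH_HEADER_ENV_VAR)
    if auth_header:
        headers["Authorization"] = auth_header
    return headers

os.environ[AUTH_HEADER_ENV_VAR] = "Bearer dummy-token"
print(build_headers()["Authorization"])  # Bearer dummy-token
```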

leondz commented 3 weeks ago

Thanks, this is useful to know. We'd like to fix it. For clarity, I assume this is about huggingface.InferenceEndpoint and not one of the other InferenceEndpoint classes in garak - please correct this if needed.

@erikinfo Do you know how we can get a sample endpoint for testing & validation?

@jmartin-tech I think an approach similar to how payload is interpreted in nvcf.NvcfChat & .NvcfCompletion could work well here, instead of a custom generator. What do you think?

erikinfo commented 3 weeks ago

Yep, huggingface.InferenceEndpoint.

Unfortunately, I think there is no way besides hosting one yourself. I could also try and help with testing. If you have an AWS account, you can test the creation of a valid request object using this method:

import json

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Initialize a Boto3 session
session = boto3.Session()

# Retrieve AWS credentials
credentials = session.get_credentials().get_frozen_credentials()

# Define your AWS region and service
region = 'us-east-1'
service = 'sagemaker'

# Define the endpoint URL
endpoint_url = 'https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations'

# Define the headers
headers = {
    'Content-Type': 'application/json'
}

# Define the payload
payload = {
    "key1": "value1",
    "key2": "value2"
    # Add other payload data as needed
}

# Create an AWSRequest
request = AWSRequest(method='POST', url=endpoint_url, data=json.dumps(payload), headers=headers)

# Sign the request using SigV4Auth
SigV4Auth(credentials, service, region).add_auth(request)

print(request.method) # POST
print(request.url) # URL e.g. https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations
print(dict(request.headers))
print(request.body) # data 

Please note that this method is just a helper to pinpoint the header attributes that are needed.

The headers attribute should therefore be updated in the huggingface.InferenceEndpoint. Please allow the following header attributes to be added:

jmartin-tech commented 3 weeks ago

I think the payload pattern in NVCF is reasonable; however, there may be an option to avoid handling the raw request by providing something like a _send_request method, which would keep the payload more huggingface-specific and utilize AWS-provided wrappers for requests. It might look something like this (obviously some syntax is not exact):

class AWSInferenceEndpoint(InferenceEndpoint):

    supports_multiple_generations = False

    def __init__(self, name="", generations=10, config_root=_config):
        super().__init__(name, generations=generations, config_root=config_root)
        # gather AWS details here and set on `self` if not provided by `_config`

    def _send_request(self, payload):
        headers = {"Content-Type": "application/json"}
        request = AWSRequest(method=self.method, url=self.uri, data=json.dumps(payload), headers=headers)
        SigV4Auth(self.aws_credentials, self.aws_service, self.aws_region).add_auth(request)
        return requests.post(request.url, headers=dict(request.headers), data=request.body)
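The `__init__` comment above leaves the AWS detail gathering open; one option is to fall back from config to environment variables to defaults. A sketch (the config keys and env var names here are hypothetical, not garak's actual configuration):

```python
import os

def gather_aws_details(config=None) -> dict:
    """Resolve AWS settings from an optional config dict, then env vars, then defaults.

    Keys and env var names are illustrative placeholders.
    """
    config = config or {}
    return {
        "aws_region": config.get("aws_region") or os.getenv("AWS_REGION", "us-east-1"),
        "aws_service": config.get("aws_service", "sagemaker"),
        "uri": config.get("uri") or os.getenv("AWS_ENDPOINT_URI", ""),
    }

print(gather_aws_details({"aws_region": "eu-west-1"})["aws_region"])  # eu-west-1
```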
leondz commented 3 weeks ago

@erikinfo Thanks tons for this detailed example, it should be really helpful. Also thanks for volunteering to help test. I hope we can get started using this guide, https://huggingface.co/docs/sagemaker/inference

@jmartin-tech This could work. Two separate generator classes for one named product (https://huggingface.co/docs/inference-endpoints/index) seems a little unintuitive to me, but if it reduced tech debt in exchange for a reasonably-sized dependency, that's a win. Maybe something for or closely after the generators.huggingface refactor?