erikinfo opened this issue 3 weeks ago
Thanks, this is useful to know. We'd like to fix it. For clarity, I assume this is about huggingface.InferenceEndpoint and not one of the other InferenceEndpoint classes in garak - please correct this if needed.
@erikinfo Do you know how we can get a sample endpoint for testing & validation?
@jmartin-tech I think an approach similar to how payload is interpreted in nvcf.NvcfChat & .NvcfCompletion could work well here, instead of a custom generator. What do you think?
Yep, huggingface.InferenceEndpoint.
Unfortunately, I think there is no way besides hosting one yourself. I could also try to help with testing. If you have an AWS account, you can test the creation of a valid request object using this method:
import json
import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Initialize a Boto3 session
session = boto3.Session()

# Retrieve AWS credentials
credentials = session.get_credentials().get_frozen_credentials()

# Define your AWS region and service
region = 'us-east-1'
service = 'sagemaker'

# Define the endpoint URL
endpoint_url = 'https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations'

# Define the headers
headers = {
    'Content-Type': 'application/json'
}

# Define the payload
payload = {
    "key1": "value1",
    "key2": "value2"
    # Add other payload data as needed
}

# Create an AWSRequest
request = AWSRequest(method='POST', url=endpoint_url, data=json.dumps(payload), headers=headers)

# Sign the request using SigV4Auth
SigV4Auth(credentials, service, region).add_auth(request)

print(request.method)         # POST
print(request.url)            # e.g. https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations
print(dict(request.headers))
print(request.body)           # the JSON payload
Please note that this method is just a helper to pinpoint the header attributes that are needed. The headers attribute should therefore be updated in huggingface.InferenceEndpoint. Please allow the following header attributes to be added:
I think the payload pattern in NVCF is a reasonable approach; however, I think there may be an option to avoid handling the raw request, by providing something like a _send_request method to keep the payload more huggingface specific and utilize AWS-provided wrappers for requests. Might look something like this (obviously some syntax is not exact):
class AWSInferenceEndpoint(InferenceEndpoint):
    supports_multiple_generations = False

    def __init__(self, name="", generations=10, config_root=_config):
        super().__init__(name, generations=generations, config_root=config_root)
        # gather AWS details here and set on `self` if not provided by `_config`

    def _send_request(self, payload):
        request = AWSRequest(method=self.method, url=self.uri, data=json.dumps(payload), headers=self.headers)
        SigV4Auth(self.aws_credentials, self.aws_service, self.aws_region).add_auth(request)
        # AWSRequest has no .post(); dispatch the signed pieces with an HTTP client
        return requests.post(request.url, headers=dict(request.headers), data=request.body)
@erikinfo Thanks tons for this detailed example; it should be really helpful. Also thanks for volunteering to help test. I hope we can get started using this guide, https://huggingface.co/docs/sagemaker/inference
@jmartin-tech This could work. Two separate generator classes for one named product (https://huggingface.co/docs/inference-endpoints/index) seems a little unintuitive to me, but if it reduces tech debt in exchange for a reasonably-sized dependency, that's a win. Maybe something for, or closely after, the generators.huggingface refactor?
The Authorization request header is necessary to establish a connection to the endpoint hosted on AWS SageMaker.
The InferenceEndpoint inside huggingface.py should therefore have a new field for the auth header sent with the request, similar to how API keys were treated using environment variables.
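To illustrate the environment-variable pattern, a minimal sketch (the variable name HF_INFERENCE_ENDPOINT_AUTH and the class are hypothetical, not existing garak names):

```python
import os

class InferenceEndpointAuth:
    """Sketch: pull an Authorization header value from the environment,
    mirroring how API keys are read from env vars elsewhere."""

    ENV_VAR = "HF_INFERENCE_ENDPOINT_AUTH"  # hypothetical variable name

    def __init__(self):
        token = os.environ.get(self.ENV_VAR)
        if token is None:
            raise ValueError(f"Please set {self.ENV_VAR} to the endpoint auth token")
        # headers to merge into every request sent to the endpoint
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        }

os.environ["HF_INFERENCE_ENDPOINT_AUTH"] = "abc123"  # demo value
auth = InferenceEndpointAuth()
```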