Closed: hlo-world closed this issue 1 year ago.
https://github.com/huggingface/text-generation-inference/pull/514 should make requiring TRUST_REMOTE_CODE unnecessary.
I have an issue with tiiuae/falcon-rw-1b on sagemaker==2.170.0 and get the following error. Did I miss something?
ValueError: Loading /opt/ml/model requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
Code to reproduce.
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from huggingface_hub import snapshot_download
import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

MODEL_ID = "tiiuae/falcon-rw-1b"
CACHED_DIR = "../cache"
MERGE_MODEL_DIR = "merged_model_test"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    # device_map="auto",
    trust_remote_code=True,
    cache_dir=CACHED_DIR,
)
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    cache_dir=CACHED_DIR,
)

model.save_pretrained(MERGE_MODEL_DIR, safe_serialization=True)
tokenizer.save_pretrained(MERGE_MODEL_DIR, safe_serialization=True)

import os

parent_dir = os.getcwd()
# change to model dir
os.chdir(MERGE_MODEL_DIR)
# use pigz for faster and parallel compression
!tar -cf model.tar.gz --use-compress-program=pigz *
# change back to parent dir
os.chdir(parent_dir)

from sagemaker.s3 import S3Uploader

# upload model.tar.gz to s3
s3_model_uri = S3Uploader.upload(
    local_path=str(MERGE_MODEL_DIR + "/model.tar.gz"),
    desired_s3_uri=f"s3://{sess.default_bucket()}/test-model",
)
print(f"model uploaded to: {s3_model_uri}")

from sagemaker.huggingface import get_huggingface_llm_image_uri, HuggingFaceModel
import json

image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")
print(f"llm image uri: {image_uri}")

instance_type = "ml.g4dn.2xlarge"
health_check_timeout = 300
trust_remote_code = True

config = {
    "HF_MODEL_ID": "/opt/ml/model",  # path to where sagemaker stores the model
    "MAX_INPUT_LENGTH": json.dumps(2048),  # max length of input text
    "MAX_TOTAL_TOKENS": json.dumps(3000),  # max length of the generation (including input text)
    "HF_MODEL_TRUST_REMOTE_CODE": json.dumps(trust_remote_code),
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    model_data=s3_model_uri,
    env=config,
)

endpoint_name = sagemaker.utils.name_from_base("test")
predictor = llm_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    model_data_download_timeout=10 * 60,
    container_startup_health_check_timeout=10 * 60,
    wait=False,
)
print(predictor.endpoint_name)
Hi, I had the same problem. Version 0.8.2 does not have HF_MODEL_TRUST_REMOTE_CODE in the entrypoint script.
https://github.com/huggingface/text-generation-inference/blob/v0.8.2/sagemaker-entrypoint.sh
A workaround is to set TRUST_REMOTE_CODE=true directly (text-generation-launcher will pick up this environment variable).
https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L74-L78
config = {
    # (other stuff...)
    'TRUST_REMOTE_CODE': json.dumps(True),
}
The other option is to use the latest TGI image and remove TRUST_REMOTE_CODE.
SageMaker SDK 2.171 still ships version 0.8.2; the latest release in this repo is 0.9.1.
In my case, I pushed the 0.9.1 image to my account's ECR (I didn't try to pull directly from GitHub to create my endpoint).
Something like:
docker pull --platform=linux/amd64 ghcr.io/huggingface/text-generation-inference:0.9.1
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
docker build \
--build-arg TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:0.9.1 \
-t ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1 \
-f Dockerfile \
.
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1
A Docker build is required to change the ENTRYPOINT and CMD of the published image.
https://github.com/huggingface/text-generation-inference/blob/v0.9.1/Dockerfile#L179-L185
ARG TGI_IMAGE
FROM --platform=linux/amd64 ${TGI_IMAGE}
COPY sagemaker-entrypoint.sh entrypoint.sh
RUN chmod +x entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
CMD [ "" ]
Then the deploy script changes to:
llm_model = HuggingFaceModel(
    role=role,
    image_uri="ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1",
    # ...
)
Tip: 0.9.1 changes HUGGINGFACE_HUB_CACHE to "/data", and something went wrong with that, so I changed it back to "/tmp" (as in 0.8.2).
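For reference, a minimal sketch of that cache override in the endpoint env (the same key appears in the full configs below):
config = {
    # (other settings...)
    "HUGGINGFACE_HUB_CACHE": "/tmp",  # revert the 0.9.1 default of /data back to the 0.8.2 location
}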
With this setup, "TRUST_REMOTE_CODE" is not required to run Falcon or MPT, as @Narsil said.
I tested Falcon 40B Instruct with 2 configs (DTYPE=b-float16 and HF_MODEL_QUANTIZE=bitsandbytes) and MPT 30B Instruct with the same 2 configs, each config on a single ml.g5.12xlarge. I had to add MAX_BATCH_PREFILL_TOKENS and MAX_BATCH_TOTAL_TOKENS to avoid memory issues.
{
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
    "DTYPE": "b-float16",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
    "HF_MODEL_QUANTIZE": "bitsandbytes",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "mosaicml/mpt-30b-instruct",
    "DTYPE": "b-float16",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "mosaicml/mpt-30b-instruct",
    "HF_MODEL_QUANTIZE": "bitsandbytes",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
I also tested with model.tar.gz for both Falcon (63 GB) and MPT (45 GB). I had to add model_data_download_timeout to avoid SageMaker failing to create the endpoint with Falcon.
llm_model.deploy(
    # ...
    container_startup_health_check_timeout=3600,
    model_data_download_timeout=3600,
)
I am still evaluating this setup, so take it as-is; I don't know whether it is sufficient or even good. (One thing I can say: at a glance it's better than the Falcon deployment from SageMaker JumpStart, which was too slow.)
@yapweiyih I am guessing: what files do you have in MERGE_MODEL_DIR? Are they all local files, or are any of them symlinks to files in the cache?
I had problems with symlinks and had to add --dereference to follow them (GNU tar's default is to copy the link itself).
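For example, the packaging cell from the reproduction above would become something like this (a sketch, assuming GNU tar and pigz are available):
!tar -cf model.tar.gz --dereference --use-compress-program=pigz *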
Also, let me share the way I converted the model:
mkdir -p data
docker run --rm -it \
--volume $PWD/data:/data \
--entrypoint "" \
--env HF_HUB_ENABLE_HF_TRANSFER=0 \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1 \
text-generation-server download-weights tiiuae/falcon-40b-instruct
conda create -y -q --force \
-c conda-forge \
-n generative-ai-tgi \
python=3.11 \
huggingface_hub \
sagemaker-python-sdk \
tar \
pigz
conda run -n generative-ai-tgi --live-stream \
python create_sagemaker_model.py tiiuae/falcon-40b-instruct
create_sagemaker_model.py (s3_model_store_uri must be set to your S3 URI):
import os
import shutil
import subprocess
import sys
from datetime import datetime

from huggingface_hub import snapshot_download
from sagemaker.s3 import S3Downloader, S3Uploader

# Arguments
if len(sys.argv) < 2:
    print(
        "usage: create_sagemaker_model.py <repo_id> [TGI_data_dir] (default [TGI_data_dir] is ./data)"
    )
    sys.exit(-1)

# repo_id = "tiiuae/falcon-40b-instruct"
# repo_id = "mosaicml/mpt-30b-instruct"
repo_id = sys.argv[1]
tgi_data_dir = "data" if len(sys.argv) <= 2 else sys.argv[2]
s3_model_store_uri = "s3://<BUCKET>/<PREFIX>"
overwrite = False

# TGI Download Weights -> Converting PyTorch weights to safetensors.
model_name = repo_id.replace("/", "--")
tgi_model_dir = os.path.join(tgi_data_dir, f"models--{model_name}")
if not os.path.isdir(tgi_model_dir):
    raise Exception(f"TGI Model dir not found: {tgi_model_dir}")
tgi_model_revision = os.path.join(tgi_model_dir, "refs", "main")
if not os.path.isfile(tgi_model_revision):
    raise Exception(f"TGI Model revision not found: {tgi_model_revision}")
with open(tgi_model_revision, "r", encoding="ascii") as revision_file:
    model_revision = revision_file.read().strip()
tgi_model_data_dir = os.path.join(tgi_model_dir, "snapshots", model_revision)
if not os.path.isdir(tgi_model_data_dir):
    raise Exception(f"TGI Model data dir not found: {tgi_model_data_dir}")
tgi_model_safetensors = [
    os.path.join(tgi_model_data_dir, filename)
    for filename in os.listdir(tgi_model_data_dir)
    if filename.endswith(".safetensors")
]
if not tgi_model_safetensors:
    raise Exception(f"TGI Model safe tensors not found: {tgi_model_data_dir}")

# Download model files (skip weights)
model_data_dir = os.path.join("models", model_name, model_revision, "data")
print()
print(f"Downloading model data to {model_data_dir}")
print()
start_time = datetime.now()
if overwrite and os.path.isdir(model_data_dir):
    print("(model data found. Overwriting...)")
    shutil.rmtree(model_data_dir)
if not os.path.isdir(model_data_dir):
    os.makedirs(model_data_dir)
    snapshot_download(
        repo_id,
        revision=model_revision,
        local_dir=model_data_dir,
        local_dir_use_symlinks=False,
        resume_download=True,
        ignore_patterns=["*.msgpack*", "*.h5*", "*.bin*"],
    )
    for path in sorted(tgi_model_safetensors):
        src_name = os.path.basename(path)
        src_path = os.path.abspath(path)
        dst_path = os.path.join(model_data_dir, src_name)
        os.symlink(src=src_path, dst=dst_path)
end_time = datetime.now()
elapsed_time = end_time - start_time
print()
print(f"Elapsed : {elapsed_time}")
print()

# Create model package (include safetensors weights)
model_tgz = os.path.join("models", model_name, model_revision, "model.tar.gz")
print()
print(f"Creating model package {model_tgz}")
print()
start_time = datetime.now()
if overwrite and os.path.isfile(model_tgz):
    print("(model package found. Overwriting...)")
    os.remove(model_tgz)
if not os.path.isfile(model_tgz):
    packaging_cmd = [
        "tar",
        "-cf",
        model_tgz,
        "--dereference",  # follow symlinks
        "--use-compress-program=pigz",
        "-C",
        model_data_dir,
        ".",
    ]
    subprocess.run(packaging_cmd, env={"GZIP": "-1"}, check=True)
    print(f"tar.gz size: {os.path.getsize(model_tgz):,d} bytes")
end_time = datetime.now()
elapsed_time = end_time - start_time
print()
print(f"Elapsed : {elapsed_time}")
print()

# Upload model.tar.gz to S3
model_prefix_uri = f"{s3_model_store_uri}/{model_name}/{model_revision}"
model_tgz_uri = f"{model_prefix_uri}/model.tar.gz"
print()
print(f"Uploading model package to {model_tgz_uri}")
print()
start_time = datetime.now()
s3_model_files = S3Downloader.list(s3_uri=model_prefix_uri)
if overwrite or model_tgz_uri not in s3_model_files:
    print("(uploading...)")
    s3_model_uri = S3Uploader.upload(local_path=model_tgz, desired_s3_uri=model_prefix_uri)
    if s3_model_uri != model_tgz_uri:
        print("WAT?")
        print(s3_model_uri)
end_time = datetime.now()
elapsed_time = end_time - start_time
print()
print(f"Elapsed : {elapsed_time}")
print()
print("Done!")
The expected output for Falcon 40B Instruct.
models/tiiuae--falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/data/
$ find . | sort
.
./config.json
./configuration_RW.py
./generation_config.json
./.gitattributes
./handler.py
./model-00001-of-00009.safetensors
./model-00002-of-00009.safetensors
./model-00003-of-00009.safetensors
./model-00004-of-00009.safetensors
./model-00005-of-00009.safetensors
./model-00006-of-00009.safetensors
./model-00007-of-00009.safetensors
./model-00008-of-00009.safetensors
./model-00009-of-00009.safetensors
./modelling_RW.py
./README.md
./special_tokens_map.json
./tokenizer_config.json
./tokenizer.json
models/tiiuae--falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/model.tar.gz
$ ls -alh model.tar.gz
-rw-rw-r-- 1 ec2-user ec2-user 63G Jul 6 20:19 model.tar.gz
The expected output for MPT 30B Instruct.
models/mosaicml--mpt-30b-instruct/2abf1163dd8c9b11f07d805c06e6ec90a1f2037e/data/
$ find . | sort
.
./adapt_tokenizer.py
./attention.py
./blocks.py
./config.json
./configuration_mpt.py
./custom_embedding.py
./flash_attn_triton.py
./generation_config.json
./.gitattributes
./hf_prefixlm_converter.py
./meta_init_context.py
./model-00001-of-00007.safetensors
./model-00002-of-00007.safetensors
./model-00003-of-00007.safetensors
./model-00004-of-00007.safetensors
./model-00005-of-00007.safetensors
./model-00006-of-00007.safetensors
./model-00007-of-00007.safetensors
./modeling_mpt.py
./norm.py
./param_init_fns.py
./README.md
./special_tokens_map.json
./tokenizer_config.json
./tokenizer.json
models/mosaicml--mpt-30b-instruct/2abf1163dd8c9b11f07d805c06e6ec90a1f2037e/model.tar.gz
$ ls -alh model.tar.gz
-rw-rw-r-- 1 ec2-user ec2-user 45G Jul 6 19:53 model.tar.gz
Just want to say thanks so much for this! It helped me host my own TGI 0.9.4 image so I'm not stuck on 0.8.2.
@cirocavani This model tiiuae/falcon-rw-1b is not supported. Thanks for suggesting the TGI container, with which I can quickly test different models.
I will close this issue since it seems to be solved.
For tiiuae/falcon-rw-1b, feel free to open an issue with the env and stacktrace so we can look into fixing it!
Hi, sorry, so what is the solution? Both config params "TRUST_REMOTE_CODE" and "HF_MODEL_TRUST_REMOTE_CODE" don't work. Below is my config:
model_name = "tiiuae/falcon-7b-instruct"
trust_remote_code = True

# Hub model configuration <https://huggingface.co/models>
hub_config = {
    'HF_MODEL_ID': model_name,  # model_id from hf.co/models
    'HF_TASK': 'question-answering',  # NLP task you want to use for predictions
    'HF_MODEL_TRUST_REMOTE_CODE': json.dumps(trust_remote_code),
    'HF_MODEL_QUANTIZE': "bitsandbytes",  # comment in to quantize
    'TRUST_REMOTE_CODE': json.dumps(trust_remote_code),
}
Is TGI the only option? It is very strange that adding such a simple true/false flag requires so much code...
Hi! I am having an issue with this as well. The pytorch-huggingface-inference container doesn't seem to accept HF_TRUST_REMOTE_CODE. Further, in the TGI container the hub_env config seems to ignore the specified HF_TASK and only does text-generation; setting HF_TASK to 'feature-extraction' or 'text-classification' doesn't seem to have any effect.
System Info
Versions: python 3.10, sagemaker 2.168.0 (latest), huggingface tgi 0.8.2 (latest)
Reproduction
I'm trying to deploy MPT-30B-instruct and WizardLM-Uncensored-Falcon-40b in SageMaker.
My config is:
When I look in the logs, though, the Args show:
Args { (other stuff...), trust_remote_code: false }
Expected behavior
Args { (other stuff...), trust_remote_code: true }