huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Sagemaker entrypoint HF_MODEL_TRUST_REMOTE_CODE in config doesn't get recognized #493

Closed hlo-world closed 1 year ago

hlo-world commented 1 year ago

System Info

Versions: Python 3.10, sagemaker 2.168.0 (latest), Hugging Face TGI 0.8.2 (latest)

Reproduction

I'm trying to deploy MPT-30B-instruct and WizardLM-Uncensored-Falcon-40b on SageMaker.

My config is:

config = {
    # (other stuff...)
    'HF_MODEL_TRUST_REMOTE_CODE': json.dumps(True),
}

When I look in the logs, though, the Args show Args { (other stuff...), trust_remote_code: false }

Expected behavior

  1. Args { (other stuff...), trust_remote_code: true }
  2. Models deploy successfully
Narsil commented 1 year ago

https://github.com/huggingface/text-generation-inference/pull/514 should make setting TRUST_REMOTE_CODE unnecessary.

yapweiyih commented 1 year ago

I have an issue with tiiuae/falcon-rw-1b on sagemaker==2.170.0 and get the following error. Did I miss something?

ValueError: Loading /opt/ml/model requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

Code to reproduce.

import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

from huggingface_hub import snapshot_download
import sagemaker
import boto3
sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

MODEL_ID = "tiiuae/falcon-rw-1b"
CACHED_DIR = "../cache"
MERGE_MODEL_DIR = "merged_model_test"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    # device_map="auto",
    trust_remote_code=True,
    cache_dir=CACHED_DIR,
)

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    cache_dir=CACHED_DIR,
)

model.save_pretrained(MERGE_MODEL_DIR, safe_serialization=True)
tokenizer.save_pretrained(MERGE_MODEL_DIR, safe_serialization=True)
import os

parent_dir = os.getcwd()
# change to model dir
os.chdir(MERGE_MODEL_DIR)
# use pigz for faster and parallel compression
!tar -cf model.tar.gz --use-compress-program=pigz *
# change back to parent dir
os.chdir(parent_dir)

from sagemaker.s3 import S3Uploader

# upload model.tar.gz to s3
s3_model_uri = S3Uploader.upload(local_path=str(MERGE_MODEL_DIR + "/model.tar.gz"), desired_s3_uri=f"s3://{sess.default_bucket()}/test-model")

print(f"model uploaded to: {s3_model_uri}")

from sagemaker.huggingface import get_huggingface_llm_image_uri, HuggingFaceModel
import json

image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")
print(f"llm image uri: {image_uri}")

instance_type = "ml.g4dn.2xlarge"
health_check_timeout = 300
trust_remote_code = True

config = {
    "HF_MODEL_ID": "/opt/ml/model",  # path to where sagemaker stores the model
    "MAX_INPUT_LENGTH": json.dumps(2048),  # Max length of input text
    "MAX_TOTAL_TOKENS": json.dumps(3000),  # Max length of the generation (including input text)
    "HF_MODEL_TRUST_REMOTE_CODE": json.dumps(trust_remote_code)
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    model_data=s3_model_uri,
    env=config,
)

endpoint_name = sagemaker.utils.name_from_base("test")

predictor = llm_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    model_data_download_timeout=10 * 60,
    container_startup_health_check_timeout=10 * 60,
    wait=False,
)

print(predictor.endpoint_name)
cirocavani commented 1 year ago

Hi, I had the same problem. Version 0.8.2 does not handle HF_MODEL_TRUST_REMOTE_CODE in its entrypoint script.

https://github.com/huggingface/text-generation-inference/blob/v0.8.2/sagemaker-entrypoint.sh

A workaround is to set TRUST_REMOTE_CODE=true directly (text-generation-launcher will pick up this environment variable).

https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L74-L78

config = {
    # (other stuff...)
    'TRUST_REMOTE_CODE': json.dumps(True),
}
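
For context, here is a minimal sketch of the workaround in place, reusing names from the reproduction script above (role, image_uri, and s3_model_uri are assumed to already be defined as shown there); this is illustrative, not a tested snippet:

import json

from sagemaker.huggingface import HuggingFaceModel

config = {
    "HF_MODEL_ID": "/opt/ml/model",
    "MAX_INPUT_LENGTH": json.dumps(2048),
    "MAX_TOTAL_TOKENS": json.dumps(3000),
    # On the 0.8.2 image, set TRUST_REMOTE_CODE directly: the launcher reads it from
    # the environment even though the entrypoint script ignores HF_MODEL_TRUST_REMOTE_CODE.
    "TRUST_REMOTE_CODE": json.dumps(True),
}

llm_model = HuggingFaceModel(
    role=role,                # SageMaker execution role, as in the script above
    image_uri=image_uri,      # 0.8.2 LLM image from get_huggingface_llm_image_uri
    model_data=s3_model_uri,  # model.tar.gz uploaded earlier
    env=config,
)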
cirocavani commented 1 year ago

The other option is to use the latest TGI and remove TRUST_REMOTE_CODE.

The SageMaker SDK (2.171) still points to version 0.8.2.

https://github.com/aws/sagemaker-python-sdk/blob/v2.171.0/src/sagemaker/image_uri_config/huggingface-llm.json

The latest release in this repo is 0.9.1.

https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference/107426365?tag=0.9.1

In my case, I pushed the 0.9.1 image to my account's ECR (I didn't try to pull from GitHub directly to create my endpoint).

Something like:

docker pull --platform=linux/amd64 ghcr.io/huggingface/text-generation-inference:0.9.1

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

docker build \
--build-arg TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:0.9.1 \
-t ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1 \
-f Dockerfile \
.

docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1

A Docker build is required to change the ENTRYPOINT and CMD of the published image.

sagemaker-entrypoint.sh

https://github.com/huggingface/text-generation-inference/blob/v0.9.1/Dockerfile#L179-L185

ARG TGI_IMAGE
FROM --platform=linux/amd64 ${TGI_IMAGE}

COPY sagemaker-entrypoint.sh entrypoint.sh
RUN chmod +x entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]
CMD [ "" ]

Then, the deploy script changes to:

llm_model = HuggingFaceModel(
    role=role,
    image_uri="ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1",
    # ...
)

Tip: 0.9.1 changes HUGGINGFACE_HUB_CACHE to "/data" and something goes wrong with that, so I changed it back to "/tmp" (as in 0.8.2).

With this setup, "TRUST_REMOTE_CODE" is not required to run Falcon or MPT, as @Narsil said.

I tested Falcon 40B Instruct and MPT 30B Instruct, each with two configs (DTYPE=b-float16 and HF_MODEL_QUANTIZE=bitsandbytes), and each config on a single ml.g5.12xlarge. I had to add MAX_BATCH_PREFILL_TOKENS and MAX_BATCH_TOTAL_TOKENS to avoid memory issues.

{
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
    "DTYPE": "b-float16",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
    "HF_MODEL_QUANTIZE": "bitsandbytes",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "mosaicml/mpt-30b-instruct",
    "DTYPE": "b-float16",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}
{
    "HF_MODEL_ID": "mosaicml/mpt-30b-instruct",
    "HF_MODEL_QUANTIZE": "bitsandbytes",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4"
}

I also tested with model.tar.gz for both Falcon (63 GB) and MPT (45 GB). I had to add model_data_download_timeout to avoid SageMaker failing to create the endpoint for Falcon.

llm_model.deploy(
    # ...
    container_startup_health_check_timeout=3600,
    model_data_download_timeout=3600,
)
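
Putting the pieces of this comment together, a minimal end-to-end sketch might look like the following (ACCOUNT_ID, the execution role, and s3_model_uri are placeholders or assumptions; the env values mirror the configs above):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

config = {
    "HF_MODEL_ID": "/opt/ml/model",  # model.tar.gz is extracted here by SageMaker
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_BATCH_TOTAL_TOKENS": "8000",
    "HUGGINGFACE_HUB_CACHE": "/tmp",
    "SM_NUM_GPUS": "4",
    # plus DTYPE or HF_MODEL_QUANTIZE, as in the configs above
}

llm_model = HuggingFaceModel(
    role=role,  # assumed: SageMaker execution role
    image_uri="ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1",
    model_data=s3_model_uri,  # assumed: S3 URI of the packaged model.tar.gz
    env=config,
)

predictor = llm_model.deploy(
    endpoint_name=sagemaker.utils.name_from_base("tgi-falcon-40b-instruct"),
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=3600,
    model_data_download_timeout=3600,
)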

I am still evaluating this setup, so take it as is; I don't know if it is enough or even good. (One thing I can tell at a glance: it's better than Falcon from SageMaker JumpStart, which was too slow.)

cirocavani commented 1 year ago

@yapweiyih I am guessing here: what files do you have in MERGE_MODEL_DIR? Are they all local files, or are any of them symlinks to files in the cache?

I had problems with symlinks and had to add --dereference to tar so that it follows symlinks (GNU tar's default is to archive the link itself).
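
As an illustrative check (not from the original report), something like this would show which entries in MERGE_MODEL_DIR are symlinks into the Hugging Face cache:

import os

MERGE_MODEL_DIR = "merged_model_test"  # matches the reproduction script above

for name in sorted(os.listdir(MERGE_MODEL_DIR)):
    path = os.path.join(MERGE_MODEL_DIR, name)
    if os.path.islink(path):
        # a plain tar -cf would archive the link itself, not the file it points to
        print(f"symlink: {name} -> {os.readlink(path)}")
    else:
        print(f"file:    {name} ({os.path.getsize(path):,d} bytes)")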

Also, let me share how I converted the model:

mkdir -p data

docker run --rm -it \
--volume $PWD/data:/data \
--entrypoint "" \
--env HF_HUB_ENABLE_HF_TRANSFER=0 \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1 \
text-generation-server download-weights tiiuae/falcon-40b-instruct

conda create -y -q --force \
-c conda-forge \
-n generative-ai-tgi \
python=3.11 \
huggingface_hub \
sagemaker-python-sdk \
tar \
pigz

conda run -n generative-ai-tgi --live-stream \
python create_sagemaker_model.py tiiuae/falcon-40b-instruct

create_sagemaker_model.py

(s3_model_store_uri must be set to your S3 URI)

import os
import shutil
import subprocess
import sys
from datetime import datetime

from huggingface_hub import snapshot_download
from sagemaker.s3 import S3Downloader, S3Uploader

# Arguments

if len(sys.argv) < 2:
    print(
        "usage: create_sagemaker_model.py <repo_id> [TGI_data_dir]   (default [TGI_data_dir] is ./data)"
    )
    sys.exit(-1)

# repo_id = "tiiuae/falcon-40b-instruct"
# repo_id = "mosaicml/mpt-30b-instruct"
repo_id = sys.argv[1]

tgi_data_dir = "data" if len(sys.argv) <= 2 else sys.argv[2]

s3_model_store_uri = "s3://<BUCKET>/<PREFIX>"

overwrite = False

# TGI Download Weights -> Converting PyTorch weights to safetensors.

model_name = repo_id.replace("/", "--")

tgi_model_dir = os.path.join(tgi_data_dir, f"models--{model_name}")

if not os.path.isdir(tgi_model_dir):
    raise Exception(f"TGI Model dir not found: {tgi_model_dir}")

tgi_model_revision = os.path.join(tgi_model_dir, "refs", "main")

if not os.path.isfile(tgi_model_revision):
    raise Exception(f"TGI Model revision not found: {tgi_model_revision}")

with open(tgi_model_revision, "r", encoding="ascii") as revision_file:
    model_revision = revision_file.read().strip()

tgi_model_data_dir = os.path.join(tgi_model_dir, "snapshots", model_revision)

if not os.path.isdir(tgi_model_data_dir):
    raise Exception(f"TGI Model data dir not found: {tgi_model_data_dir}")

tgi_model_safetensors = [
    os.path.join(tgi_model_data_dir, filename)
    for filename in os.listdir(tgi_model_data_dir)
    if filename.endswith(".safetensors")
]

if not tgi_model_safetensors:
    raise Exception(f"TGI Model safe tensors not found: {tgi_model_data_dir}")

# Download model files (skip weights)

model_data_dir = os.path.join("models", model_name, model_revision, "data")

print()
print(f"Downloading model data to {model_data_dir}")
print()

start_time = datetime.now()

if overwrite and os.path.isdir(model_data_dir):
    print("(model data found. Overwriting...)")
    shutil.rmtree(model_data_dir)

if not os.path.isdir(model_data_dir):
    os.makedirs(model_data_dir)

    snapshot_download(
        repo_id,
        revision=model_revision,
        local_dir=model_data_dir,
        local_dir_use_symlinks=False,
        resume_download=True,
        ignore_patterns=["*.msgpack*", "*.h5*", "*.bin*"],
    )

    for path in sorted(tgi_model_safetensors):
        src_name = os.path.basename(path)
        src_path = os.path.abspath(path)
        dst_path = os.path.join(model_data_dir, src_name)
        os.symlink(src=src_path, dst=dst_path)

end_time = datetime.now()
elapsed_time = end_time - start_time

print()
print(f"Elapsed : {elapsed_time}")
print()

# Create model package (include safetensors weights)

model_tgz = os.path.join("models", model_name, model_revision, "model.tar.gz")

print()
print(f"Creating model package {model_tgz}")
print()

start_time = datetime.now()

if overwrite and os.path.isfile(model_tgz):
    print("(model package found. Overwriting...)")
    os.remove(model_tgz)

if not os.path.isfile(model_tgz):
    packaging_cmd = [
        "tar",
        "-cf",
        model_tgz,
        "--dereference",  # follow symlinks
        "--use-compress-program=pigz",
        "-C",
        model_data_dir,
        ".",
    ]
    subprocess.run(packaging_cmd, env={"GZIP": "-1"}, check=True)

print(f"tar.gz size: {os.path.getsize(model_tgz):,d} bytes")

end_time = datetime.now()
elapsed_time = end_time - start_time

print()
print(f"Elapsed : {elapsed_time}")
print()

# Upload model.tar.gz to S3

model_prefix_uri = f"{s3_model_store_uri}/{model_name}/{model_revision}"
model_tgz_uri = f"{model_prefix_uri}/model.tar.gz"

print()
print(f"Uploading model package to {model_tgz_uri}")
print()

start_time = datetime.now()

s3_model_files = S3Downloader.list(s3_uri=model_prefix_uri)

if overwrite or model_tgz_uri not in s3_model_files:
    print("(uploading...)")
    s3_model_uri = S3Uploader.upload(local_path=model_tgz, desired_s3_uri=model_prefix_uri)
    if s3_model_uri != model_tgz_uri:
        print("WAT?")
        print(s3_model_uri)

end_time = datetime.now()
elapsed_time = end_time - start_time

print()
print(f"Elapsed : {elapsed_time}")
print()

print("Done!")

The expected output for Falcon 40B Instruct.

models/tiiuae--falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/data/

$ find . | sort
.
./config.json
./configuration_RW.py
./generation_config.json
./.gitattributes
./handler.py
./model-00001-of-00009.safetensors
./model-00002-of-00009.safetensors
./model-00003-of-00009.safetensors
./model-00004-of-00009.safetensors
./model-00005-of-00009.safetensors
./model-00006-of-00009.safetensors
./model-00007-of-00009.safetensors
./model-00008-of-00009.safetensors
./model-00009-of-00009.safetensors
./modelling_RW.py
./README.md
./special_tokens_map.json
./tokenizer_config.json
./tokenizer.json

models/tiiuae--falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/model.tar.gz

$ ls -alh model.tar.gz 
-rw-rw-r-- 1 ec2-user ec2-user 63G Jul  6 20:19 model.tar.gz

The expected output for MPT 30B Instruct.

models/mosaicml--mpt-30b-instruct/2abf1163dd8c9b11f07d805c06e6ec90a1f2037e/data/

$ find . | sort
.
./adapt_tokenizer.py
./attention.py
./blocks.py
./config.json
./configuration_mpt.py
./custom_embedding.py
./flash_attn_triton.py
./generation_config.json
./.gitattributes
./hf_prefixlm_converter.py
./meta_init_context.py
./model-00001-of-00007.safetensors
./model-00002-of-00007.safetensors
./model-00003-of-00007.safetensors
./model-00004-of-00007.safetensors
./model-00005-of-00007.safetensors
./model-00006-of-00007.safetensors
./model-00007-of-00007.safetensors
./modeling_mpt.py
./norm.py
./param_init_fns.py
./README.md
./special_tokens_map.json
./tokenizer_config.json
./tokenizer.json

models/mosaicml--mpt-30b-instruct/2abf1163dd8c9b11f07d805c06e6ec90a1f2037e/model.tar.gz

$ ls -alh model.tar.gz 
-rw-rw-r-- 1 ec2-user ec2-user 45G Jul  6 19:53 model.tar.gz
bfichter commented 1 year ago

docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/generative-ai-llm-tgi:0.9.1

Just want to say thanks so much for this! It helped me host my own TGI 0.9.4 image so I'm not stuck on 0.8.2.

yapweiyih commented 1 year ago

(quoting @cirocavani's previous comment in full)

@cirocavani The tiiuae/falcon-rw-1b model is not supported. Thanks for suggesting the TGI container, which lets me test different models quickly.

Narsil commented 1 year ago

I will close this issue since it seems to be solved.

For tiiuae/falcon-rw-1b, feel free to open an issue with the env and stack trace so we can look into fixing it!

IvanPetrovMck commented 1 year ago

I will close this issue since it seems to be solved.

Hi, sorry, but what is the solution? Neither config param ("TRUST_REMOTE_CODE" nor "HF_MODEL_TRUST_REMOTE_CODE") works. Below is my config:

import json

model_name = "tiiuae/falcon-7b-instruct"
trust_remote_code = True

# Hub model configuration <https://huggingface.co/models>
hub_config = {
    'HF_MODEL_ID': model_name,                                    # model_id from hf.co/models
    'HF_TASK': 'question-answering',                              # NLP task you want to use for predictions
    'HF_MODEL_TRUST_REMOTE_CODE': json.dumps(trust_remote_code),
    'HF_MODEL_QUANTIZE': "bitsandbytes",                          # comment in to quantize
    'TRUST_REMOTE_CODE': json.dumps(trust_remote_code),
}

Is TGI the only option? It is very strange that adding such a simple true/false flag requires so much code...

adityachivurm commented 10 months ago

Hi! I am having an issue with this as well. The pytorch-huggingface-inference container doesn't seem to accept HF_TRUST_REMOTE_CODE. Further, in the TGI container, the hub_env config seems to ignore the specified HF_TASK and only does text-generation; setting HF_TASK to 'feature-extraction' or 'text-classification' doesn't seem to have any effect.