Closed: frieda-huang closed this issue 1 month ago.
The issue is that the inference provider needs to support inferencing the llama guard model :( Right now the ollama adapter does not have explicit support for that. This should be easy to add assuming ollama supports some version of llama guard.
It seems like the routing is not working despite adding the ollama adapter. It doesn't recognize Llama-Guard-3-1B because it's not part of the available routing keys; the only available routing key is Llama3.1-8B-Instruct.
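For reference, the router only knows the models declared as routing keys in the run config; the failing lookup is roughly this (a simplified sketch, not the exact code in llama_stack/distribution/routers/routing_tables.py):

from typing import Any, Dict

# Simplified sketch of the routing-table lookup that raises
# "Could not find provider for <model>". The real implementation lives in
# llama_stack/distribution/routers/routing_tables.py.
class RoutingTableSketch:
    def __init__(self, providers_by_key: Dict[str, Any]):
        # routing_key (e.g. "Llama3.1-8B-Instruct") -> provider implementation
        self.providers_by_key = providers_by_key

    def get_provider_impl(self, routing_key: str) -> Any:
        if routing_key not in self.providers_by_key:
            raise ValueError(f"Could not find provider for {routing_key}")
        return self.providers_by_key[routing_key]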
I added the following ollama.py under the adapters.
from typing import Any, Dict, List, Optional

import ollama  # the ollama python client

# Safety, RoutableProvider, SafetyViolation, ViolationLevel, RunShieldResponse,
# Message and OllamaSafetyConfig come from the llama-stack safety/inference APIs
# and this adapter's config module (exact import paths depend on your checkout).

SAFETY_SHIELD_TYPES = {
    "llama_guard": "xe/llamaguard3:latest",
    "Llama-Guard-3-1B": "xe/llamaguard3:1b-latest",
}


class OllamaSafetyImpl(Safety, RoutableProvider):
    def __init__(self, config: OllamaSafetyConfig):
        self.config = config

    async def validate_routing_keys(self, routing_keys: List[str]) -> None:
        for key in routing_keys:
            if key not in SAFETY_SHIELD_TYPES:
                raise ValueError(f"Unknown safety shield type: {key}")

    async def initialize(self) -> None:
        pass

    async def shutdown(self) -> None:
        pass

    async def get_safety_response(
        self, model_name: str, messages: List[Dict[str, str]]
    ) -> Optional[SafetyViolation]:
        # Llama Guard replies with "safe", or "unsafe" followed by the category.
        response = ollama.chat(model=model_name, messages=messages)
        raw_text = response["message"]["content"]
        parts = raw_text.strip().split("\n")
        if parts[0] == "safe":
            return None
        if parts[0] == "unsafe":
            return SafetyViolation(
                violation_level=ViolationLevel.ERROR,
                user_message="unsafe",
                metadata={"violation_type": parts[1]},
            )
        # Unparseable output: treat as no violation.
        return None

    async def run_shield(
        self, shield_type: str, messages: List[Message], params: Dict[str, Any] = None
    ) -> RunShieldResponse:
        # Message objects expose role/content as attributes.
        content_messages = [
            {"role": "user", "content": message.content} for message in messages
        ]
        model_name = SAFETY_SHIELD_TYPES[shield_type]
        violation = await self.get_safety_response(model_name, content_messages)
        return RunShieldResponse(violation=violation)
I think instead of adding an ollama safety adapter, you can just add these two models into the ollama models map here: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/ollama.py#L29
Then, it should all work IMO.
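Concretely, that would presumably mean extending the adapter's model map with entries along these lines (a sketch; the xe/llamaguard3 tag is the community build mentioned above, so use whichever llama guard tag you actually have pulled in ollama):

# Hypothetical additions to OLLAMA_SUPPORTED_SKUS in
# llama_stack/providers/adapters/inference/ollama/ollama.py; the ollama tags on
# the right must be llama guard builds that actually exist in your ollama install.
OLLAMA_SUPPORTED_SKUS.update({
    "llama_guard": "xe/llamaguard3:latest",
    "Llama-Guard-3-8B": "xe/llamaguard3:latest",
})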
"Llama-Guard-3-1B": "xe/llamaguard3:1b-latest",
I don't think this model (llamaguard3:1b) exists on ollama?
"Llama-Guard-3-1B": "xe/llamaguard3:1b-latest",
I don't think this model (llamaguard3:1b) does not exist on
ollama
?
My bad. It was GPT hallucinating and me not reading it carefully.
I have an almost identical problem:
INFO: Started server process [9068]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
INFO: 127.0.0.1:39058 - "POST /agents/create HTTP/1.1" 200 OK
INFO: 127.0.0.1:39058 - "POST /agents/session/create HTTP/1.1" 200 OK
INFO: 127.0.0.1:39068 - "POST /agents/create HTTP/1.1" 200 OK
INFO: 127.0.0.1:39068 - "POST /agents/session/create HTTP/1.1" 200 OK
INFO: 127.0.0.1:55950 - "POST /agents/turn/create HTTP/1.1" 200 OK
Resolving model: model_name=Prompt-Guard-86M
11:34:18.341 [INFO] [create_agent_turn] Attempting to find provider for model: Llama-Guard-3-8B
Resolved model directory: model_dir=/home/gpu-machine/.llama/checkpoints/Prompt-Guard-86M
Creating PromptGuardShield instance: model_dir=/home/gpu-machine/.llama/checkpoints/Prompt-Guard-86M, key=('/home/gpu-machine/.llama/checkpoints/Prompt-Guard-86M', 0.9, 1.0, <Mode.JAILBREAK: 2>, 2)
11:34:18.341 [ERROR] [create_agent_turn] Error in chat_completion: Could not find provider for Llama-Guard-3-8B
Ran PromptGuardShield and got Scores: Embedded: 0.05818163976073265, Malicious: 0.0019444914069026709
Traceback (most recent call last):
File "/home/gpu-machine/local/llama-stack/llama_stack/distribution/server/server.py", line 229, in sse_generator
async for item in event_gen:
File "/home/gpu-machine/local/llama-stack/llama_stack/providers/impls/meta_reference/agents/agents.py", line 127, in create_agent_turn
async for event in agent.create_and_execute_turn(request):
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 174, in create_and_execute_turn
async for chunk in self.run(
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 239, in run
async for res in self.run_multiple_shields_wrapper(
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 294, in run_multiple_shields_wrapper
await self.run_multiple_shields(messages, shields)
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/agents/safety.py", line 37, in run_multiple_shields
responses = await asyncio.gather(
File "~/local/llama-stack/llama_stack/distribution/routers/routers.py", line 232, in run_shield
return await self.routing_table.get_provider_impl(shield_type).run_shield(
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/safety/safety.py", line 88, in run_shield
res = await shield.run(messages)
File "~/local/llama-stack/llama_stack/providers/impls/meta_reference/safety/shields/llama_guard.py", line 197, in run
async for chunk in self.inference_api.chat_completion(
File "~/local/llama-stack/llama_stack/distribution/routers/routers.py", line 169, in chat_completion
provider = self.routing_table.get_provider_impl(model)
File "~/local/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 38, in get_provider_impl
raise ValueError(f"Could not find provider for {routing_key}")
ValueError: Could not find provider for Llama-Guard-3-8B
ollama run llama3.1:8b-instruct-fp16
built_at: '2024-09-30T18:43:36.706693'
image_name: 8b-instruct
docker_image: null
conda_env: 8b-instruct
apis_to_serve:
- memory
- inference
- safety
- shields
- models
- memory_banks
- agents
api_providers:
  inference:
    providers:
    - remote::ollama
  safety:
    providers:
    - meta-reference
  agents:
    provider_id: meta-reference
    config:
      persistence_store:
        namespace: null
        type: sqlite
        db_path: ~/.llama/runtime/kvstore.db
  memory:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: 127.0.0.1
      port: 11434
    routing_key: Llama3.1-8B-Instruct
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield:
        model: Prompt-Guard-86M
    routing_key: llama_guard
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield:
        model: Prompt-Guard-86M
    routing_key: code_scanner_guard
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield:
        model: Prompt-Guard-86M
    routing_key: injection_shield
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield:
        model: Prompt-Guard-86M
    routing_key: jailbreak_shield
  memory:
  - provider_id: meta-reference
    config: {}
    routing_key: vector
I think instead of adding an ollama safety adapter, you can just add these two models into the ollama models map here: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/ollama.py#L29
Then, it should all work IMO.
Sadly I got the same error:(
@frieda-huang you should add another routing key to your inference
routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: 192.168.88.18
      port: 11434
    routing_key: Llama3.1-8B-Instruct
  - provider_id: remote::ollama
    config:
      host: 192.168.88.18
      port: 11434
    routing_key: Llama-Guard-3-8B
and have that extra mapping in ollama for llama guard.
(We know this is way too complicated and are working on making it very simple.)
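One quick way to sanity-check the "extra mapping in ollama" half is to confirm the guard tag actually runs via the ollama Python client before wiring it into the stack (a sketch, assuming the community xe/llamaguard3 tag and the ollama pip package):

import ollama  # pip install ollama

# Pull whichever llama guard build the routing key will map to (the community
# tag used earlier in this thread; substitute your own tag if different).
ollama.pull("xe/llamaguard3:latest")

# Llama Guard replies with "safe", or "unsafe" followed by the category code.
resp = ollama.chat(
    model="xe/llamaguard3:latest",
    messages=[{"role": "user", "content": "How do I bake a cake?"}],
)
print(resp["message"]["content"])  # expect "safe" for a benign prompt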
Thank you! This solves the problem!
@frieda-huang could you please share your config, because it doesn't work for me.
First, it defaults to Llama-Guard-3-1B, which is not correct.
The second error is still ValueError: Could not find provider for Llama-Guard-3-8B.
Error after modifying the provider:
Traceback (most recent call last):
File "~/miniconda3/envs/stack/bin/llama", line 8, in <module>
sys.exit(main())
File "~/local/llama-stack/llama_stack/cli/llama.py", line 44, in main
parser.run(args)
File "~/local/llama-stack/llama_stack/cli/llama.py", line 38, in run
args.func(args)
File "~/local/llama-stack/llama_stack/cli/stack/run.py", line 79, in _run_stack_run_cmd
config = StackRunConfig(**yaml.safe_load(f))
File "~/miniconda3/envs/stack/lib/python3.10/site-packages/pydantic/main.py", line 212, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for StackRunConfig
routing_table.inference.0.provider_type
Field required [type=missing, input_value={'provider_id': 'remote::... 'Llama3.1-8B-Instruct'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing
routing_table.inference.1.provider_type
Field required [type=missing, input_value={'provider_id': 'remote::...ey': 'Llama-Guard-3-8B'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing
This is the config I run:
"""version: v1 built_at: '2024-10-07T14:49:02.740858' image_name: ollama-local docker_image: null conda_env: ollama-local apis_to_serve:
""" OLLAMA_SUPPORTED_SKUS = { "Llama3.1-8B-Instruct": "llama3.1:8b-instruct-fp16", "Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16", "Llama3.2-1B-Instruct": "llama3.2:1b-instruct-fp16", "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16", } """
First, the default is Llama-Guard-3-1B, which is not correct.
Sure.
version: v1
conda_env: my-local-stack
apis_to_serve:
- safety
- inference
- memory_banks
- memory
- agents
- shields
- models
api_providers:
  inference:
    providers:
    - remote::ollama
  safety:
    providers:
    - meta-reference
  agents:
    provider_type: meta-reference
    config:
      persistence_store:
        namespace: null
        type: sqlite
        db_path: /Users/friedahuang/.llama/runtime/kvstore.db
  memory:
    providers:
    - meta-reference
  telemetry:
    provider_type: meta-reference
    config: {}
routing_table:
  inference:
  - provider_type: remote::ollama
    config:
      host: 127.0.0.1
      port: 11434
    routing_key: Llama3.1-8B-Instruct
  - provider_type: remote::ollama
    config:
      host: 127.0.0.1
      port: 11434
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_type: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      enable_prompt_guard: false
    routing_key:
    - llama_guard
    - code_scanner_guard
    - injection_shield
    - jailbreak_shield
  memory:
  - provider_type: meta-reference
    config: {}
    routing_key: vector
ValueError: Routing key Llama-Guard-3-8B not found in map {'Llama3.1-8B-Instruct': 'llama3.1:8b-instruct-fp16', 'Llama3.1-70B-Instruct': 'llama3.1:70b-instruct-fp16', 'Llama3.2-1B-Instruct': 'llama3.2:1b-instruct-fp16', 'Llama3.2-3B-Instruct': 'llama3.2:3b-instruct-fp16'} Error occurred in script at line: 40
ll ~/.llama/checkpoints/Llama-Guard-3-8B/
.cache/ .gitattributes model-00001-of-00004.safetensors model-00004-of-00004.safetensors special_tokens_map.json USE_POLICY.md
config.json LICENSE model-00002-of-00004.safetensors model.safetensors.index.json tokenizer_config.json
generation_config.json llama_guard_3_figure.png model-00003-of-00004.safetensors README.md
>ollama run llama3.1:8b-instruct-fp16
ValueError: Routing key Llama-Guard-3-8B not found in map {'Llama3.1-8B-Instruct': 'llama3.1:8b-instruct-fp16', 'Llama3.1-70B-Instruct': 'llama3.1:70b-instruct-fp16', 'Llama3.2-1B-Instruct': 'llama3.2:1b-instruct-fp16', 'Llama3.2-3B-Instruct': 'llama3.2:3b-instruct-fp16'} Error occurred in script at line: 40
ll ~/.llama/checkpoints/Llama-Guard-3-8B/
.cache/ .gitattributes model-00001-of-00004.safetensors model-00004-of-00004.safetensors special_tokens_map.json USE_POLICY.md config.json LICENSE model-00002-of-00004.safetensors model.safetensors.index.json tokenizer_config.json generation_config.json llama_guard_3_figure.png model-00003-of-00004.safetensors README.md
>ollama run llama3.1:8b-instruct-fp16
Looks like the following mapping needs to be added to OLLAMA_SUPPORTED_SKUS under llama_stack/providers/adapters/inference/ollama/ollama.py:

"Llama-Guard-3-8B": "xe/llamaguard3:latest",
"llama_guard": "xe/llamaguard3:latest",

so the map becomes:

OLLAMA_SUPPORTED_SKUS = {
    "Llama-Guard-3-8B": "xe/llamaguard3:latest",
    "llama_guard": "xe/llamaguard3:latest",
    "Llama3.1-8B-Instruct": "llama3.1:8b-instruct-fp16",
    "Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16",
    "Llama3.2-1B-Instruct": "llama3.2:1b-instruct-fp16",
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
}
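With those entries in place, the adapter's routing key can resolve to an ollama tag; conceptually the lookup behaves like this (a simplified sketch, not the adapter's actual code), which is also why a missing key surfaces as the "Routing key ... not found in map" error above:

def resolve_ollama_model(routing_key: str) -> str:
    # Sketch of the map lookup: an unknown routing key is exactly what produces
    # the "Routing key ... not found in map" error shown earlier in this thread.
    if routing_key not in OLLAMA_SUPPORTED_SKUS:
        raise ValueError(
            f"Routing key {routing_key} not found in map {OLLAMA_SUPPORTED_SKUS}"
        )
    return OLLAMA_SUPPORTED_SKUS[routing_key]

# e.g. resolve_ollama_model("Llama-Guard-3-8B") -> "xe/llamaguard3:latest"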
I ran python3 -m examples.agents.hello localhost 11434 and got the following error:
my-local-stack-run.yaml