**Closed** — @jeffxtang closed this issue 2 months ago.
This would be enormously helpful! We have several Llama models deployed via vLLM on-prem and it would be great to be able to point at those instead of spinning up another instance.
Hey @jeffxtang! `podcast_transcript.py` and `inflation.py` in `examples/scripts/` serve as our current example scripts. If you set `INFERENCE_HOST` and `INFERENCE_PORT` to your cloud-hosted inference server, they should work. Is that what you're asking?
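As a concrete sketch of that suggestion (the host and port values below are placeholders, not real endpoints):

```python
import os

def configure_remote_inference(host: str, port: int) -> None:
    """Point the example scripts at a remote inference server.

    The scripts read INFERENCE_HOST and INFERENCE_PORT from the
    environment, so these must be set before the script starts.
    """
    os.environ["INFERENCE_HOST"] = host
    os.environ["INFERENCE_PORT"] = str(port)

# Placeholder endpoint -- replace with your own vLLM deployment.
configure_remote_inference("my-vllm-server.internal", 8000)
```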
Hi @dltn, yes, that's it. I tried to port the example scripts to Colab here, but it seems to take more work than just changing the host and port. I copied the `CustomTool` implementation used in `multi_turn.py`, modified `run_main` a bit, and created the inference config file (not sure whether `checkpoint_dir` and `tokenizer_path` pointing to local paths would work), but got an error:

```
ConfigAttributeError: Key 'agentic_system_config' is not in struct
    full_key: agentic_system_config
    object_type=dict
```

Can the Colab above be easily modified to work against a cloud Llama 3.1 provider? (An example of direct Llama calls is shown at the end of the Colab.)
Can this also be adapted to use a Llama-3 model instance hosted on Azure?
Hi @jeffxtang, as of 0.0.4 we've added Ollama support which should unlock some new options for you. Longer term partnerships are on our roadmap, so stay tuned! Please reopen if there's anything we can help with in the short-term.
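For anyone who lands here before trying the Ollama path: Ollama serves a local REST API on port 11434, so a quick connectivity check against it looks like the sketch below (the model tag is an assumption; use whichever Llama model you have pulled).

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "llama3.1") -> dict:
    """Build a one-shot, non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "llama3.1",
                    host: str = "http://localhost:11434") -> str:
    """POST the prompt to a local Ollama server and return the completion text."""
    data = json.dumps(build_generate_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```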
If I run `pip install llama-agentic-system` in a Colab notebook and use a cloud-hosted Llama 3.1 as the inference server (see the complete script below), is there an example of how to use llama-agentic-system? Can `run_main` in `multi_turn.py` be modified to do so? Will the `custom_tools` be included in the llama-agentic-system package later?