iryna-kondr / scikit-llm

Seamlessly integrate LLMs into scikit-learn.
https://beastbyte.ai/
MIT License

Llama 3 #109

Open CoteDave opened 1 month ago

CoteDave commented 1 month ago

It would be very useful to add open-source models like Llama 3.

AndreasKarasenko commented 1 month ago

If you're using Ollama, it's already supported through the custom_url approach. If you want, I can post a quick how-to later.

CoteDave commented 1 month ago

Would be nice to see the tutorial! Thanks!

AndreasKarasenko commented 1 month ago

Sorry for the delay. If you're running Ollama locally and have pulled some models, you can use Scikit-LLM to talk to the local server.

Load the packages

from skllm.datasets import get_classification_dataset
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier
from skllm.config import SKLLMConfig

Set the URL to your Ollama server. By default it runs on localhost on port 11434; /v1 is the OpenAI-compatible endpoint.

SKLLMConfig.set_gpt_url("http://localhost:11434/v1/")
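
(Optional) To sanity-check that the endpoint responds before wiring it into Scikit-LLM, here is a minimal sketch using the openai Python client against the same URL. This assumes the llama3 model is already pulled; the api_key is required by the client but ignored by Ollama.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Reply with a single word."}],
)
print(response.choices[0].message.content)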

Load data, create a classifier, fit and test it.

X, y = get_classification_dataset()
clf = FewShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y)
labels = clf.predict(X, num_workers=2)  # num_workers is the number of parallel requests sent
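
Since the classifier follows the scikit-learn estimator API, you can score the predictions with the usual scikit-learn utilities. A minimal sketch, evaluating on the same toy data used for fitting just to confirm the round trip works:

from sklearn.metrics import accuracy_score

print(f"Accuracy: {accuracy_score(y, labels):.2f}")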

Notes

  • key and org are technically not needed but are expected by Scikit-LLM; simply pass "ollama" or any random string. You can always omit org.
  • num_workers is supported for Ollama as well, but you need to configure the server accordingly:

export OLLAMA_MAX_LOADED_MODELS=2  # sets the max number of loaded models
export OLLAMA_NUM_PARALLEL=2       # sets the max number of parallel tasks

  • This approach works for FewShotGPTClassifier, ZeroShotGPTClassifier, and their MultiLabel counterparts, and should also work for GPTSummarizer, GPTTranslator, and GPTExplainableNER (I have not tested these). See the zero-shot sketch below.
  • It should also work for DynamicFewShotGPTClassifier thanks to a recent Ollama fix that added embeddings support to the v1 endpoint, see https://github.com/ollama/ollama/issues/2416. Previously you had to point the URL above at the api endpoint, which then clashed with the actual classification.

Additional info

  • The v1 endpoint does not support passing additional options to the server, such as context size and temperature. This can be a problem, since e.g. the context size defaults to 2048. The Ollama team is actively working on a fix, though.
  • Because DynamicFewShotGPTClassifier had no native support until recently, and because of the missing options, I adapted Scikit-LLM to work natively with Ollama and published it as a package that depends on Scikit-LLM. You can find it at https://github.com/AndreasKarasenko/scikit-ollama or on PyPI: https://pypi.org/project/scikit-ollama/. Sorry for the self-advertising.
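
As a concrete example for the zero-shot case, here is a minimal sketch. It assumes ZeroShotGPTClassifier accepts the same model and key arguments as the few-shot variant, which is the only one I have verified myself.

from skllm.datasets import get_classification_dataset
from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier
from skllm.config import SKLLMConfig

SKLLMConfig.set_gpt_url("http://localhost:11434/v1/")  # same Ollama endpoint as above

X, y = get_classification_dataset()
clf = ZeroShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y)  # zero-shot only uses the label set from y; no examples are put into the prompt
labels = clf.predict(X)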

CoteDave commented 1 month ago

Thank you for the great explanation!

It would be nice to add the functionality to load any open-source LLM directly, either by setting the directory where the model has been downloaded or by using a Hugging Face link, without needing any key.

AndreasKarasenko commented 1 month ago

The maintainers of Scikit-LLM plan to offer native llama-cpp support, which will include loading models (similar to the current gpt4all implementation). You can also check out the discussion on their Discord.

In Ollama's case, managing models is quite easy. For example, ollama pull llama3 pulls the default llama3 model and makes it available to the server; you don't need to specify paths, keys, or anything else. If you run ollama pull llama2 instead, you can use llama2 with clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything").
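
For completeness, a minimal sketch of swapping in a different pulled model; the tag after custom_url:: just has to match what ollama list reports:

# assumes you have already run: ollama pull llama2
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier

clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything")
clf.fit(X, y)  # X, y as in the earlier example
labels = clf.predict(X)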

OKUA1 commented 1 month ago

Hi @CoteDave,

As @AndreasKarasenko already outlined, there are multiple ways to use scikit-llm with local models: either by running an OpenAI-compatible web server or by using the gpt4all backend, which automatically handles model downloads.

However, scikit-llm is not compatible with the latest gpt4all versions, and this backend will be replaced with llama_cpp in the coming days. The overall concept will stay the same: the user provides a model name, and the model is downloaded automatically if it is not already present.

We might investigate other options in the future, but overall we would prefer to keep model management outside of scikit-llm as much as possible.