langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Ollama Num GPU option not consistent with allowed values #4742

Closed: rothnic closed this issue 3 months ago

rothnic commented 3 months ago

Self Checks

Dify version

0.6.8

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Setup:

Issue:

✔️ Expected Behavior

  1. I'd expect the option to support the full range of values that Ollama allows. It currently only accepts values between 0 and 1. Ollama's code suggests the value should be an integer giving the number of layers to offload to the GPU, i.e. any integer up to the number of layers in the model (which might not be known ahead of time, so it should not be capped at any fixed maximum). See the sketch after this list.
  2. The user interface should only accept integers. Dify appears to declare this option as an integer, but the input component does not restrict entry to integer values.
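
For reference, here is a minimal sketch of how this parameter reaches Ollama over its REST API (the model name and layer count are illustrative): num_gpu is an integer count of layers to offload, not a 0-1 fraction.

```python
import requests

# Pass num_gpu through the options dict of Ollama's /api/generate endpoint.
# Model name and layer count here are illustrative.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,
        # Integer number of layers to offload to the GPU; the only real
        # upper bound is the model's own layer count.
        "options": {"num_gpu": 33},
    },
)
print(resp.json()["response"])
```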

❌ Actual Behavior

crazywoola commented 3 months ago

Will take a look at this tmr.

rothnic commented 3 months ago

> Will take a look at this tmr.

Thanks! IMO, it might be worth mentioning any default behavior in the tooltips. In this case, you might not know that leaving the value unset (I assume on a non-Mac) offloads all layers to the GPU. I just imagine it would be confusing for most people to see a toggle option called "Use GPU" and not want to enable it, because it sounds like it toggles GPU use on.

It is a poorly named option on the Ollama side of things, so another thought would be to change the option label in the Dify UI to "GPU Layers" and maybe have the tooltip reference the official parameter name. A sketch of that framing is below.
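
To make the semantics concrete, here is a hypothetical helper (the name and signature are mine, not Dify's) showing how a "GPU Layers" setting could map onto Ollama's num_gpu, including the unset-means-default behavior described above:

```python
from typing import Optional


def ollama_options(gpu_layers: Optional[int] = None) -> dict:
    """Map a hypothetical "GPU Layers" setting onto Ollama's num_gpu.

    num_gpu is a layer count, not an on/off switch. Leaving it unset
    lets Ollama apply its own default (typically offloading as many
    layers as fit on the GPU), which is easy to miss behind a label
    like "Use GPU".
    """
    if gpu_layers is None:
        # Unset: defer to Ollama's default behavior.
        return {}
    if gpu_layers < 0:
        raise ValueError("GPU Layers must be a non-negative integer")
    # 0 disables offloading entirely; any positive integer offloads
    # that many layers.
    return {"num_gpu": gpu_layers}
```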