fbricon opened 4 weeks ago
As soon as we build out the performance tests and get baselines, we will fine-tune with these configurations and the settings the Granite team recommended for code tasks:

```jsonc
"completionOptions": {
  "temperature": 0.2,     // or 0.3; lower is more precise/deterministic
  "topP": 0.9,            // or 1
  "topK": 40,
  "presencePenalty": 0.0,
  "frequencyPenalty": 0.1,
  "stop": null,
  "maxTokens": 2048       // start small, then test and expand
}
```

For example, start `maxTokens` (the maximum output length) at 2K or 3K. That leaves plenty of room for inputs (120K+), but to minimize hallucination we need to regulate both input and output size and work to find a balance. It all depends on the capability of the model.
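For context, here is a minimal sketch of how these defaults could land in a Continue `config.json` model entry. The model title, provider, and model ID below are illustrative assumptions, not a recommendation:

```json
{
  "models": [
    {
      "title": "Granite Code 8B",
      "provider": "ollama",
      "model": "granite-code:8b",
      "completionOptions": {
        "temperature": 0.2,
        "topP": 0.9,
        "topK": 40,
        "presencePenalty": 0.0,
        "frequencyPenalty": 0.1,
        "maxTokens": 2048
      }
    }
  ]
}
```

`stop` is omitted here since `null` is the default; per-model-size defaults would vary mainly in `maxTokens` and `temperature`.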
Continue supports per-model fine-tuned configuration, so we should be able to provide proper defaults for each model size. @jamescho72 can you help here?