Open dan-homebrew opened 2 months ago
@nguyenhoangthuan99 however there are some clarifications needed from Inference team. What is the stop token format?
<>
for the user?<|special_token|>
<|end_of_text|>
<|eom_id|>
Stop words of a model can be a list so I think we can make it like this
From technical aspect, I think there are 2 cases we can follow:
<model_id>.yaml
file, so Jan app just read from it to place it as default.<model_id>.yaml
, Jan app can read from <model_id>.yaml
to set as defaultFormat
Most of model introduce stop tokens with this format <{content}>
. The content
is different for each model arch. So I think we can predefine a list option of map model's arch : list stop words
like this for user to choses:
Maybe 5 or 6 popular models arch is enough and another option to let users input whatever they want (this feature may be only for power user or dev because normal user might only use default configuration)
Problem
<endofstring>
,<new_sentence>
,<this_is_the_end>
.Solution
From technical aspect, there are 2 cases:
Format
Most of model introduce stop tokens with this format <{content}> . The content is different for each model arch. So I think we can predefine a list option of map model's arch : list stop words like this for user to choses:
Maybe 5 or 6 popular models arch is enough and another option to let users input whatever they want (this feature may be only for power user or dev because normal user might only use default configuration)
Design
Figma: https://www.figma.com/design/DYfpMhf8qiSReKvYooBgDV/Jan-App-(3rd-version)?node-id=8281-97234&t=qb7yU8r2PAayVdNW-4
Default Stop Words:
Users should understand these are recommended for the model
Predefined Options:
Offer a dropdown preset or quick-select for common stop words based on model architecture
Custom/User-Added Stop Words:
Task