tjake / Jlama

Jlama is a modern LLM inference engine for Java
Apache License 2.0

Feature request: support for the smallest reasonable codegen model #17

Closed — jbellis closed this issue 7 months ago

jbellis commented 7 months ago

I want to build a local Copilot with Jlama, but generalist models are too big and slow.

Three candidates I found:

replit-code-v1_5-3b:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT

codegen-2B-multi:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN

WizardCoder-1B-V1.0 (using the safetensors branch):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE
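For context, all three stack traces are the standard `java.lang.Enum.valueOf` failure: the checkpoint declares an architecture (`MPT`, `CODEGEN`, `GPT_BIGCODE`) that has no matching constant in Jlama's `ModelSupport.ModelType` enum. A minimal sketch of that mechanism (hypothetical names; this is not Jlama's actual source, and the constants listed are assumptions):

```java
// Sketch of the failure mode behind the errors above. ModelType here is a
// hypothetical stand-in for com.github.tjake.jlama.model.ModelSupport.ModelType.
public class EnumLookupDemo {
    enum ModelType { LLAMA, GEMMA, MISTRAL } // assumed supported set

    // Enum.valueOf throws IllegalArgumentException("No enum constant ...")
    // when the name has no matching constant.
    static boolean isSupported(String architecture) {
        try {
            ModelType.valueOf(architecture);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // e.g. "No enum constant ModelType.MPT"
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("LLAMA")); // prints true
        System.out.println(isSupported("MPT"));   // prints false
    }
}
```

So supporting a new model family means adding both the enum constant and the corresponding model implementation, not just the constant.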
tjake commented 7 months ago

Did you find safetensor versions of the first two?

tjake commented 7 months ago

This one looks easy to add https://huggingface.co/stabilityai/stable-code-3b

Would that be ok for you?

jbellis commented 7 months ago

Sure!


tjake commented 7 months ago

OK, I added the missing bits for https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base

Example run:

#download model
./run-cli.sh download -b "refs/pr/3" deepseek-ai/deepseek-coder-1.3b-base

#run with full weights
./run-cli.sh complete -t 0.3 -wq F32 -p "#write a quicksort algorithm in python" models/deepseek-coder-1.3b-base/
1 compiler directives added
WARNING: Using incubator modules: jdk.incubator.vector
22:51:19.859 [main] INFO  c.g.tjake.jlama.model.AbstractModel - Working memory type = F32, Quantized memory type = F32
#write a quicksort algorithm in python

def quick_sort(array):
    if len(array) < 2:
        return array
    else:
        pivot = array[0]
        less = [i for i in array[1:] if i <= pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quick_sort(less) + [pivot] + quick_sort(greater)

print(quick_sort([10, 5, 2, 3]))

elapsed: 19s, 165.198273ms per token

#Run with quantized weights
./run-cli.sh complete -t 0.3 -q Q4 -wq I8 -p "#write a quicksort algorithm in python" models/deepseek-coder-1.3b-base/
1 compiler directives added
WARNING: Using incubator modules: jdk.incubator.vector
22:52:37.493 [main] INFO  c.g.tjake.jlama.model.AbstractModel - Working memory type = F32, Quantized memory type = I8
22:52:37.820 [main] INFO  c.g.t.jlama.model.llama.LlamaModel - Quantizing model with Q4 - Please hold...
#write a quicksort algorithm in python

def quicksort(arr):
    if len(arr)<=1:
        return arr
    else:
        pivot=arr.pop()
        greater=[]
        less=[]
        for i in arr:
            if i>pivot:
                greater.append(i)
            else:
                less.append(i)
        return quicksort(less)+[pivot]+quicksort(greater)

print(quicksort([1,4,2,6,9,5]))

elapsed: 5s, 43.271999ms per token