the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License

Add Minotaur on the board #21

Closed: pabl-o-ce closed this issue 1 year ago

pabl-o-ce commented 1 year ago

Hi guys love the work.

I have been testing TheBloke/minotaur-15B-GGML and it's pretty solid. You could also test TheBloke/minotaur-15B-GPTQ.

the-crypt-keeper commented 1 year ago

The Gradio app for this one at https://huggingface.co/spaces/openaccess-ai-collective/rlhf-arena is GGML based and super slow, takes 30+ seconds to get a completion result.

Looks like it's USER/ASSISTANT style with a sorta weird system prompt: Below is a dialogue between a USER and an ASSISTANT. The USER may ask questions, request information, or provide instructions for a task, often supplementing with additional context. The ASSISTANT responds accurately and effectively, offering insights, answering questions, or executing tasks to the best of its ability based on the given information.
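
In other words, a prompt for this model presumably gets assembled along these lines (my own sketch; the question wording and exact newline placement are placeholders, not taken from the interview files):

```javascript
// Rough sketch of a USER/ASSISTANT style prompt for Minotaur-15B.
// The system prompt text is the one quoted above; the question string
// and the whitespace between turns are guesses, not confirmed.
const SYSTEM_PROMPT =
  "Below is a dialogue between a USER and an ASSISTANT. The USER may ask questions, " +
  "request information, or provide instructions for a task, often supplementing with " +
  "additional context. The ASSISTANT responds accurately and effectively, offering " +
  "insights, answering questions, or executing tasks to the best of its ability based " +
  "on the given information.";

function buildPrompt(question) {
  return `${SYSTEM_PROMPT}\nUSER: ${question}\nASSISTANT:`;
}

console.log(buildPrompt("Write a JavaScript function substrCount(str, substr) that counts occurrences of substr in str."));
```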

Going to give TheBloke's GPTQ and GGML quants a go as this model does look very interesting.

the-crypt-keeper commented 1 year ago

Looks like this model comes in 3 rather confusing flavors:

- Minotaur-15B is based on StarChat
- Minotaur-13B is based on LLaMa (but had a problem with its dataset)
- Minotaur-13B-fixed is also based on LLaMa

I have previously evaluated Minotaur-13B but that doesn't seem to even be valid anymore since that model has been superseded by a fix.

the-crypt-keeper commented 1 year ago

The GPTQ version seems to be missing a file required to load the tokenizer - opened a PR https://huggingface.co/TheBloke/minotaur-15B-GPTQ/discussions/1

the-crypt-keeper commented 1 year ago

The missing file has been added and the model now works with the safer `use_fast=False` tokenizer.

Some interesting answers out of this guy; it seems to have been excessively instruction-tuned:

Here's one way you can write this JavaScript function `substrCount`:

1. First, create a variable called `result` which will store the count of occurrences of the substring `substr`. Initialize it as zero (0).

2. Next, use the built-in method `indexOf()` to check if the string `str` contains the substring `substr`. If so, increment the value stored in `result` by 1.

3. Finally, return the final value of `result`, indicating how many times the substring occurred within the input string.

This will score a 0 on our test, since it's not code.
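
For contrast, an answer that would actually score needs to be real code. The steps the model described boil down to something like the sketch below (my own version, counting non-overlapping occurrences with `indexOf`; the repo's actual test cases may expect something slightly different):

```javascript
// A minimal sketch of the function the model only described in prose:
// count non-overlapping occurrences of substr inside str using indexOf.
function substrCount(str, substr) {
  if (substr.length === 0) return 0; // guard against an infinite loop on empty input
  let result = 0;
  let index = str.indexOf(substr);
  while (index !== -1) {
    result += 1;
    index = str.indexOf(substr, index + substr.length);
  }
  return result;
}

console.log(substrCount("banana", "an")); // 2
console.log(substrCount("aaaa", "aa"));   // 2 (non-overlapping)
```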

the-crypt-keeper commented 1 year ago

15b eval has been added but I'm going to leave this issue open for now as I'd like to also try minotaur-13b-fixed.

pabl-o-ce commented 1 year ago

I have been testing TheBloke/minotaur-15B-GGML using koboldcpp with the prompt you have in prompts/Minotaur.txt, and it gives very solid answers to the questions using that prompt template and the config params you have. I don't know if that helps with your testing procedure.

the-crypt-keeper commented 1 year ago

That's very interesting because that prompt is meant for a totally different model, Minotaur-13B, which is based on Llama not StarCoder 🧐

I'll give it a run though and see what happens..

pabl-o-ce commented 1 year ago

LLMs are interesting indeed!

Should I create new issues for Orca and the new Wizard uncensored created by faldore?

Also, I really like testing the code questions... I've noticed that whenever I get a bad answer, regenerating sometimes gives me a correct one. Just a fact I wanted to share and keep in mind.

the-crypt-keeper commented 1 year ago

I did some work with the Orca Mini family this weekend; it's sitting on a branch and I need to rerun the evals to be JS-only, but here's a fun little experiment: https://huggingface.co/spaces/mike-ravkine/orca-mini-coder-analysis

Falcoder is on my wishlist, but nobody has made quants and I can't easily run it at full precision.

pabl-o-ce commented 1 year ago

> Falcoder is on my wishlist, but nobody has made quants and I can't easily run it at full precision.

Let me send the repo mrm8488/falcoder-7b to TheBloke to see if he can make it...

UPDATE: he is going to do it.

the-crypt-keeper commented 1 year ago

I am impatient, so I've done it myself - q5_0 uploading now: https://huggingface.co/mike-ravkine/falcoder-7b-GGML

I've tested with this GGML fork and it seems to work: https://github.com/jploski/ggml/tree/falcon40b

$ ./bin/falcon -m ~/ai/falcoder-7b-GGML/ggml-model-falcoder-7b-sharded-bf16-q5_0.bin --top_k 1 -p 'function fib('
main: seed = 1687870044
falcon_model_load: loading model from '/home/miner/ai/falcoder-7b-GGML/ggml-model-falcoder-7b-sharded-bf16-q5_0.bin' - please wait ...
falcon_model_load: n_vocab   = 65024
falcon_model_load: n_embd    = 4544
falcon_model_load: n_head    = 71
falcon_model_load: n_head_kv = 1
falcon_model_load: n_layer   = 32
falcon_model_load: ftype     = 2008
falcon_model_load: qntvr     = 2
falcon_model_load: ggml ctx size = 4764.96 MB
falcon_model_load: memory_size =    32.00 MB, n_mem = 65536
falcon_model_load: ........................ done
falcon_model_load: model size =  4732.91 MB / num tensors = 196
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: number of tokens in prompt = 3
main: token[0] =   5529, function
main: token[1] =  10045,  fib
main: token[2] =     19, (

function fib(n) {
  if (n === 0) {
    return 0
  }
  if (n === 1) {
    return 1
  }
  return fib(n-1) + fib(n-2)
}

console.log(fib(5)) // 8
console.log(fib(6)) // 13
console.log(fib(7)) // 21
console.log(fib(8)) // 28
console.log(fib(9)) // 34
console.log(fib(10)) // 55
console.log(fib(11)) // 78
console.log(fib(12)) // 121
console.log(fib(13)) // 169
console.log(fib(14)) // 233

main: mem per token =   424356 bytes
main:     load time =  1802.23 ms
main:   sample time =    97.24 ms
main:  predict time = 47944.04 ms / 237.35 ms per token
main:    total time = 50607.29 ms
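
Side note on the output above: the recursion itself is fine, but the expected values the model wrote into its own `console.log` comments are wrong (fib(5) is 5, not 8, and so on). A quick check, added here for reference only:

```javascript
// Sanity check of the generated function: the code is correct, but the
// expected values in the generated console.log comments above are not.
function fib(n) {
  if (n === 0) return 0;
  if (n === 1) return 1;
  return fib(n - 1) + fib(n - 2);
}

for (let n = 5; n <= 14; n++) {
  console.log(`fib(${n}) = ${fib(n)}`); // 5, 8, 13, 21, 34, 55, 89, 144, 233, 377
}
```
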
the-crypt-keeper commented 1 year ago

A few initial thoughts:

main: number of tokens in prompt = 4
main: token[0] =    602, int
main: token[1] =   4278, []
main: token[2] =  10045,  fib
main: token[3] =     19, (

int[] fib(int n) {
        int a = 0, b = 1, c = 0;
        for (int i = 1; i < n; i++) {
                c = a + b;
                a = b;
                b = c;
        }
        return [a, b];
}

int main() {
        int n = 5;
        int[] fib = fib(n);
        for (int i = 0; i < fib.length; i++) {
                printf("%d ", fib[i]);
        }
        printf("\n");
        return 0;
}
### Solution:
int main() {
        int n = 5;
        int[] fib = fib(n);
        for (int i =

the-crypt-keeper commented 1 year ago

OK, so following up on this: Minotaur-15B has been added (as per this original ticket).

Opened up #24 to track falcoder; it's got some issues at first glance.

You might also be interested in #23, which has some unique challenges of its own.