the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License

Evaluate.py: code extraction doesn't extract code marked by ```js #5

Closed Maykeye closed 1 year ago

Maykeye commented 1 year ago

I tried to run Guanaco-33b by modifying interview-llamacpp.sh and hit an issue with the evaluator.

Consider this:

In [4]: gu=open("results/Guanaco-33b/FactorialZeroShot-javascript.txt").read()

In [5]: print(gu)
 When asked to write code, return only the final function enclosed in a ``` block.

### Instruction: Write a javascript function to compute the factorial of input n.

### Assistant: Here is an example implementation of a JavaScript function that computes the factorial of a non-negative integer `n`:
```js
function fact(n) {
  if (n === 0) {
    return 1;
  } else {
    return n * fact(n - 1);
  }
}
```

This function takes a single argument n, which represents the number for which you want to calculate the factorial. The function first checks if n is equal to 0. If it is, then the function returns 1, as the factorial of 0 is defined to be 1. Otherwise, the function calculates the factorial recursively by multiplying n with the result of calling the same function on n - 1.


If you call extract_code on it, you get:
````python
In [7]: print(extract_code(gu))
block.

### Instruction: Write a javascript function to compute the factorial of input n.

### Assistant: Here is an example implementation of a JavaScript function that computes the factorial of a non-negative integer `n`:
````

There are two issues at play:

1) ``` is part of the included prompt, so extract_code will always find it. extract_code first takes the ``` in the prompt to be the start of the actual code block, then mistakes the start of the real code block for its end.

2) extract_code doesn't support ```js, only ```javascript, even though js is a valid alias for javascript in Markdown engines such as GitHub's.

So even if 1) were fixed, the current extract_code wouldn't parse the answer correctly: it would return 'js\nfunction fact(n) {\n if (n === 0) {\n return 1;\n } else {\n return n * fact(n - 1);\n }\n}' without removing the js tag.
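
Both failures can be reproduced with a minimal sketch. `naive_extract` and `KNOWN_TAGS` below are hypothetical stand-ins, not the actual code in evaluate.py; the sketch assumes the extractor returns whatever lies between the first two fences and strips language tags from a hard-coded list.

```python
FENCE = "`" * 3  # built at runtime so the example contains no literal fence
KNOWN_TAGS = ("python", "javascript")  # hypothetical hard-coded list


def naive_extract(answer: str) -> str:
    """Return the text between the first two fences, minus a known tag."""
    parts = answer.split(FENCE)
    if len(parts) < 3:
        return answer  # no complete fenced block: return the text verbatim
    body = parts[1]
    for tag in KNOWN_TAGS:
        if body.startswith(tag):
            body = body[len(tag):]
            break
    return body.strip()


code = "function fact(n) {\n  return n <= 1 ? 1 : n * fact(n - 1);\n}"
answer = f"Here is an example:\n{FENCE}js\n{code}\n{FENCE}\n"
prompt = (
    "When asked to write code, return only the final function "
    f"enclosed in a {FENCE} block.\n\n"
)

# Issue 1: the fence inside the echoed prompt is taken as the block start,
# so the "code" is really the text leading up to the real opening fence.
with_prompt = naive_extract(prompt + answer)
print(repr(with_prompt))

# Issue 2: without the prompt the right block is found, but the
# unrecognized js tag stays at the top of the extracted code.
without_prompt = naive_extract(answer)
print(repr(without_prompt))
```

Swapping the js tag for javascript in `answer` makes the second call return the bare function, which is why the problem only shows up with the short alias.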

Maybe the whole line containing the first ``` should be erased?

(On a similar note, ```py seems to be a valid language name for Python, at least as far as GitHub's Markdown engine is concerned.)

(Also, evaluate.py has no __name__ == "__main__" guard, which makes importing it from IPython impossible without extracting the function first.)
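
The usual remedy for the import problem is a main guard. A minimal sketch, assuming evaluate.py currently runs its evaluation at module level; `main` and the placeholder body of `extract_code` are hypothetical and only illustrate the layout.

```python
def extract_code(answer: str) -> str:
    """Placeholder for the real helper; returns its input unchanged."""
    return answer


def main() -> None:
    """Hypothetical entry point holding whatever used to run at import time."""
    print("running evaluation")


# Runs only when the file is executed as a script (python evaluate.py),
# not when it is imported, e.g. `from evaluate import extract_code`.
if __name__ == "__main__":
    main()
```

With this layout, importing the module from IPython only defines the functions and does not start an evaluation run.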

the-crypt-keeper commented 1 year ago

Thanks for the report! I'm with you on the idea of just dropping the whole first line. I think switching from a list of hard-coded prefixes to just ```(\w*) would be the most generic fix, so we don't have to worry about any future permutations.
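
A minimal sketch of that idea, assuming the fix amounts to matching the opening fence plus any run of word characters and discarding both. `extract_code_generic` is a hypothetical name, not the function in the repository, and requiring a newline right after the tag is an extra assumption of this sketch.

```python
import re

FENCE = "`" * 3  # built at runtime so the example contains no literal fence

# Opening fence, optional language tag of word characters, newline, then
# the body (matched lazily) up to the next fence.
BLOCK_RE = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_generic(answer: str) -> str:
    """Return the body of the first fenced block, whatever its language tag."""
    match = BLOCK_RE.search(answer)
    if match is None:
        return answer  # no fenced block: return the text verbatim
    return match.group(2).strip()


for tag in ("js", "javascript", "py", ""):
    answer = f"Here you go:\n{FENCE}{tag}\nreturn 1\n{FENCE}\nDone."
    print(repr(tag), "->", repr(extract_code_generic(answer)))
```

Because the tag is matched by pattern rather than looked up in a list, js, py, and any alias that appears later are handled the same way.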

the-crypt-keeper commented 1 year ago

Thanks again @Maykeye. I've pushed fixes that let evaluate be used as a library and make the search for the start of a code block language-agnostic. Could you give it a try? I have opened a new issue, #7, to track your first point: some of the interview executors include the prompt in their output, which is undesirable.
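
One way an executor could avoid echoing the prompt is to trim it from the front of the raw output before saving. This is only a sketch under that assumption; `strip_echoed_prompt` is a hypothetical helper and is not taken from the interview scripts.

```python
def strip_echoed_prompt(prompt: str, output: str) -> str:
    """Drop the prompt from the start of the output if the model echoed it."""
    prompt = prompt.strip()
    trimmed = output.lstrip()
    if prompt and trimmed.startswith(prompt):
        return trimmed[len(prompt):].lstrip()
    return output  # prompt not echoed: leave the output untouched


prompt = "### Instruction: Write a javascript function."
raw = " " + prompt + "\n\n### Assistant: function fact(n) {}"
print(repr(strip_echoed_prompt(prompt, raw)))
```

Leading whitespace is ignored on both sides because the saved result shown earlier in the thread starts with a space.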

Maykeye commented 1 year ago

Works fine for Guanaco-33b now. I also checked OpenAssistant_oasst-sft-1-pythia-12b; it only fails for:

def fact(n):
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)

Explanation:
The function takes an integer as an argument, n. If n is zero, it returns 1 since there is no factorial for zero. Otherwise, it recursively calls itself with n minus one as the argument until n is equal to 1, at which point the function returns the value of n multiplied by the result of the recursive call.

which is returned verbatim (here the model ignores both the instruction about ``` and the instruction not to provide an explanation). But that's a separate issue, and I'm not sure evaluate.py should go the extra mile for models that don't do what they are asked to. The original issue is solved.