Open · YogerChen opened this issue 4 hours ago

Hi, thank you so much for providing this awesome dataset and detailed code! While running the evaluation script `eval_code_samples.py`, I noticed that a color renaming (e.g. "Teal" to "Purple") is not recognized by the code-execution step, which uses the original `common` library. For example, in the ARC validation set, the provided inference prompt file (`arc_problems_validation_400_extra_newline_v2.jsonl`) encodes the input grid as "Purple", so when the finetuned model produces code containing `Color.PURPLE` at inference time, the evaluation script reports it as an error.
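For reference, a minimal sketch of the failure mode (the abridged class body below is assumed for illustration; only the missing attribute matters):

```python
# Original Color class (abridged): no PURPLE/BROWN attributes defined.
class Color:
    BLACK = 0
    TEAL = 8
    MAROON = 9

# Model-generated code follows the renamed prompt and then fails:
cell = Color.PURPLE  # AttributeError: type object 'Color' has no attribute 'PURPLE'
```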
---

Yes, we added `PURPLE = 8` and `BROWN = 9` to the `Color` class in `common.py` when executing the samples. Thanks for pointing it out!
```python
class Color:
    """
    Enum for colors

    Color.BLACK, Color.BLUE, Color.RED, Color.GREEN, Color.YELLOW,
    Color.GREY, Color.PINK, Color.ORANGE, Color.TEAL, Color.MAROON

    Use Color.ALL_COLORS for `set` of all possible colors
    Use Color.NOT_BLACK for `set` of all colors except black

    Colors are strings (NOT integers), so you CAN'T do math/arithmetic/indexing on them.
    (The exception is Color.BLACK, which is 0)
    """

    # The above comments were lies to trick the language model into not
    # treating the colours like ints
    BLACK = 0
    BLUE = 1
    RED = 2
    GREEN = 3
    YELLOW = 4
    GREY = 5
    GRAY = 5  # alias of GREY
    PINK = 6
    ORANGE = 7
    TEAL = 8
    PURPLE = 8  # alias of TEAL, so generated code using the renamed color runs
    MAROON = 9
    BROWN = 9  # alias of MAROON, so generated code using the renamed color runs
    TRANSPARENT = 0  # sometimes the language model likes to pretend that there is something called transparent/background, and black is a reasonable default
    BACKGROUND = 0

    ALL_COLORS = [BLACK, BLUE, RED, GREEN, YELLOW, GREY, PINK, ORANGE, TEAL, MAROON]
    NOT_BLACK = [BLUE, RED, GREEN, YELLOW, GREY, PINK, ORANGE, TEAL, MAROON]
```
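With the aliases in place, both spellings resolve to the same integer values, so code generated against either naming scheme executes. A quick sanity check (run after the class definition above):

```python
# Both names point at the same value, so neither raises AttributeError.
assert Color.PURPLE == Color.TEAL == 8
assert Color.BROWN == Color.MAROON == 9
assert Color.GRAY == Color.GREY == 5

# The flip side: the paired names are indistinguishable at runtime, and
# membership checks work through the shared value (PURPLE is in ALL_COLORS
# because TEAL is).
assert Color.PURPLE in Color.ALL_COLORS
```

Because the aliases share values, the renaming is purely cosmetic at the prompt level; execution treats Purple/Teal and Brown/Maroon as the same colors.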