Reproducing Results - Githubissues

hkust-zhiyao / RTL-Coder

A new LLM solution for RTL code generation, achieving state-of-the-art performance in non-commercial solutions and outperforming GPT-3.5.

101 stars 9 forks source link

Hi, Thanks for your interest! Regarding your first question, I guess you are referring to the functionality score of GPT4 on RTLLM1.1. During our experiments, we found that GPT4's generated results for the prompt in rtllm1.1 often contain code that is not clean, with uncommented extra contents interspersed within the code, which can lower the code's pass rate. Therefore, we manually removed irrelevant content from its generated results. As for your second question, we generated code for each score (pass@1, 5, 10) using three different temperatures {0.2, 0.5, 0.8}, and then selected the best pass rate among the three temperature configurations for the corresponding score.

Hope this can help you.

hkust-zhiyao / RTL-Coder

Reproducing Results #10