symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/
MIT License

Deal with dependencies requested by LLMs #174

Open ahumenberger opened 2 weeks ago

ahumenberger commented 2 weeks ago

Assume an LLM responds with something like this:

    package com.eval;

    import org.junit.jupiter.api.Test;
    import org.mockito.InMockito;

    import static org.junit.jupiter.api.Assertions.*;

    class PlainTest {

        @Test
        void testPlain() {
            InMockito.when(Plain::plain).thenReturn(null);
            Plain.plain();
            InMockito.verify(Plain::plain);
        }
    }

The test file needs Mockito to run, so we need to make sure that Mockito is available on the classpath when the tests are executed.

Response from custom-fireworks_accounts_fireworks_models_qwen2-72b-instruct
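
One way to find out which libraries a generated test asks for is to scan its import statements before executing it. The following is only a minimal sketch, not part of DevQualityEval: the class name `ImportScanner` and the method `nonJdkImports` are hypothetical, and it assumes that every import outside `java.*` and `javax.*` points to a third-party dependency.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ImportScanner {
    // Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.*;"
    // at the start of a line and captures the name without a trailing ".*".
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+(?:static\\s+)?([\\w.]+?)(?:\\.\\*)?\\s*;",
        Pattern.MULTILINE);

    // Returns all imported names that are not part of the JDK
    // (assumption: JDK means "java." or "javax."), in source order.
    static Set<String> nonJdkImports(String source) {
        Set<String> result = new LinkedHashSet<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!name.startsWith("java.") && !name.startsWith("javax.")) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For the response above, `ImportScanner.nonJdkImports(...)` would report the two JUnit imports and `org.mockito.InMockito`, so the evaluation could tell that the test needs something from `org.mockito` before it tries to compile it.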

bauersimon commented 2 weeks ago

What's also bad is that for Go we just run go mod tidy to pull in all dependencies, so LLMs can use whatever they want, but for Java everything except the Java standard library is forbidden.
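
A middle ground between "anything goes" and "nothing is allowed" could be an explicit allowlist of test libraries for Java. The sketch below is an assumption about how that might look, not the project's actual design: the class `DependencyAllowlist`, its method `coordinateFor`, and the choice of libraries are hypothetical. It maps package prefixes to Maven-style `groupId:artifactId` coordinates and deliberately omits versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical allowlist, for illustration only.
final class DependencyAllowlist {
    // Package prefix -> Maven "groupId:artifactId" (no version pinned here).
    private static final Map<String, String> ALLOWED = new LinkedHashMap<>();
    static {
        ALLOWED.put("org.junit.jupiter.", "org.junit.jupiter:junit-jupiter");
        ALLOWED.put("org.mockito.", "org.mockito:mockito-core");
        ALLOWED.put("org.hamcrest.", "org.hamcrest:hamcrest");
    }

    // Returns the coordinate that provides the imported name,
    // or an empty Optional if the import is not on the allowlist.
    static Optional<String> coordinateFor(String importedName) {
        for (Map.Entry<String, String> entry : ALLOWED.entrySet()) {
            if (importedName.startsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

With such a table, an import like `org.mockito.Mockito` resolves to `org.mockito:mockito-core`, while an import from an unlisted package yields an empty result that the evaluation could treat as a forbidden dependency. The trade-off is that the list has to be maintained by hand, unlike the automatic resolution that go mod tidy provides for Go.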

ahumenberger commented 2 weeks ago

llama-3-70b-instruct responded with tests that use Hamcrest:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.*;
    import static org.hamcrest.CoreMatchers.is;
    import static org.hamcrest.CoreMatchers.equalTo;
    import static org.hamcrest.MatcherAssert.assertThat;