kubeedge / ianvs

Distributed Synergy AI Benchmarking
https://ianvs.readthedocs.io
Apache License 2.0

Smart Coding benchmark suite: built on KubeEdge-Ianvs #98

Open YangBrooksHan opened 1 month ago

YangBrooksHan commented 1 month ago

What would you like to be added/modified:

  1. Build a collaborative code intelligent agent alignment dataset for LLMs:
    • The dataset should include the behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as the relevant code versions and annotation information.
    • The dataset should cover code scenarios across different programming languages, business domains, and complexity levels.
    • The dataset should comply with privacy protection and intellectual property requirements while remaining easy to access and use.
  2. Design a code intelligent agent collaborative evaluation benchmark for LLMs:
    • The evaluation benchmark should cover common code intelligent agent tasks such as code generation, recommendation, and analysis.
    • Evaluation metrics should span multiple dimensions, including functionality, reliability, and interpretability, and reflect the feedback and requirements of software engineers.
    • The evaluation benchmark should assess the performance of LLMs on collaborative code intelligent agent tasks and provide a basis for further algorithm optimization (a minimal sketch of a trajectory record and a pass-rate metric follows this list).
  3. Integrate the dataset and evaluation benchmark into the KubeEdge-Ianvs framework:
    • Incorporate the dataset and evaluation benchmark into the Ianvs framework, ensuring good scalability and integrability.
    • Ensure that the dataset and evaluation benchmark run efficiently on edge devices within the Ianvs framework and collaborate seamlessly with Ianvs's other functional modules.
    • Release an upgraded version of the Ianvs framework and promote it to developers and researchers in the fields of edge computing and AI.
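To make items 1 and 2 more concrete, below is a minimal, hypothetical Python sketch of (a) a single developer-collaboration trajectory record as it might be stored in the dataset, and (b) a simple pass-rate metric for code-generation results. The `TrajectoryRecord` name, its fields, and the metric signature are illustrative assumptions, not part of the current Ianvs API; the actual schema and the way a metric is wired into a test environment would need to follow the existing Ianvs conventions.

```python
# Hypothetical sketch: names, fields, and the metric signature are assumptions
# for illustration, not part of the current Ianvs API.
import json
import subprocess
import tempfile
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TrajectoryRecord:
    """One developer-collaboration trajectory: prompt, model outputs, feedback, revisions."""
    task_id: str                                                 # unique id of the coding task
    language: str                                                # e.g. "python", "java"
    domain: str                                                  # business domain, e.g. "web", "embedded"
    prompt: str                                                  # task description given to the model
    model_outputs: List[str] = field(default_factory=list)      # successive generations
    engineer_feedback: List[str] = field(default_factory=list)  # review comments per iteration
    final_code: str = ""                                         # code version accepted by the engineer
    tests: List[str] = field(default_factory=list)              # unit-test snippets used as ground truth


def to_jsonl(records: List[TrajectoryRecord], path: str) -> None:
    """Serialize records to JSON Lines, one trajectory per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")


def pass_rate(records: List[TrajectoryRecord], timeout_s: int = 10) -> float:
    """Fraction of tasks whose final code passes all attached unit tests.

    Each task's final code and tests are written to a temporary file and run
    in a subprocess; a zero exit code counts as a pass. A real benchmark
    would sandbox execution and report per-task details.
    """
    if not records:
        return 0.0
    passed = 0
    for rec in records:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
            tmp.write(rec.final_code + "\n\n" + "\n\n".join(rec.tests))
            script_path = tmp.name
        try:
            result = subprocess.run(
                ["python", script_path], capture_output=True, timeout=timeout_s
            )
            passed += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass  # count as failure
    return passed / len(records)
```

In an Ianvs test environment, a metric like `pass_rate` would be referenced from the benchmark's YAML configuration alongside the dataset path; the exact file layout and registration mechanism should follow the existing Ianvs examples rather than this sketch.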

By implementing this project, we aim to provide crucial datasets and evaluation benchmarks for the further development of LLMs in the field of code intelligent agents, promote efficient collaboration between LLMs and software engineers in edge computing environments, and drive the innovation and application of edge intelligence technology.

Why is this needed:

Large Language Models (LLMs) have demonstrated powerful capabilities in tasks such as code generation, automatic programming, and code analysis. However, these models are typically trained on generic code data and often fail to fully leverage the collaboration and feedback of software engineers in real-world scenarios. To build a more intelligent and efficient code ecosystem, a collaborative code dataset and evaluation benchmark are needed to enable tight collaboration between LLMs and software engineers.

This project aims to build a collaborative code intelligent agent alignment dataset and evaluation benchmark for LLMs based on the open-source edge computing framework KubeEdge-Ianvs. The dataset will include the behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as the relevant code versions and annotation information. Using this data, we will design evaluation metrics and benchmarks to measure the performance of LLMs in tasks such as code generation, recommendation, and analysis, fostering collaboration between LLMs and software engineers.

Recommended Skills: proficiency in large language model fine-tuning; Python programming skills; preferably a background in software engineering (familiarity with formal verification is a plus)

Useful links: https://www.swebench.com/

https://fine-grained-hallucination.github.io/

https://cloud.189.cn/t/36JV7fvyIv2q (access code: evr9)

MooreZheng commented 1 month ago

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.