Open Aayush121202 opened 2 months ago
The problem has been changed to fixing errors in code using a seq2seq model. Reference papers:
The team is supposed to add the following details:
Sir, we have changed our topic as per your suggestion:
New Problem Statement: Software Code Bug-Fixing through Unsupervised Learning and Error Detection
Project Explanation: The project centers on the Break-It-Fix-It (BIFI) algorithm, which automates the cycle of breaking and fixing code. The algorithm has two main parts: the fixer and the critic. The fixer is a model that is first trained to repair synthetically corrupted code and then gradually improves through feedback from the critic, which is a compiler or code analyzer. The critic evaluates the fixer's output and determines whether the repaired code is error-free. Over successive rounds, this process lets the fixer learn from real-world errors and become more accurate at repairing code without needing labeled data. The method targets automated bug fixing, with practical applications in software development and education.
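To make the fixer/critic interaction concrete, here is a minimal sketch of one round, assuming the critic is Python's built-in `compile()` used as a syntax check. The names `critic`, `collect_verified_pairs`, and `toy_fixer` are hypothetical and are not from the paper; `toy_fixer` is only a stand-in for the trained seq2seq model.

```python
from typing import Callable, List, Tuple


def critic(code: str) -> bool:
    """Return True if the snippet compiles as Python source (it is not executed)."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except (SyntaxError, ValueError):
        # IndentationError is a subclass of SyntaxError, so it is covered too.
        return False


def collect_verified_pairs(
    bad_snippets: List[str], fixer: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Run the fixer on real bad code and keep only outputs the critic accepts."""
    pairs = []
    for bad in bad_snippets:
        candidate = fixer(bad)
        if critic(candidate):
            pairs.append((bad, candidate))
    return pairs


def toy_fixer(code: str) -> str:
    """Hypothetical stand-in for the model: appends missing ')' characters only."""
    missing = code.count("(") - code.count(")")
    return code + ")" * max(missing, 0)


if __name__ == "__main__":
    real_bad_code = ["print((1 + 2)", "x = = 1"]
    for bad, fixed in collect_verified_pairs(real_bad_code, toy_fixer):
        print(repr(bad), "->", repr(fixed))
```

In this sketch, the pairs the critic accepts would serve as additional training data for the next round of the fixer, which is how the loop avoids hand-labeled examples.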
Evaluation Strategy: We will evaluate the model on repair accuracy, calculated as the proportion of bad code snippets for which the fixer's output compiles without errors. The model's performance will also be assessed across specific error types (e.g., syntax errors, indentation errors) using an F1 score for each class.
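The two metrics could be computed roughly as follows. This is a sketch under assumptions: "compiles without errors" is approximated with Python's `compile()`, and the per-class F1 assumes each snippet has one true and one predicted error-type label. The function names and label strings are illustrative only.

```python
from collections import Counter
from typing import Dict, List


def compiles(code: str) -> bool:
    """Syntax check only: the code is compiled but never executed."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def repair_accuracy(fixed_outputs: List[str]) -> float:
    """Fraction of fixer outputs (one per bad input) that compile."""
    if not fixed_outputs:
        return 0.0
    return sum(compiles(code) for code in fixed_outputs) / len(fixed_outputs)


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 per label, using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    outputs = ["x = 1", "print('hi'", "y = (2)", "def f(:"]
    print("repair accuracy:", repair_accuracy(outputs))
    truth = ["syntax", "syntax", "indentation", "indentation"]
    pred = ["syntax", "indentation", "indentation", "indentation"]
    print("per-class F1:", per_class_f1(truth, pred))
```

For the DeepFix C data, the `compiles` check would have to be replaced by a call to a C compiler; that part is not shown here.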
Dataset: The project will use the two main datasets referenced in the paper. GitHub-Python is a dataset of 3 million Python code snippets, with 38,000 examples of bad code. DeepFix is a dataset of C code submitted by students, consisting of 7,000 bad examples and 37,000 good examples. The good and bad code is available here: https://worksheets.codalab.org/bundles/0x5eb0135755464c66bf3c398f43f634e0 The dataset is too large to use in full, so we will extract a limited subset of code snippets from it to train the model.
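One possible way to draw a limited subset without loading the whole dataset into memory is reservoir sampling. The sketch below assumes the snippets can be read as a stream of strings; the on-disk format of the bundle is not covered here, and `sample_snippets` is a hypothetical helper name.

```python
import random
from typing import Iterable, List, Optional


def sample_snippets(
    snippets: Iterable[str], k: int, seed: Optional[int] = 0
) -> List[str]:
    """Uniformly sample up to k snippets from a stream (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    reservoir: List[str] = []
    for i, snippet in enumerate(snippets):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(snippet)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = snippet
    return reservoir


if __name__ == "__main__":
    stream = (f"snippet_{i}" for i in range(100_000))
    subset = sample_snippets(stream, k=5, seed=42)
    print(len(subset), "snippets sampled")
```

Sampling the good and bad sets separately, each with its own `k`, would keep control over the ratio between them.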
Resources: Research Paper: Break-It-Fix-It: Unsupervised Learning for Program Repair, https://arxiv.org/pdf/2106.06600
We look forward to your feedback.
Looks good. Marking this as approved.
Title
Generative AI based Software Metadata Classification
Team Name
Data Pirates
Email
202101452@daiict.ac.in
Team Member 1 Name
Aayush Patel
Team Member 1 Id
202101452
Team Member 2 Name
Pranav Patel
Team Member 2 Id
202103040
Team Member 3 Name
Vatsal Shah
Team Member 3 Id
202103022
Team Member 4 Name
Kalp Shah
Team Member 4 Id
202103003
Category
Evaluation Track Problem
Problem Statement
A binary code comment quality classification model needs to be augmented with generated code and comment pairs that can improve the accuracy of the model.
Evaluation Strategy
Evaluated based on the % increase in F1 score from baseline and the quality of data generated.
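As a small illustration of the first criterion, the sketch below computes the increase in F1 relative to the baseline. It assumes "% increase" means relative change rather than a difference in percentage points, which the statement above does not specify; the function name and the example scores are made up.

```python
def f1_percent_increase(baseline_f1: float, augmented_f1: float) -> float:
    """Relative change in F1, in percent, of the augmented model over the baseline."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive to compute a relative change")
    return (augmented_f1 - baseline_f1) / baseline_f1 * 100.0


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(f"{f1_percent_increase(0.80, 0.84):.2f}% increase")
```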
Dataset
Seed Dataset: A dataset of code and comment pairs: https://drive.google.com/file/d/17caOWv0F_0W7q9IHnMP_uGEuzSs1am0h/view
Resources
Paper Title: Software Metadata Classification based on Generative Artificial Intelligence
Paper Link: https://arxiv.org/pdf/2310.13006