Description: Clean and preprocess the collected issue data so that it can be used for model training and testing. This involves extracting the relevant fields, preprocessing the text, and storing the result in a structured format that is easy to load during training and evaluation.
Tasks:
Extract relevant fields from each issue (e.g., issue ID, title, description, assignee).
Keep only the issues that have exactly one assignee.
Remove issues assigned to developers who have very few assignments (fewer than a chosen minimum), to simplify the model's learning process.
Perform text preprocessing on issue titles and descriptions, including:
Stop word removal
Stemming
Punctuation removal
Identifier splitting
Create a folder structure for storing the preprocessed training and test datasets.
Save the cleaned and preprocessed data in separate files for the training and test sets.
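
The tasks above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the field names (`id`, `title`, `description`, `assignees`), the `MIN_ASSIGNMENTS` threshold, the short stop word list, the `train/` and `test/` folder names, and the ordered split by `test_fraction` are all hypothetical. A crude suffix stripper stands in for a real stemmer (for example a Porter stemmer) so the sketch needs only the standard library.

```python
import json
import re
import string
from collections import Counter
from pathlib import Path

# Illustrative stop word list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "on", "of", "to", "when", "it", "for"}
MIN_ASSIGNMENTS = 2  # hypothetical threshold for "too few assignments"


def split_identifiers(text):
    """Split snake_case and camelCase identifiers into separate words."""
    text = text.replace("_", " ")
    # Insert a space at lower-to-upper boundaries: "configFile" -> "config File"
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Identifier splitting, punctuation removal, stop word removal, stemming."""
    text = split_identifiers(text)  # must run before lowercasing
    punctuation_to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.translate(punctuation_to_space).lower().split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def select_issues(issues):
    """Keep single-assignee issues whose assignee has enough assignments."""
    single = [i for i in issues if len(i["assignees"]) == 1]
    counts = Counter(i["assignees"][0] for i in single)
    return [i for i in single if counts[i["assignees"][0]] >= MIN_ASSIGNMENTS]


def to_record(issue):
    """Extract the relevant fields and preprocess the text fields."""
    return {
        "id": issue["id"],
        "assignee": issue["assignees"][0],
        "title": preprocess(issue["title"]),
        "description": preprocess(issue["description"]),
    }


def save_split(records, root, test_fraction=0.2):
    """Write records to <root>/train/issues.json and <root>/test/issues.json."""
    cut = len(records) - int(len(records) * test_fraction)
    for name, part in (("train", records[:cut]), ("test", records[cut:])):
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "issues.json").write_text(json.dumps(part, indent=2), encoding="utf-8")
```

A typical call would be `save_split([to_record(i) for i in select_issues(issues)], "data")`, where `issues` is the list of collected issue dictionaries. Identifier splitting runs first because it relies on the original casing and underscores, which lowercasing and punctuation removal would destroy.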