Description: Clean and preprocess the collected issue data to prepare it for training and testing. This includes extracting relevant information, performing text preprocessing, and organizing the data in a structured format for easy access during model training and evaluation.
Extract relevant fields from each issue (e.g., issue ID, title, description, assignee).
Keep only the issues that have exactly one assignee.
Remove issues assigned to developers with very few assignments (below a chosen minimum count) to simplify the model's learning process.
Perform text preprocessing on issue titles and descriptions, including:
Stopword removal
Punctuation removal
Identifier splitting (e.g., camelCase or snake_case names into separate words)
Create a folder structure for storing the preprocessed training and test datasets.
Save the cleaned and preprocessed data in separate files for the training and test sets.
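
The steps above could be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not a prescribed implementation. Several details are assumptions that the task does not specify: the input field names ("id", "title", "description", "assignees"), the MIN_ASSIGNMENTS threshold, the tiny STOPWORDS set, the train/ and test/ folder names, the issues.json file name, and the fact that the train/test split is decided elsewhere and passed in.

```python
import json
import re
import string
from collections import Counter
from pathlib import Path

# Assumption: a tiny illustrative stopword set; a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "in", "on", "to", "of", "and", "when", "for"}
# Assumption: hypothetical threshold; the task only says "minimal number".
MIN_ASSIGNMENTS = 2

# Replace punctuation with spaces, but keep "_" so snake_case can be split later.
PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation if c != "_"})


def split_identifier(token):
    """Split snake_case and camelCase identifiers into lowercase ASCII words."""
    words = []
    for part in token.split("_"):
        # Acronym runs (HTTP), capitalised or lowercase words, and digit runs.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def preprocess(text):
    """Remove punctuation, split identifiers, then drop stopwords."""
    tokens = text.translate(PUNCT_TABLE).split()
    words = [w for tok in tokens for w in split_identifier(tok)]
    return [w for w in words if w not in STOPWORDS]


def extract_fields(raw):
    """Keep only the relevant fields; the input key names are assumed."""
    return {
        "id": raw["id"],
        "title": raw.get("title") or "",
        "description": raw.get("description") or "",
        "assignees": raw.get("assignees") or [],
    }


def clean_issues(raw_issues, min_assignments=MIN_ASSIGNMENTS):
    """Apply the single-assignee filter, the rare-developer filter, and text preprocessing."""
    issues = [extract_fields(r) for r in raw_issues]
    single = [i for i in issues if len(i["assignees"]) == 1]
    counts = Counter(i["assignees"][0] for i in single)
    cleaned = []
    for issue in single:
        assignee = issue["assignees"][0]
        if counts[assignee] < min_assignments:
            continue  # developer has too few assignments
        cleaned.append({
            "id": issue["id"],
            "assignee": assignee,
            "title": preprocess(issue["title"]),
            "description": preprocess(issue["description"]),
        })
    return cleaned


def save_split(train, test, root):
    """Write each split to <root>/<split>/issues.json (assumed layout)."""
    root = Path(root)
    paths = {}
    for name, rows in (("train", train), ("test", test)):
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        paths[name] = folder / "issues.json"
        paths[name].write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return paths
```

In this sketch, punctuation is stripped before identifier splitting but underscores are kept until the split, so snake_case names are not lost. A call such as save_split(cleaned[:n], cleaned[n:], "data/preprocessed") would write the two files, where n and the root path are placeholders for whatever split strategy and location the project chooses.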