Filter, Clean, Preprocess, and Organize Collected Data

Description: Clean and preprocess the collected issue data to prepare it for training and testing. This includes extracting relevant information, performing text preprocessing, and organizing the data in a structured format for easy access during model training and evaluation.
- Tasks:
  - Extract relevant fields from each issue (e.g., issue ID, title, description, assignee).
  - Keep only the issues that have exactly one assignee.
  - Remove issues assigned to developers who have too few assignments (below a minimum threshold) to simplify the model's learning process.
  - Perform text preprocessing on issue titles and descriptions, including:
    - Stopword removal
    - Stemming
    - Punctuation removal
    - Identifier splitting
  - Create a folder structure for storing the preprocessed training and test datasets.
  - Save the cleaned and preprocessed data in separate files for the training and test sets.
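
The tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation, and it rests on several assumptions: each raw issue is a dict with `id`, `title`, `description`, and `assignees` keys; the stopword list is a tiny placeholder; `naive_stem` is a crude stand-in for a real stemmer such as Porter; the `MIN_ASSIGNMENTS` threshold, the order-based 80/20 split, and the `train/issues.json` and `test/issues.json` layout are all hypothetical choices.

```python
import json
import re
import string
from collections import Counter
from pathlib import Path

# Assumption: a tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "when", "it"}
# Assumption: hypothetical minimum number of issues a developer must have.
MIN_ASSIGNMENTS = 2


def split_identifiers(text):
    """Split snake_case and camelCase identifiers into separate words."""
    text = text.replace("_", " ")
    # "parseJson" -> "parse Json"
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # "HTTPServer" -> "HTTP Server"
    return re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", text)


def naive_stem(token):
    """Crude suffix stripping; a placeholder for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Identifier splitting, punctuation removal, stopword removal, stemming."""
    text = split_identifiers(text)
    # Replace every punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.translate(table).lower().split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def clean_issues(raw_issues, min_assignments=MIN_ASSIGNMENTS):
    """Keep single-assignee issues whose assignee has enough assignments."""
    single = [i for i in raw_issues if len(i.get("assignees") or []) == 1]
    counts = Counter(i["assignees"][0] for i in single)
    cleaned = []
    for issue in single:
        assignee = issue["assignees"][0]
        if counts[assignee] < min_assignments:
            continue  # drop developers with too few assignments
        cleaned.append({
            "id": issue["id"],
            "title": preprocess(issue["title"]),
            "description": preprocess(issue.get("description") or ""),
            "assignee": assignee,
        })
    return cleaned


def save_splits(issues, out_dir, train_fraction=0.8):
    """Write train/test files under out_dir (order-based split, an assumption)."""
    cut = int(len(issues) * train_fraction)
    paths = {}
    for name, subset in (("train", issues[:cut]), ("test", issues[cut:])):
        folder = Path(out_dir) / name
        folder.mkdir(parents=True, exist_ok=True)
        paths[name] = folder / "issues.json"
        paths[name].write_text(json.dumps(subset, indent=2))
    return paths
```

Under these assumptions, calling `save_splits(clean_issues(raw_issues), "data/preprocessed")` would produce one JSON file per split. Identifier splitting runs before lowercasing because the camelCase boundaries are lost once the text is lowercased.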

lorenzovarese / automated-bug-triaging