yorku-ease / DmML

Request for Feedback on Initial Data Cleaning Process #1

Open hastighsh opened 2 months ago

hastighsh commented 2 months ago

I am writing to seek feedback on the initial data cleaning process based on the DMBench repository's outputs. Before proceeding with the training and feature engineering phases, I would greatly appreciate your insights and suggestions.

To-Do List:

  1. Review the steps taken for data cleaning to ensure completeness and accuracy.
  2. Evaluate the relevance of the features identified for training purposes.
  3. Determine the approach for rounding floating-point numbers in the dataset.
  4. Provide any additional explanations or guidance on the data cleaning process.
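
To make the rounding question in item 3 concrete, here is one possible approach, offered only as a sketch. It assumes rows have already been read as lists of strings, and the function name `round_numeric_fields` and the default of 3 decimal places are illustrative choices, not anything taken from the DMBench outputs. Integer and non-numeric fields are left unchanged.

```python
def round_numeric_fields(row, ndigits=3):
    """Round float-like string fields in one CSV row to `ndigits` decimals.

    Hypothetical helper: integer fields and non-numeric fields are
    returned unchanged.
    """
    rounded = []
    for value in row:
        try:
            int(value)
            rounded.append(value)  # integer field, keep as-is
            continue
        except ValueError:
            pass
        try:
            rounded.append(str(round(float(value), ndigits)))
        except ValueError:
            rounded.append(value)  # not a number, keep as-is
    return rounded


# Made-up sample row, not real dataset values.
print(round_numeric_fields(["run1", "12.34567", "42"]))
```

Whether rounding is appropriate at all, and to how many places, depends on the measurement precision of each feature, so the precision should probably be set per column rather than globally.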

Issue Description: In the migrationEngine.csv file, the last column does not have a header but contains values. Clarification is needed regarding the nature of these values and whether they are important to retain or can be omitted.

Actions Required:

  1. Identify the content and significance of the values in the last column of migrationEngine.csv.
  2. Determine if the values are necessary for the dataset or if they can be removed.
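
As a starting point for both actions, the following sketch summarizes whatever appears beyond the header's column count. It uses only the standard library; the function name `summarize_extra_fields` and the sample data are hypothetical and do not reflect the real contents of migrationEngine.csv. One common cause of a headerless last column, which may or may not apply here, is a trailing delimiter on each data row, in which case the extra values are all empty.

```python
import csv
import io


def summarize_extra_fields(csv_text):
    """Summarize fields that appear beyond the header's column count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows_with_extra = 0
    extra_values = []
    for row in reader:
        if len(row) > len(header):
            rows_with_extra += 1
            extra_values.extend(row[len(header):])
    non_empty = [v for v in extra_values if v.strip()]
    return {
        "header_columns": len(header),
        "rows_with_extra_fields": rows_with_extra,
        "non_empty_extra_values": len(non_empty),
        "distinct_values": sorted(set(non_empty)),
    }


# Made-up sample: two named columns, three data rows with a third field.
sample = "experiment,duration_s\nrun1,12.5,ok\nrun2,13.1,ok\nrun3,11.9,\n"
summary = summarize_extra_fields(sample)
print(summary)
```

To run this against the real file, read its text with `open("migrationEngine.csv", newline="").read()` and pass it in. If `non_empty_extra_values` is zero, the column is probably safe to drop; otherwise the distinct values may hint at what the column represents.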

fokaefs commented 1 month ago

@hastighsh, this issue is a bit generic. You need to provide us with specifics (questions, progress, etc.) and examples of the things that you've done or have problems with. This way we can be of better help.