mlsanigeria / nigeria-crime-trends

MIT License

Clean and Preprocess the Dataset 🧹📃 #2

Open Nalito opened 1 day ago

Nalito commented 1 day ago

Clean and Preprocess Crime Dataset

Description: Perform data cleaning, including removing duplicates, handling missing values, and converting date formats. Preprocess the data to create features like crime types, regions, and time-based metrics.

Labels: Data Cleaning, Data Preprocessing

What is Needed

Contributors are needed to perform data cleaning, including removing duplicates, handling missing values, and converting date formats.
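For anyone picking this up, the three cleaning steps could be sketched in pandas roughly as follows. This is only a minimal illustration: the column names (`state`, `crime_type`, `date_reported`), the sample rows, the day/month/year date format, and the choice to drop rows without a state are all assumptions, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and values may differ.
raw = pd.DataFrame(
    {
        "state": ["Lagos", "Lagos", "Kano", None, "Rivers"],
        "crime_type": ["Robbery", "Robbery", "theft ", "Fraud", None],
        "date_reported": [
            "01/10/2023",
            "01/10/2023",
            "15/02/2023",
            "30/06/2023",
            "not recorded",
        ],
    }
)


def clean_crime_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, handle missing values, and parse dates."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # 2. Normalise text so "theft " and "Theft" count as one category,
    #    and label a missing crime type explicitly instead of dropping it.
    out["crime_type"] = out["crime_type"].str.strip().str.title().fillna("Unknown")

    # 3. Parse day/month/year strings (assumed format); bad values become NaT.
    out["date_reported"] = pd.to_datetime(
        out["date_reported"], format="%d/%m/%Y", errors="coerce"
    )

    # Rows without a state cannot be assigned to a region, so drop them here.
    return out.dropna(subset=["state"]).reset_index(drop=True)


cleaned = clean_crime_data(raw)
print(cleaned)
```

Whether to drop, impute, or flag missing values depends on the real data, so treat the strategy above as a placeholder to be revisited once the dataset has been inspected.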

How to Contribute

Getting Started

Before you begin, ensure you have read the Contribution Guidelines in the repository.

We are excited to see your contributions! Happy Hacking! 🎉

yoenuts commented 1 day ago

Hello! I'm interested in working on this issue and can contribute, especially with Data Cleaning. Is there a deadline for this?

Odeyiany2 commented 1 day ago


Hello @yoenuts, thank you for your interest! Contributions to the project officially start on Tuesday, 1st October, to kick off the Hacktoberfest event, but you can start contributing now.

Please follow our contribution guidelines. Create a folder named with the project name and your GitHub username under this folder in the repo. It will contain your notebook.

We also encourage you to register for our kickoff call to get firsthand information on what we expect from your contributions: MLSA Hacktoberfest

We're looking forward to your participation! 🎉

faresbouzayen commented 1 day ago

Hi @Nalito! I'm eager to contribute to the cleaning and preprocessing of the crime dataset! 🧹📃

Plan:

Data Cleaning:

- Remove Duplicates: Identify and eliminate any duplicate entries.
- Handle Missing Values: Determine appropriate strategies for dealing with missing data (e.g., imputation, removal).
- Convert Date Formats: Ensure all date fields are in a standardized format for consistency.

Feature Engineering:

- Crime Types: Categorize crimes into defined types for easier analysis.
- Regions: Create region-based metrics for spatial analysis.
- Time-Based Metrics: Generate features that capture temporal trends (e.g., crime rates over time).

Next Steps: I'll review the existing scripts and documentation to understand the current setup and ensure my contributions align with the project's structure. I'll also make sure to test my code thoroughly before submitting a pull request. If there are any specific guidelines or additional details you'd like me to follow, please let me know. I'm looking forward to collaborating with everyone! 🎉
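To make the feature engineering part of the plan concrete, here is a rough pandas sketch. Everything specific in it is an assumption for illustration: the column names, the sample rows, the `STATE_TO_ZONE` lookup (a partial mapping of states to Nigeria's geopolitical zones), and the `CRIME_TO_GROUP` categories are hypothetical and would need to be replaced with whatever the real dataset and maintainers call for.

```python
import pandas as pd

# Hypothetical, already-cleaned rows; real column names may differ.
incidents = pd.DataFrame(
    {
        "state": ["Lagos", "Kano", "Lagos", "Rivers"],
        "crime_type": ["Armed Robbery", "Burglary", "Fraud", "Armed Robbery"],
        "date_reported": pd.to_datetime(
            ["2023-01-14", "2023-01-20", "2023-02-03", "2023-02-11"]
        ),
    }
)

# Illustrative lookups only; a real mapping would cover every state and label.
STATE_TO_ZONE = {"Lagos": "South West", "Kano": "North West", "Rivers": "South South"}
CRIME_TO_GROUP = {
    "Armed Robbery": "Violent",
    "Burglary": "Property",
    "Fraud": "Financial",
}


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add region, crime-group, and calendar features to cleaned incidents."""
    out = df.copy()
    # Region feature: unmapped states are flagged rather than silently dropped.
    out["zone"] = out["state"].map(STATE_TO_ZONE).fillna("Unmapped")
    # Crime-type feature: collapse detailed labels into broader groups.
    out["crime_group"] = out["crime_type"].map(CRIME_TO_GROUP).fillna("Other")
    # Time-based features derived from the parsed date column.
    out["year"] = out["date_reported"].dt.year
    out["month"] = out["date_reported"].dt.month
    out["day_of_week"] = out["date_reported"].dt.day_name()
    return out


featured = add_features(incidents)

# Time-based metric: incident counts per zone per calendar month.
monthly_counts = (
    featured.groupby(["zone", "year", "month"]).size().reset_index(name="incidents")
)
print(monthly_counts)
```

Note that these are raw counts; turning them into crime rates would additionally need population figures per region, which this sketch does not assume are available.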