Open akhaleghi opened 1 year ago
I made this repo for @chelseybeck to see if it's feasible to use Jupyter Notebook with ghpages: https://github.com/hackforla/jupyter-ghpages-test
I am going to create another repo for the 311 data to go into.
Outline of Data Cleaning Steps

Data cleaning was essential to prepare the 311 service request data for analysis. The following steps were undertaken (a consolidated sketch of the full pipeline follows the list):

1. Removing Duplicates: used `data.drop_duplicates(inplace=True)` to eliminate duplicate rows. Reason: duplicates can lead to biased results in analysis and modeling by over-representing certain data points.
2. Identifying Missing Values: counted missing values per column with `data.isnull().sum()`.
3. Converting Date Columns: converted `CreatedDate`, `UpdatedDate`, `ServiceDate`, and `ClosedDate` to datetime format using `pd.to_datetime()`.
4. Analyzing Categorical Variables: reviewed the paired categorical columns `CD` & `CDMember`, and `NC` & `NCName`.
5. Dropping Unnecessary Columns: removed `SRNumber`, `MobileOS`, and others using `data.drop(columns=unnecessary_columns, inplace=True)`.
6. Standardizing Categorical Data: lowercased text categories with `data[cat_columns] = data[cat_columns].apply(lambda x: x.str.lower())`.
7. Handling Missing Data: filled missing `ServiceDate` and `ClosedDate` values based on `Status` and `UpdatedDate`.
8. Cleaning the Zipcode Column: cleaned up the `ZipCode` column.
9. Saving Cleaned Data: saved the cleaned dataset, organized by `CreatedDate`.
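For reference, here is a minimal sketch of how these steps might be chained together in pandas. The input path, the `unnecessary_columns` list, the closed-status imputation rule, and the ZIP-code regex are all assumptions for illustration; the authoritative logic is in the cleaning rules document and the notebook in the repo.

```python
import pandas as pd

# Load the raw 311 service request export (path is a placeholder).
data = pd.read_csv("311_requests_raw.csv")

# 1. Remove duplicate rows so repeated records don't skew analysis.
data.drop_duplicates(inplace=True)

# 2. Count missing values per column to decide what needs handling.
print(data.isnull().sum())

# 3. Convert date columns to datetime; unparseable values become NaT.
for col in ["CreatedDate", "UpdatedDate", "ServiceDate", "ClosedDate"]:
    data[col] = pd.to_datetime(data[col], errors="coerce")

# 4. Check that the paired categorical columns stay consistent
#    (each CD should map to one CDMember, each NC to one NCName).
print(data.groupby("CD")["CDMember"].nunique())
print(data.groupby("NC")["NCName"].nunique())

# 5. Drop columns not needed for analysis (list is illustrative).
unnecessary_columns = ["SRNumber", "MobileOS"]
data.drop(columns=unnecessary_columns, inplace=True)

# 6. Standardize text categories to lowercase.
cat_columns = data.select_dtypes(include="object").columns
data[cat_columns] = data[cat_columns].apply(lambda x: x.str.lower())

# 7. One possible rule for missing dates: for closed requests, fall back
#    to UpdatedDate when ServiceDate or ClosedDate is missing (assumption).
closed = data["Status"] == "closed"
for col in ["ServiceDate", "ClosedDate"]:
    data.loc[closed, col] = data.loc[closed, col].fillna(data.loc[closed, "UpdatedDate"])

# 8. Keep only the 5-digit portion of ZipCode (assumption about the format).
data["ZipCode"] = data["ZipCode"].astype(str).str.extract(r"(\d{5})", expand=False)

# 9. Save the cleaned data, sorted by CreatedDate.
data.sort_values("CreatedDate").to_csv("311_requests_cleaned.csv", index=False)
```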
@bonniewolfe: @mru-hub is asking for clarification on this issue. Do we have a GitHub page already for Hack for LA? Should she create a new page or add her work here: https://github.com/hackforla/311-data-jupyter-notebooks? She also mentioned: "We have one for our organization, which was created by Bonnie. Also, the project page in the above URL has '311-data', so I think we have a project page for our repository too. If this is true, I have to use the same URL for the current ghpages purpose."
I answered this in the data science meeting on 2024-09-16. Basically, the repository is the work for this issue, but it needs updated data files.
Started working on ghpages. Website: https://hackforla.github.io/311-data-jupyter-notebooks/lab (navigate to the folder 311_Data_CleaningScript). I've made some initial updates to the script and will continue working on integrating it into the ghpages site.
Overview
We want to download 311 data and split it by year, then by month, so that each file is under 100 MB and we can host an append-only data warehouse on GitHub.
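A minimal sketch of the split-and-append idea, assuming a cleaned DataFrame with a datetime `CreatedDate` column and a `data_warehouse/` output folder (both names are illustrative, not the repo's actual layout): write one CSV per month, never overwrite an existing month, and flag any file that approaches GitHub's 100 MB limit.

```python
import os
import pandas as pd

GITHUB_FILE_LIMIT_MB = 100  # GitHub rejects pushes of files larger than 100 MB

def write_monthly_partitions(data: pd.DataFrame, out_dir: str = "data_warehouse") -> None:
    """Write one CSV per CreatedDate month, keeping the warehouse append-only."""
    os.makedirs(out_dir, exist_ok=True)
    data = data.dropna(subset=["CreatedDate"])
    for period, chunk in data.groupby(data["CreatedDate"].dt.to_period("M")):
        path = os.path.join(out_dir, f"311_{period}.csv")  # e.g. 311_2024-06.csv
        if os.path.exists(path):
            continue  # append-only: existing months are never rewritten
        chunk.to_csv(path, index=False)
        size_mb = os.path.getsize(path) / 1_000_000
        if size_mb > GITHUB_FILE_LIMIT_MB:
            print(f"WARNING: {path} is {size_mb:.0f} MB; consider splitting it further")
```

Rerunning this after downloading a new month of data would only add the new file, which keeps the repository history small.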
Action Items
Resources/Instructions
Cleaning Rules: https://github.com/hackforla/data-science/blob/main/311-data/CSV_files/Docs/CleaningRules.txt
City Data: https://data.lacity.org/browse?q=311%20data%20%2C%202024&sortBy=relevance (please update the filter for the year 2024 based on the requirements)
Website (ghpages): https://hackforla.github.io/311-data-jupyter-notebooks/lab (navigate to the folder 311_Data_CleaningScript)