hackforla / data-science

The Hack For LA Data Science team is a Community of Practice within the LA brigade seeking to make analytical and machine learning services available to local communities and organizations.

CoP: Data Science: Create district types reusable tool (API, single dataset, etc.) #118

Status: Open. ExperimentsInHonesty opened this issue 3 years ago.

ExperimentsInHonesty commented 3 years ago

Overview

We need to create a tool so that each project at H4LA that renders points on a map can use District Files to help people analyze or view the data.

Action Items

Resources

Example Neighborhood Council Shape File

Initial Identification of Large Groups/Districts

ExperimentsInHonesty commented 2 years ago

Create an npm package for delivering the data. We need to get a backend person involved, and we need to publish a new package each time the files change, e.g., la-shape-files-2021, la-shape-files-2022.

ExperimentsInHonesty commented 2 years ago

Next steps: talk to the 311 team, the TDM team, Food Oasis, and Lucky Parking.

akhaleghi commented 2 years ago

Feedback from Mike Morgan on 12/9: Because the shape files for the various districts are small (under 50 MB, see here), they can be stored in a repository. We should also consider publishing them as an npm package and as GeoJSON.
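Since the plan is to keep the files in a repository, a size check before committing may help. The sketch below is a hypothetical helper, not part of the project. It uses GitHub's documented thresholds (a warning for files over 50 MiB, a hard block over 100 MiB) and flags each file accordingly.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# GitHub warns on files larger than 50 MiB and blocks files larger than 100 MiB.
WARN_BYTES = 50 * 1024 * 1024
BLOCK_BYTES = 100 * 1024 * 1024


def check_sizes(paths: Iterable) -> List[Tuple[str, int, str]]:
    """Return (path, size_in_bytes, status) for each regular file.

    status is "ok", "warn" (over 50 MiB) or "blocked" (over 100 MiB).
    Anything that is not a regular file (e.g. a directory) is skipped.
    """
    report = []
    for p in map(Path, paths):
        if not p.is_file():
            continue
        size = p.stat().st_size
        if size > BLOCK_BYTES:
            status = "blocked"
        elif size > WARN_BYTES:
            status = "warn"
        else:
            status = "ok"
        report.append((str(p), size, status))
    return report
```

For example, `check_sizes(Path("LA_District_ShapeFiles").rglob("*"))` would scan every file under a (hypothetical) data directory before it is pushed.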

akhaleghi commented 2 years ago

Notes from 3/11 meeting with Abe, Bonnie, John (Food Oasis) and Mike:

Food Oasis uses PostgreSQL's geometry data type to run its scripts, then converts the results to GeoJSON to send to the client.

PostgreSQL can also ingest GeoJSON and convert it to its native geometry type.
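A minimal sketch of that round trip, assuming the geometry type comes from the PostGIS extension (the notes don't say), whose `ST_GeomFromGeoJSON` and `ST_AsGeoJSON` functions convert in each direction. The `districts` table and the `name` property are hypothetical; the SQL uses `%s` placeholders as in psycopg2's `cursor.execute(sql, params)`.

```python
import json

# Hypothetical table: districts(name text, geom geometry).
# GeoJSON text -> PostGIS geometry, tagged as WGS 84 (SRID 4326).
INSERT_SQL = (
    "INSERT INTO districts (name, geom) "
    "VALUES (%s, ST_SetSRID(ST_GeomFromGeoJSON(%s), 4326))"
)
# PostGIS geometry -> GeoJSON text, ready to send to the client.
SELECT_SQL = "SELECT name, ST_AsGeoJSON(geom) FROM districts"


def insert_params(feature: dict) -> tuple:
    """Build the parameters for INSERT_SQL from a GeoJSON Feature dict."""
    if feature.get("type") != "Feature" or "geometry" not in feature:
        raise ValueError("expected a GeoJSON Feature with a geometry")
    name = feature.get("properties", {}).get("name")
    return (name, json.dumps(feature["geometry"]))


def row_to_feature(name: str, geojson_text: str) -> dict:
    """Turn a (name, ST_AsGeoJSON text) row back into a GeoJSON Feature."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": json.loads(geojson_text),
    }
```

With a database connection, usage would look like `cur.execute(INSERT_SQL, insert_params(feature))` on the way in and `[row_to_feature(*row) for row in cur.fetchall()]` after running `SELECT_SQL`.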

The recording of the meeting

ExperimentsInHonesty commented 8 months ago

This issue will need to be rewritten to check whether the shape files are out of date. However, the code that uses the shape files should be built first, since up-to-date shape files with no code to use them are useless.

akhaleghi commented 8 months ago

Next steps: Create a script that can be run to automate downloading the shape files for each of the district types listed above. We will want to record the date each file was last updated and the date it was downloaded.
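One way to capture both dates is to write a small metadata file next to each download. This is a sketch under assumptions: the function name, the `.geojson`/`.meta.json` naming, and the idea that the publisher's "last updated" date is passed in (rather than scraped) are all hypothetical. `fetch` is injectable so the logic can be tested without network access.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def download_district_file(
    district_type: str,
    url: str,
    out_dir: Path,
    source_last_updated: Optional[str] = None,
    fetch: Callable[[str], bytes] = _http_get,
) -> Path:
    """Save one district file plus a sidecar JSON recording both dates.

    source_last_updated: the date the publisher says the data last changed,
    if known. downloaded_at is recorded automatically, in UTC.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"{district_type}.geojson"
    data_path.write_bytes(fetch(url))
    meta = {
        "district_type": district_type,
        "source_url": url,
        "source_last_updated": source_last_updated,
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = out_dir / f"{district_type}.meta.json"
    meta_path.write_text(json.dumps(meta, indent=2))
    return data_path
```

A driver script would loop over a mapping of district types to source URLs and call this once per entry.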

parcheesime commented 7 months ago

Update on issue #118, district types reusable tool:

parcheesime commented 6 months ago

Using the GeoHub L.A. website, I programmatically created shape files:

  • Data Acquisition: On the GeoHub LA website, I identified the URL endpoints for the API calls that correspond to our project's requirements.
  • Data Extraction: Through programmatic queries, I fetched JSON data from the various district API endpoints, capturing geographic information such as boundaries, points of interest, and administrative divisions.
  • Shapefile Creation: From the gathered JSON data, I created shapefiles, a geospatial data format supported by most GIS software and tools.
  • Compression Exploration: To reduce storage and simplify handling of the shapefiles, I'm experimenting with compressing the data using TruncatedSVD.
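GeoHub LA is built on ArcGIS Hub, so its layers are typically served through ArcGIS REST feature-service endpoints. Assuming that is the case for the district layers, a query URL like the one sketched below asks for every feature (`where=1=1`), all attributes (`outFields=*`), and GeoJSON output (`f=geojson`, where the service supports it). The layer URL is a placeholder, not one of the project's actual endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def geojson_query_url(layer_url: str, where: str = "1=1") -> str:
    """Build an ArcGIS REST 'query' URL that returns matching features as GeoJSON."""
    params = {"where": where, "outFields": "*", "f": "geojson"}
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


def fetch_geojson(layer_url: str) -> dict:
    """Download a layer's features as a GeoJSON FeatureCollection dict."""
    with urlopen(geojson_query_url(layer_url), timeout=60) as resp:
        return json.load(resp)
```

Layers with more features than the service's maximum record count may need paging (the query endpoint generally accepts `resultOffset` and `resultRecordCount`). If GeoPandas is available, `geopandas.read_file(url)` can read the resulting GeoJSON and `GeoDataFrame.to_file("districts.shp")` writes it out as a shapefile.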

parcheesime commented 6 months ago

Update: The data acquisition, extraction, shapefile creation, and compression exploration work is available in my repo.

This week I will look into how we can run the data collection script on a quarterly basis and have it store the output in Google Drive and/or GitHub, whichever works best for the team.
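In GitHub Actions, a `schedule` trigger with the cron expression `0 0 1 1,4,7,10 *` would run a workflow at midnight UTC on the first day of each quarter. Storage could then be organized by quarter; the helpers below are a hypothetical sketch of that naming convention (e.g. a `2024-Q2` folder in Drive or GitHub), not something the project has adopted.

```python
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '2024-Q2' for the calendar quarter containing d."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def is_first_day_of_quarter(d: date) -> bool:
    """True on Jan 1, Apr 1, Jul 1 and Oct 1 (matches the cron schedule above)."""
    return d.day == 1 and d.month in (1, 4, 7, 10)
```

Naming folders by quarter keeps earlier snapshots intact, which fits the earlier idea of versioned releases such as la-shape-files-2021 and la-shape-files-2022.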

parcheesime commented 6 months ago

Here's an update on data acquisition and extraction of district shape files:

Update on the Shape File Automation Project

Consideration:

Next Steps:

I've also pushed all recent updates to the repository, and you can check the latest commits for detailed changes.

parcheesime commented 6 months ago

Project Update:

I can adjust the code to update a GitHub folder. We can do both Google Drive and GitHub, if need be.

parcheesime commented 5 months ago

This week I refined the setup of environment variables for both local development and the CI/CD workflows in GitHub Actions. Because configuration is read with os.getenv(), the same code runs locally and in GitHub Actions, and sensitive values such as credentials are never hardcoded in the source.
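A minimal sketch of that pattern, with hypothetical variable names (`DRIVE_FOLDER_ID`, `REPO_TOKEN`, `OUTPUT_DIR`). It reads everything in one place and fails fast with a clear message when a required value is missing, rather than letting `None` propagate into later API calls.

```python
import os
from dataclasses import dataclass

REQUIRED = ("DRIVE_FOLDER_ID", "REPO_TOKEN")  # hypothetical names


@dataclass(frozen=True)
class PipelineConfig:
    drive_folder_id: str
    repo_token: str
    output_dir: str


def load_config() -> PipelineConfig:
    """Read settings from environment variables; raise if a required one is unset or empty."""
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return PipelineConfig(
        drive_folder_id=os.getenv("DRIVE_FOLDER_ID"),
        repo_token=os.getenv("REPO_TOKEN"),
        output_dir=os.getenv("OUTPUT_DIR", "data"),  # optional, with a default
    )
```

Locally the variables can be exported in the shell; in a workflow they would be mapped from repository secrets, e.g. `REPO_TOKEN: ${{ secrets.REPO_TOKEN }}` under the step's `env:` key.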

I've also discussed with our project manager updating the top-level Google Drive folder structure, which should make it easier to automate storing the shape files.

District Data Collection Repo

parcheesime commented 4 months ago

I've gathered everything needed to transfer my current repo, which contains the district shape file pipeline, into a new repo under the Hack for LA account that will house the shape data. The steps are listed below. The transfer will be completed within the week; in the meantime, the shape file data is in the Hack for LA Google Drive.

Steps for Repository Transfer

The following steps have been determined for transferring the repository associated with the district data collection:

  1. Prepare New Repository

    • A new empty repository has been established to house the district data collection.
  2. ETL Process Completion

    • The ETL process has been completed in my current repository.
  3. Code Transfer Process

    • Clone the new repository locally.
    • Add the new repository as a remote to the existing project.
    • Pull the latest code from the current (old) repository.
    • Push the code to the new repository.
  4. Transfer Automation Components

    • Transfer GitHub Actions and secrets necessary for pipeline automation.
  5. Update Documentation

    • The README file will be updated to reflect changes and provide guidance for the new repository setup.
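The code transfer step can be sketched as a short script run from inside the existing (old) repository's working copy. The remote name `hfla`, the branch `main`, and the URL in the example are hypothetical. It defaults to a dry run that only prints the git commands.

```python
import subprocess
from typing import List


def transfer_commands(new_remote_url: str, branch: str = "main",
                      remote_name: str = "hfla") -> List[List[str]]:
    """Git commands that copy the current branch's history to a new remote."""
    return [
        # Register the new (empty) Hack for LA repository as a second remote.
        ["git", "remote", "add", remote_name, new_remote_url],
        # Make sure the local branch has the latest code from the old repo.
        ["git", "pull", "origin", branch],
        # Push that branch to the new repository.
        ["git", "push", remote_name, f"{branch}:{branch}"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Workflow files under `.github/workflows` travel with the code, but repository secrets do not; they have to be re-created in the new repository's settings, which is what step 4 covers.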

akhaleghi commented 1 month ago

@parcheesime Is there still work to be done on this issue or is it complete?

parcheesime commented 1 month ago

@akhaleghi I've successfully tested adding the Los Angeles district shape data in my own repository, complete with a README and automated scripts running on a schedule. How can we integrate this into the Hack for LA repository? Should we create a dedicated directory, such as LA_District_ShapeFiles, for the data?

parcheesime commented 2 weeks ago


Follow-up: @akhaleghi The data is updating on my personal repository, but I will need help adding my project to our data science repo. @salice may have created one, but that was a while ago, before the repository updates.