Closed: traycn closed this issue 5 months ago
Per some brainstorming I did with @Skydodle, I'm listing some general thoughts I have about loading data.
WIP
I was looking at the data being displayed on the map. One month of data is probably the most you can visually comprehend on the map before it becomes too much. We can probably get away with loading only 1 month of data at a time, which will keep load times relatively quick.
I propose that we limit the app to only allow the user to see 1 month of data at a given time. I think there are 2 ways that we can work with month-sized datasets...
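To make the one-month limit concrete, here is a minimal sketch of how a month-sized filter could be computed. It is not one of the two approaches mentioned above; `monthRange`, `monthFilter`, and the `CreatedDate` column name are assumptions for illustration and are not taken from the codebase.

```javascript
// Returns the half-open range [start, end) covering one calendar month,
// as ISO dates (YYYY-MM-DD). `month` is 1-12.
function monthRange(year, month) {
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError('month must be an integer from 1 to 12');
  }
  const start = new Date(Date.UTC(year, month - 1, 1));
  // Date.UTC rolls month index 12 over into January of the next year.
  const end = new Date(Date.UTC(year, month, 1));
  const iso = (d) => d.toISOString().slice(0, 10);
  return { start: iso(start), end: iso(end) };
}

// Builds a SQL predicate for one month of requests.
// `CreatedDate` is an assumed column name.
function monthFilter(year, month) {
  const { start, end } = monthRange(year, month);
  return `CreatedDate >= DATE '${start}' AND CreatedDate < DATE '${end}'`;
}

console.log(monthFilter(2024, 2));
```

A half-open range avoids having to know how many days the month has.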
Continuing to write notes as we discuss... I'm aware that loading data when the user asks for it is the main concern that @traycn mentioned above. We'll investigate whether we can find workarounds for this.
Updated the ticket with new Action Items from what was discussed in today's meeting. I've also added notes on how the data is loaded in the application.
Hi, @traycn, please update the overview to include "why" we are pulling the data. Formulation: "We need to do X for Y reason." Please also clarify the two proof of concept items in the Action Items section. Are the two items currently listed the options for pulling the data? Thank you!
@Skydodle and I have reviewed this one more time, and we're going to move ahead with the Action steps that are present. We will have a check-in for the 1st action step so that the team can see what happens when we load multiple repos of data. We will then proceed with the 2nd action step once we've reviewed and discussed as a team.
@Skydodle handing off to you, please assign yourself when ready.
When you're finished with Action Item 1, please post your branch name. Please also show any changes you make to the Hugging Face repo.
Follow up ticket for @Skydodle: https://github.com/hackforla/311-data/issues/1714
Overview
We need to pull data from previous years (2023, 2022, etc.) into our application so that users can make more extensive searches.
At this time, the site is limited to displaying data for the current year to date.
Action Items
For the Proof of Concept that we can query multiple files:
DuckDB pull multiple parquets docs - https://duckdb.org/docs/data/multiple_files/overview.html
`newDb` instance to pull data from another `311-data/[year searched here]` repo
loc: components/db/DbProvider.jsx
loc: components/Map/index.js
For the Proof of Concept that we can make a query when a user makes a search:
loc: components/Map/index.js
`SetData()` //??
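For the first proof of concept, DuckDB's `read_parquet` accepts a list of files, so several years could in principle be read in one query. The sketch below only builds the SQL string. `buildMultiYearQuery` and the file names are hypothetical, and it assumes each Parquet file has already been made available to the DuckDB instance under that name.

```javascript
// Hypothetical helper: builds one DuckDB query that reads several Parquet files.
// The file names are assumptions; they must already be registered with
// (or otherwise reachable by) the DuckDB instance.
function buildMultiYearQuery(fileNames, whereClause) {
  if (!Array.isArray(fileNames) || fileNames.length === 0) {
    throw new Error('at least one Parquet file is required');
  }
  // Escape single quotes so each name is a valid SQL string literal.
  const list = fileNames
    .map((name) => `'${String(name).replace(/'/g, "''")}'`)
    .join(', ');
  const where = whereClause ? ` WHERE ${whereClause}` : '';
  return `SELECT * FROM read_parquet([${list}])${where}`;
}

// Example with two hypothetical file names.
console.log(buildMultiYearQuery(['requests2023.parquet', 'requests2024.parquet']));
```

Whether this works across separate Hugging Face repos is exactly what the proof of concept needs to confirm. If the yearly files turn out to have different columns, the `union_by_name` option described in the DuckDB multiple-files docs may be worth testing.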
More Information:
The following is a rough run-through of the control flow for how data is currently populated.
Step 1: A Parquet of the LA Open Data - 311 Calls is populated in the HuggingFace repo
https://huggingface.co/datasets/311-data/2024
Step 2: The HuggingFace repo is defined in the datasets.parquet.hfYtd value
Step 3: The `datasets.parquet.hfYtd` value is used to register a new File
Step 5: The DbContext (later used as `this.context`) is defined and passed to the application
Step 6: The Data is queried and set to the front-end application
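The register-then-query part of this flow can be sketched as below. This is not the code in `DbProvider.jsx`. It assumes `db` behaves like an `AsyncDuckDB` instance from `@duckdb/duckdb-wasm` and `httpProtocol` is `duckdb.DuckDBDataProtocol.HTTP`. The function names, the local file name, and the file path inside the Hugging Face repo are hypothetical.

```javascript
// Sketch only. `db` is expected to behave like an AsyncDuckDB instance from
// @duckdb/duckdb-wasm, and `httpProtocol` like duckdb.DuckDBDataProtocol.HTTP.
// The file name and URL path below are assumptions, not the project's values.
async function registerYearFile(db, httpProtocol, year) {
  const fileName = `requests${year}.parquet`; // hypothetical local name
  const url =
    `https://huggingface.co/datasets/311-data/${year}/resolve/main/${year}.parquet`; // hypothetical path
  // registerFileURL(name, url, protocol, directIO)
  await db.registerFileURL(fileName, url, httpProtocol, false);
  return fileName;
}

// Registers the file for `year`, then queries it over a fresh connection.
async function queryYear(db, httpProtocol, year) {
  const fileName = await registerYearFile(db, httpProtocol, year);
  const conn = await db.connect();
  try {
    return await conn.query(`SELECT * FROM '${fileName}'`);
  } finally {
    await conn.close(); // always release the connection
  }
}
```

Passing `db` in as a parameter keeps the sketch independent of how the DbContext is created, and makes it possible to call `queryYear` again for a different year after the application has loaded.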
Previous Notes
1 - The parquets are in separate huggingface repos so, I’m not sure if we can query multiple files as shown in the duckdb doc [here](https://duckdb.org/docs/data/multiple_files/overview.html).
… A potential solution would be putting the parquet files in a single repo (but consider the limitations of huggingface repos, doc [here](https://huggingface.co/docs/hub/repositories-recommendations)).
2 - We may have to make a GET call in order for this to work and I’m not sure if we have the capabilities to run a GET call after the application loads.
… Note: My understanding of how data is pulled is that it’s pulled once, at the beginning when the application loads through a duckdb `initialize()` (loc: components/db/DbProvider.jsx line: 88), and set in the …
Resources/Instructions
DuckDB docs - https://duckdb.org/docs/api/wasm/overview
DuckDB pull multiple parquets docs - https://duckdb.org/docs/data/multiple_files/overview.html
Huggingface repo limitations - https://huggingface.co/docs/hub/repositories-recommendations