superstreamlabs / save-zakar-hackathon

Apache License 2.0

Project Submission #21

Open andrejakobsen opened 1 year ago

andrejakobsen commented 1 year ago

Project Name

Save Zakar by A-squared

Project Type

Project Description

Team member: Alessandra Oshiro

Architecture

The architecture streams the data from all three Memphis stations into a Supabase database, which our Streamlit app connects to directly. We had planned something much more sophisticated with dbt, Redis and Flink, but in the end we couldn't justify over-engineering the project when the end goal is a good user experience. The tools suggested by the organizers were sufficient for what we wanted to achieve. Certain decisions still mattered for the app's performance: to reduce buffering and latency, we perform the heavy data transformations in the database and load only the necessary data into the app as data frames, which we then cache with Streamlit.
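As a rough illustration of the "transform in the database, load only what is needed, then cache" idea, here is a minimal sketch. It is not the project's code: an in-memory SQLite database stands in for Supabase, `functools.lru_cache` stands in for Streamlit's `st.cache_data` decorator, and the table and column names are hypothetical.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite stands in for the Supabase (Postgres) database.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_readings "
    "(day TEXT, geospatial_x INTEGER, geospatial_y INTEGER, temperature REAL)"
)
conn.executemany(
    "INSERT INTO temperature_readings VALUES (?, ?, ?, ?)",
    [
        ("2023-01-01", 0, 0, 98.0),
        ("2023-01-01", 0, 1, 112.5),
        ("2023-01-02", 0, 0, 101.0),
        ("2023-01-02", 0, 1, 115.5),
    ],
)


# In a Streamlit app, st.cache_data would play the role of lru_cache here.
@lru_cache(maxsize=32)
def daily_average_temperature(start_day: str, end_day: str) -> tuple:
    """Aggregate inside the database and return only the rows a chart needs."""
    rows = conn.execute(
        "SELECT day, AVG(temperature) FROM temperature_readings "
        "WHERE day BETWEEN ? AND ? GROUP BY day ORDER BY day",
        (start_day, end_day),
    ).fetchall()
    return tuple(rows)


# The first call runs the query; a repeated call with the same
# arguments is served from the cache without touching the database.
print(daily_average_temperature("2023-01-01", "2023-01-02"))
print(daily_average_temperature.cache_info())
```

The point of the design is that the app only ever holds the small aggregated result, not the raw readings.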

Dashboard

We spent the majority of our time on the dashboard, as we knew it would be time-consuming but would also have the highest potential ROI. We used the Python wrapper for ECharts to get the most compelling and visually appealing interactive charts; some of them are even animated. Although getting the visualizations to behave the way we wanted was very tedious, we believe it was worth the effort. ECharts also allowed us to make Zakar feel like a real place by visualizing the data on a map we created. The map is intentionally square and is meant to illustrate what the island could look like. It is used for selecting specific coordinates when filtering data and for displaying fires/alerts. This was perhaps the hardest part to get right.

We were also quite deliberate with the choice of color palette, aiming for an aquatic and tropical vibe to match the island. Certain visualizations also use color to indicate temperature. The Twitter cards were made in Photoshop and then imported using the HTML option in Streamlit so that the tweets appear on top of the cards.

We hope you like the end result!

Fire alert system algorithm

We spent some time exploring the data to find a relationship between tweets and fire occurrences. As a result of that analysis, we built a simple model that looks at the words of each tweet and returns a fire alert if the tweet contains one of the following: "forest fire", "stay safe" or "firefighter". These terms were always associated with fires at the location of the tweet. On days and at locations where no tweets contain those terms, we look at the temperature data. If the temperature is at least 110°F, we do a binary classification using a LightGBM classifier; we did not observe any fires when the temperature was below 110°F. On top of tuning the hyperparameters of the model for an unbalanced dataset, some feature engineering was also beneficial. For example, we noticed that certain locations and regions would consistently have high temperatures but few fires. Thus, we included as a feature the historical proportion of times a location was above 110°F and had a fire. We tested several models, such as naive Bayes and logistic regression, but LightGBM performed best.
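The decision flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in predictor.py: the function names and the `classifier` callable (a stand-in for the trained LightGBM model) are hypothetical, and the historical feature is computed here as the share of hot readings that coincided with a fire, which is one possible reading of the description.

```python
from typing import Callable, Iterable, Tuple

FIRE_TERMS = ("forest fire", "stay safe", "firefighter")
TEMPERATURE_THRESHOLD_F = 110.0


def historical_fire_rate(history: Iterable[Tuple[float, bool]]) -> float:
    """Share of a location's hot readings (>= 110°F) that coincided with a fire.

    `history` holds (temperature_f, had_fire) pairs for one location.
    """
    hot = [had_fire for temp, had_fire in history if temp >= TEMPERATURE_THRESHOLD_F]
    return sum(hot) / len(hot) if hot else 0.0


def fire_alert(
    tweets: Iterable[str],
    temperature_f: float,
    fire_rate: float,
    classifier: Callable[[float, float], bool],
) -> bool:
    """Return True if a fire alert should be raised for one day and location."""
    # Rule 1: any tweet containing a fire-related term triggers an alert.
    if any(term in tweet.lower() for tweet in tweets for term in FIRE_TERMS):
        return True
    # Rule 2: below the threshold, no fires were observed, so no alert.
    if temperature_f < TEMPERATURE_THRESHOLD_F:
        return False
    # Rule 3: otherwise defer to the binary classifier (hypothetical stand-in).
    return classifier(temperature_f, fire_rate)
```

With a trivial stand-in such as `lambda temp, rate: rate > 0.5`, a hot day at a location with a low historical fire rate yields no alert, which mirrors the motivation for the feature.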

Streamlit link

https://save-zakar-hackathon-fyzo4s244liwqsd43pni2q.streamlit.app/

Source code link

https://github.com/andrejakobsen/save-zakar-hackathon/tree/main

Instructions for the reviewers

Note that some changes had to be made to the environment files to make them compatible with Streamlit. However, it should be straightforward to install all dependencies using env.yml with conda.

We ran into a lot of difficulties deploying the app to Streamlit and making sure the dependencies were installed. Unfortunately, this took a lot of time away from finalizing the fire alert producer. However, it is possible to make predictions using the fire_prediction function in predictor.py. One can send a dictionary of records from the temperature readings and tweets stations and receive a dictionary of fire alerts back. We visualize the predictions of our model on the "Fire Alert System" page in the app.

We use a .env file to store credentials that are then retrieved in a MemphisCredentails() dataclass. Simply replace them with yours.
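For readers unfamiliar with the pattern, here is a minimal sketch of a credentials dataclass that reads from environment variables. It is not the project's MemphisCredentails() class: the class name, field names, and variable names below are all assumptions. It also assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`.

```python
import os
from dataclasses import dataclass, field


def _require(name: str) -> str:
    """Read a required environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Hypothetical class and variable names, for illustration only.
@dataclass(frozen=True)
class StationCredentials:
    host: str = field(default_factory=lambda: _require("MEMPHIS_HOST"))
    username: str = field(default_factory=lambda: _require("MEMPHIS_USERNAME"))
    # repr=False keeps the password out of logs and tracebacks.
    password: str = field(
        default_factory=lambda: _require("MEMPHIS_PASSWORD"), repr=False
    )
```

Calling `StationCredentials()` then picks up whatever values are set in the environment, so swapping in your own credentials only requires editing the .env file.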

What made you decide to build this particular app?

We wanted to see how far we could take it with Streamlit. We often find that the end result of many apps does not live up to the potential of the data they had to work with. We also thought it would be neat to combine our efforts by making the model's predictions visible on an actual map.

Other information

I deeply apologize for not having enough time to clean up the codebase, so please reach out if there is anything missing, unclear or that I forgot to mention. Thanks for a great hackathon! 🚀

Confirm submission of private form

andrejakobsen commented 1 year ago

@rnowling-memphis Just confirming that this is "Dre_Jay" from Discord.

Avitaltrifsik commented 1 year ago

Hey @andrejakobsen, is the second team member on Discord as well? If so, what is their nickname?