air-quality-tests

Working with air quality datasets from many sources. Tools: Excel, .NET 8, Visual Studio

About the dataset

Sources

Official

External providers

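Índice de Calidad del Aire (Buenos Aires): Contaminación del Aire en Tiempo Real [Air Quality Index (Buenos Aires): Real-Time Air Pollution] (aqi.in)

News

Incendios y calidad del aire: Un vínculo inquietante en la Ciudad de Buenos Aires [Fires and air quality: a troubling link in the City of Buenos Aires] | Sobre La Tierra (uba.ar)
Especialistas de la UBA alertan por la calidad del aire en la Ciudad [UBA specialists warn about air quality in the City] (ambito.com)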

Data characteristics

Pollutant fact sheet - CO. Pasted image 20241016025406.png

Pollutant fact sheet - NO2. Pasted image 20241016025633.png

Pollutant fact sheet - PM10. Pasted image 20241016025707.png

Data collection sources. Pasted image 20241016025529.png

Transforming data

Calculating AQI (Air Quality Index) Tutorial (kaggle.com)
Add missing fields: season and a "color" category for each pollutant (based on the previously defined effects).
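
The two transformations above can be sketched as follows. This is a minimal illustration in Python for brevity (the project itself targets .NET 8), not the project's implementation. The breakpoint table, the color names, the field names (`pm10`, `date`, `pm10_index`, `pm10_color`) and the helper names are all assumptions; the real bands should come from the pollutant effects defined for this dataset. The index uses the piecewise-linear interpolation that AQI-style sub-indices are commonly built on, and the season uses Southern Hemisphere meteorological seasons since the data is from Buenos Aires.

```python
from datetime import date

# Hypothetical PM10 breakpoints: (conc_low, conc_high, index_low, index_high, color).
# Replace with the bands the project actually adopts for each pollutant.
PM10_BREAKPOINTS = [
    (0, 54, 0, 50, "green"),
    (55, 154, 51, 100, "yellow"),
    (155, 254, 101, 150, "orange"),
    (255, 354, 151, 200, "red"),
]


def sub_index(conc, breakpoints):
    """Map a concentration onto an index scale by linear interpolation."""
    conc = int(conc)  # truncate so values such as 54.9 fall inside a band
    for c_lo, c_hi, i_lo, i_hi, color in breakpoints:
        if c_lo <= conc <= c_hi:
            value = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
            return round(value), color
    return None, None  # outside the table: leave both fields empty


def season_for(day):
    """Meteorological season in the Southern Hemisphere."""
    if day.month in (12, 1, 2):
        return "summer"
    if day.month in (3, 4, 5):
        return "autumn"
    if day.month in (6, 7, 8):
        return "winter"
    return "spring"


def enrich(row):
    """Return a copy of the row with season, index and color fields added."""
    index, color = sub_index(row["pm10"], PM10_BREAKPOINTS)
    return {**row, "season": season_for(row["date"]),
            "pm10_index": index, "pm10_color": color}


enriched = enrich({"date": date(2024, 10, 16), "pm10": 100})
```

Keeping the bands in a data table rather than in code means each pollutant can get its own table without touching the interpolation logic.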

Analysis approach

Combination	Approach	Use Case	Benefits
Clustering + Decision Trees	Cluster by pollutant levels across locations, apply decision trees to predict air quality.	Group air quality patterns by time/location and predict poor air quality.	Reveals patterns and creates predictive rules for specific clusters.
Decision Trees + Association Rules	Use decision trees to identify key pollutants, apply association rules to find pollutant co-occurrences.	Identify key pollutants driving poor air quality, and find co-occurrence patterns.	Uncovers important pollutants and frequent co-occurrences.
Clustering + Association Rules	Cluster based on pollutant levels at different locations, then apply association rules within each cluster.	Understand pollutant combinations that spike together across locations.	Targeted pattern discovery for specific clusters of air quality data.
Clustering + Decision Trees + Association Rules	Cluster data, use decision trees to predict air quality, and mine association rules within clusters.	Comprehensive air quality analysis across time and locations.	Multi-faceted approach combining segmentation, prediction, and patterns.
Association Rules Informed by Decision Trees	Use decision trees to find key pollutants, then apply association rules on these pollutants.	Focus on critical pollutants and discover related pollutant patterns.	Reduces complexity by focusing on the most important features.
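
As a rough illustration of the association-rule step in the combinations above, the sketch below computes support and confidence for pollutant pairs. It is written in Python with the standard library only, and every name and value in it is hypothetical: the records, the thresholds, and the idea that a record holds the pollutants flagged "high" for one station and hour. In a clustering-first combination, the same function would be run once per cluster.

```python
from itertools import combinations

# Hypothetical input: each record is the set of pollutants flagged "high"
# at one station and hour.
records = [
    {"CO", "NO2"},
    {"CO", "NO2", "PM10"},
    {"PM10"},
    {"CO", "NO2"},
    {"NO2"},
]


def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    return sum(itemset <= r for r in records) / len(records)


def pair_rules(records, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for pollutant pairs."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, records)
        if s < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s / support({lhs}, records)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s, confidence))
    return rules


rules = pair_rules(records)
for lhs, rhs, s, confidence in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={confidence:.2f}")
```

Restricting `items` to the pollutants a decision tree ranks as most important would give the "Association Rules Informed by Decision Trees" variant.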

Architecture Records

1. On DB Injection

Context

Based on this file from a current sample: aspire-samples/samples/AspireShop/AspireShop.BasketService/AspireShop.BasketService.csproj at main · dotnet/aspire-samples (github.com)

Decision

I will inject the DB from the AirQualityApp.ApiService project.

2. Processing Application

Context

Based on experience, the main objective of the app will be to expose the consumed/processed dataset through an API/web frontend.

Decision

The architecture will look like this:

3. CSV Reading approach

Context

My options are:

Manual parsing.
JoshClose/CsvHelper NuGet Package.

Decision

We are using the CsvHelper NuGet package: JoshClose/CsvHelper: Library to help reading and writing CSV files (github.com). See also: CsvHelper Getting started.
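
The trade-off between the two options does not depend on the language, so here is a small sketch in Python, with the standard `csv` module standing in for a CSV library such as CsvHelper. The sample data and column names are hypothetical. It shows the typical case where manual parsing goes wrong: a quoted field that contains a comma.

```python
import csv
import io

# Hypothetical sample: the station name contains a comma inside quotes.
raw = 'station,pollutant,value\n"Parque Centenario, CABA",PM10,38\n'

# Manual parsing: splitting on commas breaks the quoted station name in two.
naive_row = raw.splitlines()[1].split(",")

# Library parsing: the reader honors quoting and maps headers to fields.
parsed_row = next(csv.DictReader(io.StringIO(raw)))

print(len(naive_row))         # 4 fields instead of the expected 3
print(parsed_row["station"])  # Parque Centenario, CABA
```

A manual parser can be made to handle quoting, escaped quotes and embedded newlines, but that is the work a dedicated library already does.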
