ozkary / ai-engineering

Working use cases with AI written on Python and TypeScript
https://www.ozkary.com/2023/05/ai-engineering-generate-code-from-user-stories.html
Apache License 2.0
5 stars 0 forks source link

Create a SQL Query to Get the Top Ten Visited Stations #4

Open ozkary opened 4 months ago

ozkary commented 4 months ago

As a database developer, I need to create a complex SQL query that utilizes CTEs (Common Table Expressions) and joins to extract and analyze data efficiently. This query should identify the top 10 most visited stations for each month of the year 2024. So that, I can gain insights into station usage patterns and optimize resource allocation across the network.

Data Model:

Requirements:

Desired Outcome:

The generated SQL code should effectively leverage CTEs and joins to achieve the stated goals. The output should be a clear table displaying the month, station name, total visitors, and rank for the top 10 most visited stations in each month of 2024.

ozkary commented 4 months ago

Provider.GEMINI Generated code:

/* Common Table Expression (CTE) to calculate monthly station visitor counts */
WITH MonthlyStationCounts AS (
    SELECT
        STRFTIME('%Y-%m', created_dt) AS month, /* Extract year-month from created_dt */
        ds.station_name,
        SUM(fc.entries + fc.exits) AS total_visitors
    FROM
        fact_commuter fc
    JOIN
        dim_station ds ON fc.station_id = ds.station_id
    WHERE
        STRFTIME('%Y', created_dt) = '2024' /* Filter for year 2024 */
    GROUP BY
        month, ds.station_name
),
/* CTE to rank stations within each month based on total visitors */
RankedStations AS (
    SELECT
        month,
        station_name,
        total_visitors,
        RANK() OVER (PARTITION BY month ORDER BY total_visitors DESC) AS rank_within_month
    FROM
        MonthlyStationCounts
)
/* Retrieve the top 10 most visited stations for each month */
SELECT
    month,
    station_name,
    total_visitors,
    rank_within_month
FROM
    RankedStations
WHERE
    rank_within_month <= 10
ORDER BY
    month, rank_within_month;
ozkary commented 4 months ago

Provider.GEMINI Generated code:

WITH MonthlyStationActivity AS (
    -- CTE 1: Calculate monthly station activity
    SELECT
        SUBSTR(created_dt, 1, 7) AS month,
        s.station_name,
        SUM(c.entries + c.exits) AS total_visitors
    FROM
        fact_commuter c
    JOIN
        dim_station s ON c.station_id = s.station_id
    WHERE
        created_dt BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY
        1, 2
),
RankedStations AS (
    -- CTE 2: Rank stations within each month
    SELECT
        month,
        station_name,
        total_visitors,
        RANK() OVER (PARTITION BY month ORDER BY total_visitors DESC) AS rank
    FROM
        MonthlyStationActivity
)
-- Final query: Retrieve top 10 stations for each month
SELECT
    month,
    station_name,
    total_visitors,
    rank
FROM
    RankedStations
WHERE
    rank <= 10
ORDER BY
    month,
    total_visitors DESC;
ozkary commented 1 month ago

Provider.GEMINI Generated code:

WITH MonthlyCommuterActivity AS (
    SELECT
        STRFTIME('%Y-%m', created_dt) AS visit_month,
        station_id,
        SUM(entries + exits) AS total_visitors
    FROM fact_commuter
    WHERE STRFTIME('%Y', created_dt) = '2024'
    GROUP BY 1, 2
), RankedStations AS (
    SELECT
        visit_month,
        station_name,
        total_visitors,
        RANK() OVER (PARTITION BY visit_month ORDER BY total_visitors DESC) AS rank
    FROM MonthlyCommuterActivity
    JOIN dim_station ON MonthlyCommuterActivity.station_id = dim_station.station_id
)
SELECT
    visit_month,
    station_name,
    total_visitors,
    rank
FROM RankedStations
WHERE rank <= 10
ORDER BY visit_month, rank;