gvwilson / sql-tutorial

The Querynomicon: An Introduction to SQL for Wary Data Scientists
https://gvwilson.github.io/sql-tutorial/
Other
447 stars 47 forks source link

generated series too short #37

Closed 2x2xplz closed 7 months ago

2x2xplz commented 7 months ago

query 60 contains a section:

        select value from generate_series(
            (select 0),
            (select count(*) - 1 from experiment)
        )

which generates a series 0-49 (because there are 50 experiments).

But the goal of this query is to cover every day, without gaps, from the first experiment to the last, including all the days with zero experiments. You can see the problem if you remove the limit 5 at the end of the query - the last day is 2023-03-19, but there are experiments all the way into 2024.

The line (select count(*) - 1 from experiment) in the segment above should be replaced with ( select julianday(max(started)) - julianday(min(started)) from experiment ). This will result in a query returning 371 rows, one for every day between 2023-01-29 and 2024-02-03, and which correctly sums to 50 in num_exp column.

You may also want to point out that julianday() is SQLite-only, other databases such as PostgreSQL or SQL Server use different functions to convert strings to dates. Further, also most databases do support simple minus (-) to get the difference between dates, explicit functions like DATEDIFF() tend to be less error-prone.

gvwilson commented 7 months ago

Thanks Andy - much appreciated.