mikeizbicki / cmc-csci143

big data course materials
40 stars 76 forks source link

slight differences in my result and the expected output files on pagil3 problem 14 #468

Closed epaisano closed 8 months ago

epaisano commented 8 months ago

Hello!

I keep reviewing my code for problem 14 to figure out why I get three different names from the expected output, and I can't figure out where my problem is. The following is my current code:

select name, film_title as title, total_rentals as "total rentals" from (
    select
        c.name,
        f.film_id,
        f.title as film_title,
        count(r.rental_id) AS total_rentals,
        row_number() over (partition by c.category_id order by count(r.rental_id) desc) as rn
    from
        category c
    join
        film_category fc on c.category_id = fc.category_id
    join
        film f on fc.film_id = f.film_id
    join
        inventory i on f.film_id = i.film_id
    join
        rental r on i.inventory_id = r.inventory_id
    group by
        c.category_id, f.film_id, f.title
) as q
where
    rn <= 5
order by name, "total rentals" desc, title;

and the following is what I get when I run the diff command:

--- expected/14.out 
+++ results/14.out  
@@ -43,8 +43,8 @@
  Foreign     | GOODFELLAS SALUTE      |            31
  Foreign     | NETWORK PEAK           |            31
  Foreign     | WIFE TURN              |            31
+ Foreign     | BINGO TALENTED         |            29
  Foreign     | STORM HAPPINESS        |            29
- Foreign     | SWEETHEARTS SUSPECTS   |            29
  Games       | BUCKET BROTHERHOOD     |            34
  Games       | ROBBERS JOON           |            31
  Games       | DOGMA FAMILY           |            30
@@ -74,11 +74,11 @@
  Sports      | GRAFFITI LOVE          |            30
  Sports      | MUSCLE BRIGHT          |            30
  Sports      | CONFIDENTIAL INTERVIEW |            29
- Sports      | TRIP NEWTON            |            28
+ Sports      | CLOSER BANG            |            28
  Travel      | WIFE TURN              |            31
  Travel      | DOGMA FAMILY           |            30
  Travel      | MARRIED GO             |            30
- Travel      | TALENTED HOMICIDE      |            29
+ Travel      | DEER VIRGINIAN         |            29
  Travel      | TITANS JERK            |            29
 (80 rows)

DId anyone else get thie same problem by chance? Thank you!!

mikeizbicki commented 8 months ago

Consider just the best sellers in the "travelers" category:

  Travel      | WIFE TURN              |            31
  Travel      | DOGMA FAMILY           |            30
  Travel      | MARRIED GO             |            30
- Travel      | TALENTED HOMICIDE      |            29
+ Travel      | DEER VIRGINIAN         |            29
  Travel      | TITANS JERK            |            29

Notice that there are more than 5 rows with "total rentals" >= 29. So somehow we need to "break the tie" and select only 5 rows. There are many reasonable ways to break the tie, but to solve this problem you to figure out how I broke the tie and implement that. (I used a ORDER BY clause, but you'd need to figure out the particular columns/directions that it's ordered on.) It's particularly tricky because I used a different ORDER BY clause for breaking the tie than I did for ordering the final results.

epaisano commented 8 months ago

I see, thank you!