katiehuangx / 8-Week-SQL-Challenge

Case study solutions for the #8WeekSQLChallenge.
https://8weeksqlchallenge.com
226 stars 161 forks source link

Data Mart : Part C before and after analysis #14

Open praveen555 opened 1 year ago

praveen555 commented 1 year ago

Using the week_number to filter out the dates would result in incorrect analysis. A simple query will show that week_number shift in 2020.

image

Hence it is better to use date_add, date_sub and interval to filter out the correct dates.

Lesego-Serobatse commented 1 year ago

Well that is quite thoughtful of you to do that. And I'm a little bit worried about your part C analysis, I have noticed you included the date 2020-06-15, and the question really asks for 4 weeks before and after. Now the 'after' part cannot include the date 2020-06-15 reported sales, because those are accumulated(or aggregated rather) sales from Tuesday week before(2020-06-09). So you ought to count the weeks from the (2020-06-22) and exclude the 2020-06-15 date. Otherwise it will mess up the analysis. I just peeked at your code praveen555, I think Katie made that error as well, but I think Katies error was by default mainly because of the cleaning part which I raised.

praveen555 commented 1 year ago

As per the case study

"We would include all week_date values for 2020-06-15 as the start of the period after the change and the previous week_date values would be before"

note the word- include what i understood..

  1. week_date != week_number
  2. week_date values for 2020-06-15 should be included in after analysis (as per question) and the days (2020-06-14 and before should be for before analysis.
  3. My logic is not based on week_number but the actual dates +/- 4 weeks from 2020-06-15
Lesego-Serobatse commented 1 year ago

Oh yeah, thank you very much, I am actually the one that missed that part. Thanks Praveen, I appreciate your input in all of this.

praveen555 commented 1 year ago

yes now that i think about it more in depth date_add('2020-06-15', interval 4 week) should be used. I used 3 weeks influenced by it. Because all dates between 2020-06-15 and 2020-07-12 should be included for after analysis. That would give the correct results as per questions.

Lesego-Serobatse commented 1 year ago

Nah Praveen, I think you were right when you said we ought to include the week date values for 2020-06-15, because of this statement from Danny: 'Taking the week_date value of 2020-06-15 as the baseline week where the Data Mart sustainable packaging changes came into effect.' That means the operational changes really set in at the beginning of that week, of which the exact date is 2020-06-09, now if that is so, then we'll have to include the baseline week and that means
DATE_ADD('2020-06-15', interval 3 week). You were correct. It was a bit confusing for me though, but thanks again for lifting the veil.