katherinebenjamin / Homeworks

Homework for Data Science course

project review #4

Open sganzfri opened 8 years ago

sganzfri commented 8 years ago

@masongallo @lemonsoup

Hi Kate,

I looked at your slides Movie.DatabasePicture.pdf and code Final_Project.ipynb.

I liked the slides in the presentation. The many good graphs/visuals make it easy to follow. It looks like you focused on the background and motivation for the problem. I assume you'll add a discussion of the modeling techniques and algorithms you are using, plus results/conclusions, for the final version.

Probably a dumb question: do movie teams decide when they want to release (e.g., in the summer)? Or does it depend on theater availability, etc.? Also, even though more movies open in the summer, are the summer movies actually making more money than the others?

For the final figure with avg ticket price over time, it seemed a bit strange to present the X axis with years descending. Having the years ascending would show the ticket price increasing over time.

As we discussed in class yesterday, I agree that linear regression seems like the best model to use for this problem.

Do you have intuition for whether the 0.85/0.91 results you got with the model were "good"? That seems like very high accuracy. Is there a dummy/default score to compare with?
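One way to sanity-check a score like this, sketched under assumptions: if the 0.85/0.91 figures are R² values (the thread does not say), a dummy model that always predicts the training mean scores roughly 0 on the same metric, so that is a natural default to compare with. The numbers below are made up for illustration, and `r2_score` and `mean_baseline` are hypothetical helpers, not code from the notebook.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_baseline(y_train, n):
    """Dummy model: predict the training mean for every test row."""
    return [mean(y_train)] * n


# Made-up grosses (in $M), purely illustrative.
y_train = [10.0, 20.0, 30.0, 40.0]
y_test = [12.0, 18.0, 33.0, 41.0]
model_pred = [11.0, 20.0, 30.0, 42.0]  # stand-in for the regression's output

baseline_r2 = r2_score(y_test, mean_baseline(y_train, len(y_test)))
model_r2 = r2_score(y_test, model_pred)
print(f"baseline R^2 = {baseline_r2:.3f}, model R^2 = {model_r2:.3f}")
```

If the project uses scikit-learn, `DummyRegressor` from `sklearn.dummy` provides the same mean-predicting baseline without hand-written helpers.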

If you are considering trying an additional approach, perhaps the tree-based ones (decision tree/random forest) could be useful, though lin reg seems like the most natural and probably best.

-Sam

katherinebenjamin commented 8 years ago

Hi Sam,

Thank you for the great feedback. I'll be sure to cover your questions on movies and releases in my final presentation. Movie teams do decide when to release their movies (which is why we see all the Oscar-nominated movies coming out in the fall at the same time - so they are fresh in the minds of the committee). I believe summer hits have the potential to make more money than others, but I will look into that more closely.
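A rough sketch of how the summer question could be checked, assuming each movie is reduced to a (release month, gross) pair: comparing the average gross per movie, rather than totals, separates "more movies open in summer" from "summer movies earn more". The data and the June-to-August definition of summer are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (release_month, gross in $M) pairs; not real box-office data.
movies = [(6, 120.0), (7, 80.0), (11, 90.0), (1, 30.0), (8, 100.0), (3, 60.0)]

SUMMER_MONTHS = {6, 7, 8}  # assumption: "summer" means June through August

# Group each movie's gross under its season label.
gross_by_season = defaultdict(list)
for month, gross in movies:
    season = "summer" if month in SUMMER_MONTHS else "other"
    gross_by_season[season].append(gross)

# Average per movie, so the larger number of summer releases does not skew it.
avg_gross = {season: mean(values) for season, values in gross_by_season.items()}
print(avg_gross)
```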

I agree that the avg ticket price over time looks strange in descending order - unfortunately that was how the data was laid out, so I am trying to figure out an easy way to fix it without redoing it all.
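One possible low-effort fix, assuming the ticket-price data is a list of (year, price) rows stored newest-first: sort by year before plotting rather than rebuilding the table. The values below are placeholders, not the real prices.

```python
# Placeholder rows laid out newest-first, as the source data reportedly is.
rows = [(2016, 3.0), (2015, 2.0), (2014, 1.0)]

# Sort ascending by year so time runs left to right on the X axis.
rows_ascending = sorted(rows, key=lambda row: row[0])
years = [year for year, _ in rows_ascending]
prices = [price for _, price in rows_ascending]

print(years)
```

If the data lives in a pandas DataFrame, `df.sort_values("year")` does the same job (the `"year"` column name is an assumption). With matplotlib, calling `ax.invert_xaxis()` on the plot's axes flips the direction without touching the data at all.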

Thank you again for the feedback!