Open ckabalan opened 6 years ago
I think Kevin's application was using web scraping to get the data because AFAIK it stopped working at the same time as epdate/eprate pages were removed. Kevin also tweeted 4 days ago that graphtv will be back "hopefully soon".
Have you tried contacting IMDb to get a definite answer?
In the meantime I'll add an acknowledgement that the data is from IMDb and also add a contact link to the footer of this app so they can contact me if they want. If they contact me and ask me to remove the data I'll switch to using some other API like trakt.
@utkarshkukreti, I contacted the IMDb Helpdesk privately with the following:
To: IMDb Helpdesk
From: Caesar Kabalan / Dandelock
Subject: IMDb Ratings Data Fair Use
Body:
I would like to write a small website which uses the publicly available IMDb dataset (http://www.imdb.com/interfaces/) to show ratings data for TV Shows in graph form. You type in a TV Show name and it shows a chart with the ratings for each episode which gives you an idea on how well received the show was as it progressed.
This website would be 100% non-commercial and would be cheap enough to host that it would not require ads or any monetization. It would however be displaying data from the public IMDb dataset. All data would be clearly marked as sourced from IMDb and kept up to date.
Does IMDb have a stance on whether we can use a subset of the data this way?
Any response would be helpful!
I've posed the question publicly if you're looking for additional details: https://getsatisfaction.com/imdb/topics/imdb-ratings-data-fair-use
They responded to the forum post linked at the bottom of the email and then privately to my helpdesk ticket a few hours later:
To: Caesar Kabalan / Dandelock
From: IMDb Helpdesk
Subject: Re: IMDb Ratings Data Fair Use
Body:
Hi Caesar,
No, your usage wouldn't qualify for our intended usage of our free dataset. That dataset has very limited allowed uses -- namely, private & personal use (meaning, no one else sees our data in your work except you) or in-the-classroom academic work (for example, a paper or thesis for a class).
The fact that you won't monetize your website doesn't mean that the usage isn't commercial. You would require our commercial content license for your use case.
Our license product is aimed at large companies. Our licensees include The New York Times, Viacom, United Airlines, and Verizon among many others. As such, there is a license fee that starts at five figures. We assume that this is well beyond your means for your project, but if it's not, please let us know and we'll put you in touch with the right licensing people here.
Regards, The IMDb Help Desk
I marked my question as resolved. Unfortunately it looks like these GraphTV type websites aren't possible using IMDb's data directly.
Thanks for asking them. That's really unfortunate. I'm going to shut down the site and redirect it to this page until I find a different source for the ratings.
Sorry to see this, I was really enjoying this site with GraphTV gone. I've dealt with ratings sites and had similar results. I really wish there wasn't so much gray area around this. From my experience and research, it seems like you can display publicly accessible data if you scrape it (which is simple enough) but who wants to take that risk?
@jessejoe Actually, if you look at their site Conditions of Use...
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
And then...
Licensing IMDb Content; Consent to Use Robots and Crawlers: If you are interested in receiving our express written permission to use IMDb content for your non-personal (including commercial) use, please visit our Content Licensing section or contact our Licensing Department. We do allow the limited use of robots and crawlers, such as those from certain search engines, with our express written consent. If you are interested in receiving our express written permission to use robots or crawlers on our site, please contact our Licensing Department.
Ironically, the Content Licensing section / Licensing Department are the ones who answered the emails above... So the answer is that you cannot do it and still have everything be "on the up-and-up".
@ckabalan it is not uncommon for companies to put unenforceable or even flat out wrong restrictions in their ToS. There is plenty of research and anecdotal (or even actual legal) cases out there supporting that publicly available information is generally safe to gather and/or scrape. The problem is there doesn't seem to be a clear precedent or legal foundation really set for it, so you wind up trying to wedge internet/computer activity into other laws. And again, it's just not worth the risk.
You could come up with your own ratings and put the site back up using those. Maybe your ratings just happen to be very similar to what the IMDB ratings are...
As @MatthewPDingle suggests, the point is to use a different rating source. For example, it seems TVtime (https://www.tvtime.com/fr) uses its own rating information. Maybe it would be easier to find an arrangement with them?
Can't http://www.omdbapi.com/ be used for this?
@tweakign I actually contacted the owner of omdbapi for this last week. I haven't heard back from him yet.
@utkarshkukreti Any news? It would seem to be pretty ideal.
@tweakign I heard back from Brian a couple of days ago. He clarified that the "imdbRating" and "imdbVotes" fields returned from omdbapi are actually not IMDb's data but data collected from other free sources. They do not provide any data sourced from IMDb anymore.
+1 for trying with another rating source...
Just as a note, you can still get the old IMDb data from this ftp server [1] - that data is allowed to be used for non-commercial use (I got explicit permission to use it for my tv-show-ratings site in 2016). It's not updated as of Nov 2017 though.
For the newer data, I avoid the license by not hosting any data myself - instead I just parsed the data, hashed it, and put that hash into my website - the website then tries to fetch the data on page load in a decentralized manner using that hash. As long as a single person has the website open it will load :)
It's not ideal, but it mostly works fine - see https://phiresky.github.io/tv-show-ratings/
1: ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata
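For anyone wondering what "fetch the data using that hash" can look like, here is a minimal sketch of content-addressed fetching. It is not necessarily how the linked site does it: the choice of SHA-256, the `content_id` and `fetch_verified` names, and the idea of passing peers as plain callables are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Iterable


def content_id(data: bytes) -> str:
    """Return a hex digest that identifies the data by its content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(expected_id: str, sources: Iterable[Callable[[], bytes]]) -> bytes:
    """Try each source in turn and return the first payload whose hash matches.

    `sources` are hypothetical zero-argument callables standing in for peers;
    a real site would use some peer-to-peer transport here instead.
    """
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # this peer is unreachable, try the next one
        if content_id(data) == expected_id:
            return data  # content matches the hash embedded in the page
    raise LookupError("no source returned data matching the expected hash")
```

The page only needs to embed the hash. Any peer can serve the bytes, and a payload that does not match the hash is simply ignored.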
I've built something similar using OMDbapi.com. It seems to work pretty well at the moment, but only as long as the API stays online with up-to-date data.
https://github.com/CodeBrauer/IMDb-Rating-Chart http://imdb.codebrauer.com/
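For reference, a rough sketch of pulling per-episode ratings for one season out of an OMDb-style response. This is not taken from the project linked above. The `t`, `Season`, and `apikey` query parameters and the `Episodes` / `Episode` / `imdbRating` response fields are assumptions based on my understanding of OMDb's public documentation, so check the current docs before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode

OMDB_ENDPOINT = "http://www.omdbapi.com/"


def build_season_url(title: str, season: int, api_key: str) -> str:
    """Build a season lookup URL (parameter names are assumed, see note above)."""
    query = urlencode({"t": title, "Season": season, "apikey": api_key})
    return f"{OMDB_ENDPOINT}?{query}"


def episode_ratings(payload: dict) -> list[tuple[int, float]]:
    """Extract (episode number, rating) pairs from a decoded JSON response.

    Episodes without a numeric rating (for example "N/A") are skipped.
    """
    ratings = []
    for episode in payload.get("Episodes", []):
        try:
            rating = float(episode.get("imdbRating", "N/A"))
        except (TypeError, ValueError):
            continue  # no usable rating for this episode
        ratings.append((int(episode["Episode"]), rating))
    return sorted(ratings)
```

Fetching the URL and decoding the JSON is left out on purpose so the parsing can be tried on a saved response. The resulting pairs are what you would feed to a chart.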
I posted an update to my project here:
https://github.com/graphtv/website/issues/1#issuecomment-543387441
I contacted IMDb again, no luck.
Hi!
I recently wrote a similar application after Kevinformatic's Graph TV went away, and after his ominous update to the webpage ("reasons outside my control", "as soon as I am able") I did some research and found that my app was actually in violation of the IMDb ToS, so I immediately disabled any data retrieval functionality. I had originally just downloaded the files from https://datasets.imdbws.com/ but I did more investigating and found their licensing states:
As I interpret that, you can only use the data for your own private exploration/curiosity, so you couldn't even publish ratings publicly because it would no longer be "individual personal use". I also wonder how specifically calling out "movie information" relates to TV shows, actors, etc. The strange thing is the very next item...
If it's only for your own "individual personal use" why would you need to attribute it? No one except yourself is going to see it. I don't understand #4 in combination with #3 unless there are scenarios where IMDb is OK with public display of data scraped from the data sets (i.e., NOT personal use).
I talked to another individual familiar with IMDb data and they indicated that the Amazon legal team was actively reaching out to people using IMDb data publicly.
I only bring this up because it sounds to me like @kevinwuhoo may have been hit up by IMDb lawyers/DMCAed/etc and is reluctant to speak publicly. Feel free to close this issue if you feel it is irrelevant to the Git repo and open-source aspect of your project. You may also be outside the US and these laws are irrelevant. Just thought I would chime in.