pequalsmp closed this 6 years ago
The value None is a valid value for the key, so the default is never used in the following:
last_sync = record.get(list_id, 'Sat, 01 Jan 2000 00:00:00 GMT')
last_sync = datetime.strptime(last_sync, self.date_format)
Not sure how the value was set to null in the database in the first place, but manually updating the value worked fine.
Not sure if it's a valid issue; feel free to re-open if necessary.
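A minimal sketch of the pitfall and one way around it. dict.get only returns the default when the key is missing; a key stored with the value None returns None. The exact change made in Watcher3 isn't shown in this thread, and the names last_sync_for, DATE_FORMAT and the list id 'ls123' are hypothetical. DATE_FORMAT is an assumed RFC 822-style format matching the default string above, not necessarily Watcher3's self.date_format.

```python
from datetime import datetime

# Assumed format matching 'Sat, 01 Jan 2000 00:00:00 GMT'; Watcher3's self.date_format may differ.
DATE_FORMAT = '%a, %d %b %Y %H:%M:%S %Z'
DEFAULT_SYNC = 'Sat, 01 Jan 2000 00:00:00 GMT'


def last_sync_for(record: dict, list_id: str) -> datetime:
    """Hypothetical helper: return the last sync time, falling back to DEFAULT_SYNC."""
    # dict.get(key, default) only uses the default when the key is absent;
    # a key present with the value None still returns None.
    stored = record.get(list_id)
    # Treat None (or an empty string) the same as a missing key.
    if not stored:
        stored = DEFAULT_SYNC
    return datetime.strptime(stored, DATE_FORMAT)


record = {'ls123': None}  # hypothetical list id stored as NULL in the database
print(record.get('ls123', DEFAULT_SYNC))  # prints None: the default is ignored
print(last_sync_for(record, 'ls123'))     # prints 2000-01-01 00:00:00
```

The explicit `if not stored` check is used instead of relying on the default argument, because the default argument never sees keys that exist with a None value.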
It appears that parse_build_date sometimes fails to parse the lastBuildTime (malformed XML?).
Is there a reason why the response is used as the time source? Wouldn't it be simpler to just use the local system time as the source for the last sync time?
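A sketch of the suggestion above, assuming a standard RSS 2.0 layout (rss/channel/lastBuildDate): read the feed's build date when the XML parses, and fall back to the local clock when the feed is malformed or the element is missing or unparseable. The function name parse_build_date_safe is hypothetical and is not Watcher3's parse_build_date.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_build_date_safe(feed: str) -> datetime:
    """Hypothetical: return the feed's lastBuildDate, or the current UTC time on any failure."""
    try:
        root = ET.fromstring(feed)
    except ET.ParseError:
        # Malformed XML (e.g. "not well-formed (invalid token)"): use the local clock.
        return datetime.now(timezone.utc)

    # Assumes the usual RSS 2.0 structure: <rss><channel><lastBuildDate>...
    node = root.find('channel/lastBuildDate')
    if node is None or not (node.text or '').strip():
        return datetime.now(timezone.utc)

    try:
        built = parsedate_to_datetime(node.text.strip())
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable dates, newer ones ValueError.
        return datetime.now(timezone.utc)

    # A "-0000" zone yields a naive datetime; treat it as UTC so comparisons stay consistent.
    if built.tzinfo is None:
        built = built.replace(tzinfo=timezone.utc)
    return built


good = '<rss><channel><lastBuildDate>Sat, 01 Jan 2000 00:00:00 GMT</lastBuildDate></channel></rss>'
print(parse_build_date_safe(good))
print(parse_build_date_safe('<rss><channel>&bad</channel></rss>'))  # falls back to now
```

Always returning a timezone-aware datetime avoids mixing naive and aware values when the stored sync time is later compared with a new one.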
Experiencing the same here, for what it's worth. Pasting from the onscreen log:
WARNING 2017-12-02 19:47:19,381 CPTaskScheduler._task: Scheduled Task IMDB Sync Failed:
Traceback (most recent call last):
  File "/opt/Watcher3/core/cp_plugins/taskscheduler.py", line 254, in _task
    self.task()
  File "/opt/Watcher3/core/rss/imdb.py", line 53, in get_rss
    last_sync = datetime.strptime(last_sync, self.date_format)
TypeError: must be str, not None
Should be fixed in 80b56ede1d7e895f160de36fff8086f07879b759. A slight misunderstanding on my part about exactly how {}.get() works.
Still experiencing errors related to IMDB. Log posted from the screen:
WARNING [2017-12-17 18:30:14,904] CPTaskScheduler._task.256: Scheduled Task IMDB Sync Failed:
Traceback (most recent call last):
  File "/opt/Watcher3/core/cp_plugins/taskscheduler.py", line 254, in _task
    self.task()
  File "/opt/Watcher3/core/rss/imdb.py", line 55, in get_rss
    record[list_id] = self.parse_build_date(response)
  File "/opt/Watcher3/core/rss/imdb.py", line 109, in parse_build_date
    root = ET.fromstring(feed)
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1325, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 29, column 1170
IMDB disabled rss lists. I don't know if they intend to bring them back or not.
Welp... to Trakt we go, I guess. Any thoughts on allowing lists other than the defaults?
Several people have asked about it and my answer is always the same. I don't have a Trakt account and I'm not going to pay for one. To add that functionality I need a copy of an rss feed so I can know how to parse it. If anyone with a Trakt account sends me a copy of their rss feed (the actual rss contents, not the url) I can add it relatively quickly.
watchernzb@gmail.com
The issue is that IMDb, which uses React, preloads the initial state into the page, so you have to extract it from the HTML. It's doable with something like Scrapy and Splash, but this would add new dependencies.
In the meantime, a workaround (actually a hack, a really nasty hack) could be:
import re
import urllib.request

# Replace urXXXXXXXX with the user's IMDb id
content = urllib.request.urlopen("http://www.imdb.com/user/urXXXXXXXX/watchlist").read()

# Get the IMDbReactInitialState, which contains an array with the user's movies.
# str(content) is the repr of the raw bytes, so newlines appear as a literal "\n".
initial_state = re.findall(r"IMDbReactInitialState\.push\((.*?)\);\\n", str(content))

for state in initial_state:
    # Look for IMDb ids in the initial state
    for imdb_id in re.finditer(r"tt\d{7}", state):
        print(imdb_id.group())
This example extracts the ids from a user's Watchlist, which can then be used to fetch more info. I'm not sure whether IMDb prevents crawling or how long this might keep working, but it seems IMDb is hellbent on making sure you use their paid API even for benign functionality like the Watchlist.
Sample watchlist sent in email.
@enilfodne
I try to avoid scraping if at all possible. It is easy to break and can cause all sorts of other weird problems. If IMDB decides to kill rss for good, I'll probably look into downloading the list csv and parsing that instead.
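A rough sketch of the csv route mentioned above, using only the standard csv module. It assumes the exported list has a header row and that the IMDb id (tt...) is in a column named "Const"; that column name is an assumption about IMDb's export format, not something confirmed in this thread. The function name and the sample data are hypothetical. A real export read from disk may start with a byte-order mark, in which case opening it with encoding='utf-8-sig' keeps the first header clean.

```python
import csv
import io
from typing import List

# Assumed column holding the tt id in IMDb's list export; adjust if the export differs.
ID_COLUMN = 'Const'


def imdb_ids_from_csv(text: str) -> List[str]:
    """Hypothetical: return IMDb ids from an exported list CSV, in order, without duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    ids = []
    for row in reader:
        imdb_id = (row.get(ID_COLUMN) or '').strip()
        # Skip blank cells and anything that doesn't look like a title id.
        if imdb_id.startswith('tt') and imdb_id not in seen:
            seen.add(imdb_id)
            ids.append(imdb_id)
    return ids


# Hypothetical sample in the assumed layout
sample = 'Position,Const,Title\n1,tt0111161,The Shawshank Redemption\n2,tt0068646,The Godfather\n'
print(imdb_ids_from_csv(sample))  # ['tt0111161', 'tt0068646']
```

Unlike the rss feed, a csv export carries no build date, so a sync based on it would have to compare the returned ids against those already in the library rather than rely on a timestamp.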
@barbequesauce
Got it, thanks!
Thank you for jumping on this! Looks great.
I'm closing this. As of today IMDB still has rss disabled. I may look at using the csv to sync in the future, but that is not a project for today.
After noticing that a couple of movies were missing from my library, I checked the logs and found the following error, displayed every time there's an attempt to sync the IMDb rss feed: