j2kun / imsdb_download_all_scripts

Download all plaintext scripts from imsdb.com

IndexError: list index out of range #1

Open jacobkreider opened 5 years ago

jacobkreider commented 5 years ago

I get this traceback after downloading 'O Brother Where Art Thou? Script.html':

Traceback (most recent call last):
  File "download_all_scripts.py", line 59, in <module>
    title, script = get_script(relative_link)
  File "download_all_scripts.py", line 42, in get_script
    script_text = script_soup.find_all('td', {'class': "scrtext"})[0].get_text()
IndexError: list index out of range

Not sure what's making it fail.
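The traceback shows the lookup on line 42 found nothing: `find_all` returns an empty list when the fetched page has no `<td class="scrtext">` cell, and `[0]` on an empty list raises `IndexError`. Below is a minimal sketch of checking for that cell before indexing. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra packages, and the names `has_script_cell` and `_ScrtextDetector` are hypothetical, not part of the repo.

```python
from html.parser import HTMLParser


class _ScrtextDetector(HTMLParser):
    """Records whether the page contains a <td> whose class list includes 'scrtext'."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "scrtext" in classes:
            self.found = True


def has_script_cell(html):
    """Hypothetical helper: True if the page has at least one <td class="scrtext">."""
    detector = _ScrtextDetector()
    detector.feed(html)
    detector.close()
    return detector.found


# A normal script page has the cell; an error or "not found" page does not.
print(has_script_cell('<table><tr><td class="scrtext"><pre>FADE IN:</pre></td></tr></table>'))  # True
print(has_script_cell("<html><body>Not Found</body></html>"))  # False
```

In the download loop, a `False` result would be the cue to skip the title (or log it for manual download) instead of letting `[0]` raise.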

fatfishZhao commented 5 years ago

Maybe there is some inconsistent formatting on that movie's page. I just skipped that movie.

nayanchavan commented 4 years ago

Did anyone get around this error? I am having the same one @jacobkreider

Dnouvel commented 4 years ago

Well, I think the problem is the ? character in the title. I am not sure how to solve it (I am a Python novice), but I found a way around it: I collected each paragraph's 'href' into a list and continued the iteration from the entry after this film (index 823). You can download the missing script manually.

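stats = [p.a['href'] for p in paragraphs]  # relative link from each paragraph
for relative_link in stats[823:]:  # resume from the entry after the failing film
    title, script = get_script(relative_link)
    # ...continue the rest of the original loop body as given
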
Dnouvel commented 4 years ago

The same thing happens with What About Bob? and Who Framed Roger Rabbit?
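Since several titles are affected, resuming from a hard-coded index like 823 has to be redone for each failing film and breaks whenever the site's list changes. One option is to split the links up front so every title containing a `?` is set aside for manual download. This is a sketch: `split_links` is a hypothetical helper, and the sample links are illustrative, assuming the same relative `href` strings collected from `paragraphs` above.

```python
def split_links(links):
    """Separate links containing '?' (to fetch by hand) from the rest."""
    to_download, manual = [], []
    for link in links:
        # Catch both a literal '?' and its percent-encoded form (%3f / %3F)
        if "?" in link or "%3f" in link.lower():
            manual.append(link)
        else:
            to_download.append(link)
    return to_download, manual


# Illustrative links, not copied from the site
links = [
    "/Movie Scripts/Alien Script.html",
    "/Movie Scripts/O Brother Where Art Thou? Script.html",
    "/Movie Scripts/What About Bob? Script.html",
]
to_download, manual = split_links(links)
print(to_download)  # ['/Movie Scripts/Alien Script.html']
print(len(manual))  # 2
```

The main loop would then iterate over `to_download`, and `manual` doubles as the list of scripts to grab by hand.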

fatfishZhao commented 4 years ago

I just used try/except to skip the scripts that raised errors. Only a few scripts got skipped.
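A sketch of the try/except approach that also records which links were skipped, so those scripts can be fetched by hand afterwards. `download_all` is a hypothetical name, and `get_script` is passed in as a parameter only so the sketch runs on its own; in the real script it would be the repo's own function returning `(title, script)`. Catching only `IndexError` (the error in the traceback) rather than using a bare `except` keeps unrelated bugs visible.

```python
def download_all(links, get_script):
    """Call get_script on each link; skip and remember links that raise IndexError."""
    scripts, skipped = {}, []
    for relative_link in links:
        try:
            title, script = get_script(relative_link)
        except IndexError:
            # No <td class="scrtext"> on the fetched page: note it and move on
            skipped.append(relative_link)
            continue
        if script:  # ignore empty results
            scripts[title] = script
    return scripts, skipped


def fake_get_script(link):
    # Stand-in for the real get_script that fails the way the '?' titles do
    if "?" in link:
        raise IndexError("list index out of range")
    return link.rsplit("/", 1)[-1], "FADE IN: ..."


scripts, skipped = download_all(
    ["/a/Alien Script.html", "/a/What About Bob? Script.html"], fake_get_script
)
print(skipped)  # ['/a/What About Bob? Script.html']
```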

nayanchavan commented 4 years ago

Yeah, it has a problem with question marks, since ? is encoded as %3f in the URL. In my fork, I skipped the three movies with a question mark in the title and will download those three manually.
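A small standard-library illustration of why a raw `?` breaks the request: URL parsers treat the first `?` as the start of the query string, so everything after it stops being part of the path. Percent-encoding the path with `urllib.parse.quote` turns `?` into `%3F` (hex case does not matter in percent-encoding, so this is equivalent to `%3f`). The path below is an illustrative example, not necessarily the exact link imsdb uses, and whether encoding alone fixes the repo's script is an untested assumption.

```python
from urllib.parse import quote, urlsplit

BASE_URL = "http://www.imsdb.com"
path = "/scripts/What-About-Bob?.html"  # illustrative path with a literal '?'

# Unencoded: the '?' starts the query string, so '.html' is no longer in the path
raw = urlsplit(BASE_URL + path)
print(raw.path, raw.query)  # /scripts/What-About-Bob .html

# Encoded: quote() keeps '/' and escapes '?' as %3F.
# Adding '%' to safe avoids double-encoding links that already contain %3f.
encoded = quote(path, safe="/%")
print(encoded)  # /scripts/What-About-Bob%3F.html
print(urlsplit(BASE_URL + encoded).path)  # /scripts/What-About-Bob%3F.html
```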