openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

IndexError: list index out of range #134

Closed: kelson42 closed this issue 2 years ago

kelson42 commented 2 years ago

[ted2zim::2022-08-02 12:56:54,029] DEBUG:Using h264 resource link for bitrate=1200
[ted2zim::2022-08-02 12:56:54,031] DEBUG:Successfully inserted video 41224 into video list
[ted2zim::2022-08-02 12:56:54,032] DEBUG:Seen /talks/doug_roble_digital_humans_that_look_just_like_us?language=en
[ted2zim::2022-08-02 12:56:54,032] DEBUG:extract_info_from_video_page: https://ted.com/talks/tom_thum_and_matthew_broadhurst_what_happens_in_your_throat_when_you_beatbox?language=en
[ted2zim::2022-08-02 12:56:56,166] DEBUG:Using h264 resource link for bitrate=1200
[ted2zim::2022-08-02 12:56:56,168] DEBUG:Successfully inserted video 32071 into video list
[ted2zim::2022-08-02 12:56:56,169] DEBUG:Seen /talks/tom_thum_and_matthew_broadhurst_what_happens_in_your_throat_when_you_beatbox?language=en
[ted2zim::2022-08-02 12:56:56,169] DEBUG:extract_info_from_video_page: https://ted.com/talks/maisie_williams_why_talent_carries_you_further_than_fame?language=en
[ted2zim::2022-08-02 12:56:58,207] ERROR:FAILED. An error occurred: list index out of range
[ted2zim::2022-08-02 12:56:58,207] ERROR:list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/entrypoint.py", line 190, in main
    scraper.run()
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 1044, in run
    if not self.extract_videos_from_topics(topic):
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 281, in extract_videos_from_topics
    total_videos_scraped = self.generate_search_result_and_scrape(
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 262, in generate_search_result_and_scrape
    nb_videos_extracted, nb_videos_on_page = self.extract_videos_on_topic_page(
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 419, in extract_videos_on_topic_page
    if self.extract_info_from_video_page(url):
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 636, in extract_info_from_video_page
    return self.extract_video_info_from_json(json_data)
  File "/usr/local/lib/python3.8/site-packages/ted2zim-2.0.10-py3.8.egg/ted2zim/scraper.py", line 549, in extract_video_info_from_json
    speaker_info = json_data["speakers"]["nodes"][0]
IndexError: list index out of range
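The crash comes from the last frame: `json_data["speakers"]["nodes"][0]` assumes every talk lists at least one speaker, so a talk whose JSON carries an empty `nodes` list (apparently the case for the Maisie Williams talk being processed when it failed) raises IndexError. A minimal sketch of a tolerant lookup, assuming only the JSON shape visible in the traceback; `get_first_speaker` is a hypothetical helper, not ted2zim's actual fix, and what the scraper should do when no speaker is found is left open.

```python
from typing import Any, Dict, Optional


def get_first_speaker(json_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first speaker node of a talk, or None if none is listed.

    Hypothetical helper mirroring json_data["speakers"]["nodes"][0], but
    tolerating a missing/null "speakers" key and a missing/empty "nodes" list.
    """
    # `or {}` / `or []` also cover keys that are present but set to null.
    speakers = json_data.get("speakers") or {}
    nodes = speakers.get("nodes") or []
    return nodes[0] if nodes else None


if __name__ == "__main__":
    # Assumed shape, taken from the failing expression; real TED JSON has more fields.
    print(get_first_speaker({"speakers": {"nodes": []}}))  # None instead of IndexError
```

Returning None instead of raising lets the caller decide whether to skip the talk or fill in empty speaker fields, rather than aborting the whole scrape on one video.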

Seen in these Zimfarm pipeline runs: https://farm.openzim.org/pipeline/cf2700dba08d31af2be19e26/debug, https://farm.openzim.org/pipeline/0c6e00d79a3ed8752ee19e26/debug and https://farm.openzim.org/pipeline/243700dba08d31af92f19e26/debug

rgaudin commented 2 years ago

Let's try one of them on :dev; there might be additional changes that my tests on a small playlist didn't catch.