peacfuljoh / predictive-analytics-ytvideos

Full-stack real-time predictive analytics for YouTube content creators

Resolve bugs in stats crawler #6

Closed peacfuljoh closed 1 year ago

peacfuljoh commented 1 year ago

Error: the apply_regex(s, regex) call on a multi-part regex returns an empty list, so indexing the result with [0] raises an IndexError.

Affected URLs:

2023-09-14 13:18:58 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.youtube.com/watch?v=dcS28tYwmLc> (referer: None)
Traceback (most recent call last):
  File "/home/nuc/miniconda3/envs/crawler/lib/python3.11/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nuc/miniconda3/envs/crawler/lib/python3.11/site-packages/scrapy/spiders/__init__.py", line 73, in _parse
    return self.parse(response, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nuc/crawler/src/crawler/crawler/spiders/video_stats.py", line 115, in parse
    vid_info = extract_video_info_from_body(response, fmt='sql')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nuc/crawler/src/crawler/crawler/spiders/video_stats.py", line 66, in extract_video_info_from_body
    apply_regex(s, regex)[0]
IndexError: list index out of range
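
For illustration, the failure mode in the traceback (indexing [0] into an empty list of matches) can be avoided with a guard. This is only a sketch: the body of apply_regex is not shown in this thread, so first_match below is a hypothetical stand-in built on the standard re module, and the sample pattern is made up.

```python
import re
from typing import Optional


def first_match(s: str, regex: str) -> Optional[str]:
    """Return the first match of regex in s, or None when nothing matches.

    Hypothetical helper: it stands in for apply_regex(s, regex)[0], whose
    real implementation is not shown in this issue.
    """
    match = re.search(regex, s)
    # re.search returns None on no match, so there is no list to index into.
    return match.group(0) if match else None


# Made-up pattern, only to show the two outcomes.
pattern = r'"viewCount":"\d+"'
found = first_match('{"viewCount":"42"}', pattern)
missing = first_match("page without stats", pattern)
```

With a helper like this, the spider could skip or log a page when the result is None instead of failing with an IndexError.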

peacfuljoh commented 1 year ago

Fixed by switching to loading the entire VideoDetails str-formatted dict and calling json.loads() on it, so that all fields are parsed automatically rather than depending on the multi-part regex.
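
A minimal sketch of this kind of approach, not the actual code from the repo. It assumes the page body contains a JSON object under a "videoDetails" key; extract_video_details is a hypothetical name. It uses json.JSONDecoder.raw_decode, a close relative of json.loads that parses one JSON value starting at a given index, so the end of the dict does not have to be located by hand.

```python
import json
import re
from typing import Optional

# Assumption: the dict appears in the page body as "videoDetails":{...}.
_VIDEO_DETAILS_KEY = re.compile(r'"videoDetails"\s*:\s*(?=\{)')


def extract_video_details(body: str) -> Optional[dict]:
    """Parse the whole videoDetails object from a page body.

    Returns None when the key is absent or the JSON is malformed, instead
    of raising, so the caller can decide how to handle the page.
    """
    match = _VIDEO_DETAILS_KEY.search(body)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it in the string.
        details, _end = json.JSONDecoder().raw_decode(body, match.end())
    except json.JSONDecodeError:
        return None
    return details


# Made-up page fragment for demonstration.
sample_body = (
    'var ytInitialPlayerResponse = {"responseContext":{},'
    '"videoDetails":{"videoId":"abc123","title":"A {curly} title",'
    '"viewCount":"42","keywords":["a","b"]},"other":1};'
)
details = extract_video_details(sample_body)
```

Because the JSON parser handles nesting and braces inside strings, a title containing "{" or "}" does not break the extraction the way a hand-written regex for the closing brace could.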