Drzhivago264 opened this issue 1 year ago
I have exactly the same issue, can you please share the solution if you got it?
Also experiencing this.
You can wrap lines 866-948 and 1120-1126 in facebook_scraper.py in try/except-pass blocks. The corrupted content still comes through, but you only get the data without reactors and reactions from it. I think Facebook changed something about reactors and reactions.
Are you running the script on Windows?
It seems that Facebook manages to move some content into a redirect loop.
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
I tried setting `allow_redirects=False` on the requests session, but then facebook-scraper throws this error: `lxml.etree.ParserError: Document is empty`
How can I catch this error to skip the corrupted content? You can test with post ID 2397025053807303 in group UkrainianAvstralia.
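If automatic redirects are disabled, one option is to detect the loop by hand instead of letting requests raise `TooManyRedirects` after 30 hops. A minimal sketch (the `fetch_no_loop` helper and the stub responses are illustrative, not part of facebook-scraper; in real use, `get` would be something like `functools.partial(session.get, allow_redirects=False)`):

```python
def fetch_no_loop(get, url, max_hops=10):
    """Follow redirects manually, bailing out on a loop.

    `get` is any callable returning an object with `.is_redirect` and
    `.headers`. Returns None when a redirect loop (or too many hops)
    is detected, rather than raising TooManyRedirects.
    """
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            return None  # we've visited this URL before: redirect loop
        seen.add(url)
        resp = get(url)
        if resp.is_redirect:
            url = resp.headers["Location"]
            continue
        return resp
    return None  # gave up after max_hops redirects

# Tiny stub demonstrating a two-URL redirect loop (a -> b -> a)
class _Resp:
    def __init__(self, location=None):
        self.is_redirect = location is not None
        self.headers = {"Location": location} if location else {}

pages = {"a": _Resp("b"), "b": _Resp("a"), "c": _Resp()}
assert fetch_no_loop(lambda u: pages[u], "a") is None       # loop detected
assert fetch_no_loop(lambda u: pages[u], "c") is not None   # normal page
```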
Update: a dumb fix for this. At lines 866-948 of facebook_scraper.py:

```python
try:
    if kwargs.get("post"):
        kwargs.pop("post")
        response = self.session.post(url=url, **kwargs)
    else:
        response = self.session.get(url=url, **self.requests_kwargs, **kwargs)
    DEBUG = False
    if DEBUG:
        for filename in os.listdir("."):
            if filename.endswith(".html") and filename.replace(".html", "") in url:
                logger.debug(f"Replacing {url} content with {filename}")
                with open(filename) as f:
                    response.html.html = f.read()
    response.html.html = response.html.html.replace('<!--', '').replace('-->', '')
    response.raise_for_status()
    self.check_locale(response)
except Exception:
    pass
```
And at lines 1120-1126 of facebook_scraper.py:

```python
try:
    post = extract_post_fn(post_element, options=options, request_fn=self.get)
    if remove_source:
        post.pop('source', None)
    yield post
except:
    pass
```