js-fitz / Facebook-scraper

Collects post data from a public Facebook Group page, no account required
https://medium.com/@3joemail/job-hunting-without-social-media-152ada0639db

Unable to locate element #1

Open pvita opened 4 years ago

pvita commented 4 years ago

Hi there, I'm trying to adopt your script, but in the "scraping post data" phase I get this error. Thank you for your work.

Traceback (most recent call last):
  File "C:\Users\x\Desktop\gru.py", line 99, in <module>
    data = load_parse_save() # trigger scrape function
  File "C:\Users\x\Desktop\gru.py", line 47, in load_parse_save
    a_data['title'] = link_box.find_element_by_class_name('_52jh').text
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webelement.py", line 398, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webelement.py", line 658, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"._52jh"}
  (Session info: chrome=83.0.4103.97)

js-fitz commented 4 years ago

Hi pvita—

You're probably trying to scrape posts that have an external link but no link title. I suggest wrapping line #47 in a try/except statement that skips the missing title, like so:

try:
    a_data['title'] = link_box.find_element_by_class_name('_52jh').text
except NoSuchElementException:  # needs: from selenium.common.exceptions import NoSuchElementException
    pass
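The same guard can also be written as a small helper that catches only the missing-element error, so unrelated failures still surface. This is a minimal sketch, not the script's actual code: `text_or_default` is a hypothetical name, and the fallback exception class is there only so the snippet runs where Selenium is not installed.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:  # fallback so the sketch runs without Selenium installed
    class NoSuchElementException(Exception):
        """Stand-in for Selenium's exception of the same name."""


def text_or_default(parent, class_name, default=""):
    """Return the text of the first child with class_name, or default if absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    try:
        # "class name" is the locator strategy string behind By.CLASS_NAME
        return parent.find_element("class name", class_name).text
    except NoSuchElementException:
        return default
```

With this in place, line 47 could read `a_data['title'] = text_or_default(link_box, '_52jh')`, which records an empty title instead of leaving the key unset.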

What is the URL of your target page? It's possible Facebook uses different page setups for different groups, which would be a bigger limitation of my algo.

pvita commented 4 years ago

Thank you, js-fitz. I'm running a test on this link and will let you know whether your suggestion works when the process finishes :)

Best

pvita commented 4 years ago

Bad news, same error :(

pvita commented 4 years ago

I ran a test on another group and got a different error:

additional content loaded: 950 total posts
scraping post data...
data scraped: 0 posts
quitting driver.
parsing...
Traceback (most recent call last):
  File "C:\Users\x\Desktop\gru.py", line 99, in <module>
    data = load_parse_save() # trigger scrape function
  File "C:\Users\x\Desktop\gru.py", line 69, in load_parse_save
    data['post_text'] = data.post_text.str.replace('JTM', '').apply(
  File "C:\Users\x\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'post_text'

js-fitz commented 4 years ago

The error 'DataFrame' object has no attribute 'post_text' occurs because the scraper picked up 0 posts (data scraped: 0 posts). Please note that my script is specifically designed to scrape only posts that include some kind of external link (example). If a post contains no external link, the scraper skips it. You can change this by editing the section that checks for a link_box, i.e. the gray box Facebook generates for link thumbnails.
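A check before the parsing step would replace the AttributeError with a clearer message. A pandas DataFrame built from zero records has no columns, which is why `data.post_text` fails. The sketch below validates the scraped records before any DataFrame is built; `check_scraped` and the list-of-dicts layout are assumptions for illustration, not the script's actual structure.

```python
def check_scraped(records, column="post_text"):
    """Return records unchanged, or raise if nothing usable was scraped.

    Hypothetical helper: assumes each scraped post is a dict keyed by column name.
    """
    if not records:
        raise ValueError(
            "0 posts scraped: the target page may contain no posts with an "
            "external link box, so there is nothing to parse"
        )
    if not any(column in record for record in records):
        raise ValueError(f"no scraped post has a {column!r} field")
    return records
```

Calling it on the scraped list right after the "data scraped" message stops the run early with an explanation instead of failing later inside pandas.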

I recommend copying the code from my function into an interactive Python environment such as Jupyter so you can test the process step by step and adapt it to your needs. If you start running the code line by line, let me know where you get stuck and I can offer more help from there. Also see my post on Medium for a clear breakdown of the script and a glossary of attribute names and their corresponding element types.

Keep me updated on your progress and I will be happy to help along the way. Good luck!

js-fitz commented 4 years ago

Looking at both your links, it appears Facebook uses the same attribute glossary for pages in Italian, so you should be able to use the glossary I developed in my Medium post to suit your needs!