Not scraping some pages

wael-sudo2 / facebook-page-info-scraper

Free Facebook pages MetaData Scraping Library - Unlimited Calls

MIT License


Not scraping some pages #29

Status: Closed (closed by elcoloo 3 months ago)

elcoloo commented 3 months ago

Hello,

I have 2 pages with the same layout (x1yztbdb); one is scraped correctly and the other one is not.

2024-04-17 17:01:33 - https://www.facebook.com/janofalltradesproductions
2024-04-17 17:01:36 - Layout: x1yztbdb
scraping: Linda Grasso - Writer, Actress, Producer
{'page_name': 'Linda Grasso - Writer, Actress, Producer', 'page_category': 'Productor', 'email': 'janofalltradesproductions@gmail.com', 'page_website': 'http://www.lindagrasso.com', 'social_media_links': None, 'phone_number': None, 'location': None, 'page_rate': None, 'page_review_number': None, 'page_likes': None, 'page_followers': '72 ', 'following': '3 in a row'}

2024-04-17 17:04:00 - https://www.facebook.com/lisaskydlaenglish
2024-04-17 17:04:01 - Layout : x9orja2
scraping: Facebook
{'page_name': 'Facebook', 'location': None, 'email': None, 'phone_number': None, 'social_media_links': None, 'page_website': None, 'page_category': None, 'page_likes': None, 'page_followers': None}

Any ideas why?

wael-sudo2 commented 3 months ago

The second link you're trying to visit, https://www.facebook.com/lisaskydlaenglish, isn't a public page.
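
For anyone hitting the same symptom: in the log above, the non-public page came back with `page_name` set to `'Facebook'` and every other field `None`. A minimal sketch of a check for that pattern follows. `looks_like_login_wall` is a hypothetical helper, not part of facebook-page-info-scraper, and the heuristic is only inferred from the two results shown in this issue.

```python
def looks_like_login_wall(result: dict) -> bool:
    """Heuristic (assumption): a result named 'Facebook' with no other data
    probably came from a login or redirect page, not from the target page."""
    other_fields = [v for k, v in result.items() if k != "page_name"]
    return result.get("page_name") == "Facebook" and all(v is None for v in other_fields)


# The two results from the log above, both abbreviated to a few fields.
ok_page = {"page_name": "Linda Grasso - Writer, Actress, Producer",
           "email": "janofalltradesproductions@gmail.com", "phone_number": None}
blocked_page = {"page_name": "Facebook", "email": None, "phone_number": None}

print(looks_like_login_wall(ok_page))       # False
print(looks_like_login_wall(blocked_page))  # True
```

A check like this could be used to mark such URLs for a retry with a logged-in session instead of recording them as pages without contact details.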

elcoloo commented 3 months ago

Oh, so the script won't scrape non-public URLs? I'm using an installed cookie to get around blocks, so I can see the page/email. Is there any way you can tweak the code to get those pages scraped?

wael-sudo2 commented 3 months ago

Use incognito mode to check the link. It's only accessible with a login, so it's not public.

elcoloo commented 3 months ago

I understand what you mean, but I'm using your script with a cookie installed in Selenium, so even if the page is not public, I can see the page details (email, phone, web, etc.).

My question is: can you tweak the code so I can scrape those non-public links while being logged in using the cookie?
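
The cookie approach described here can be sketched roughly as below. This is an illustration under assumptions, not the library's own login mechanism: `apply_cookies` is a hypothetical helper, the cookie file is assumed to be a JSON list of objects with `name` and `value` keys, and `driver` is any object with Selenium-style `get` and `add_cookie` methods. Whether facebook-page-info-scraper exposes its driver for this is not stated in the thread.

```python
import json


def apply_cookies(driver, cookie_file: str, base_url: str = "https://www.facebook.com") -> int:
    """Load cookies from a JSON file into a Selenium-style driver.

    Assumes the file holds a list of cookie dicts with at least
    'name' and 'value' keys (a hypothetical export format).
    """
    with open(cookie_file, encoding="utf-8") as fh:
        cookies = json.load(fh)

    # Selenium only accepts cookies for the domain that is currently loaded,
    # so visit the site once before adding them.
    driver.get(base_url)
    for cookie in cookies:
        driver.add_cookie({"name": cookie["name"], "value": cookie["value"]})

    # Reload so the page is requested again with the cookies attached.
    driver.get(base_url)
    return len(cookies)
```

With a real browser this would be called as `apply_cookies(webdriver.Chrome(), "cookies.json")` after `from selenium import webdriver`; the file name is a placeholder.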

wael-sudo2 commented 3 months ago

By the start of next week I will release a new version with the possibility of scraping non-public pages with a user login.

elcoloo commented 3 months ago

I'm not a coder, but with GPT I've created a "robust" FB scraper that uses your code. I have the files on my PC; perhaps we could have a quick call so I can show you what I'm doing. I don't have much knowledge, but here is what I've achieved so far:

the scraper gets a big list of URLs from a txt file
it generates a JSON file with 'pending', 'ok', and 'not_ok' URLs
it connects to Facebook using a cookie stored in another .json file
it browses the FB pages to scrape the details (I'm interested in email accounts) and exports them to an Excel file
it runs until there are no more URLs to be parsed
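
The URL queue described in the list above could look roughly like this. It is only a sketch: `load_status` and `run` are hypothetical names, `scrape` stands in for whatever function returns the result dict, and treating a result as 'ok' only when it contains an email is an assumption based on the stated interest in email accounts. The Excel export is left out.

```python
import json
from pathlib import Path


def load_status(url_file: str, status_file: str) -> dict:
    """Return the saved status map, adding any new URLs as 'pending'."""
    path = Path(status_file)
    status = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for line in Path(url_file).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            status.setdefault(url, "pending")
    return status


def run(url_file: str, status_file: str, scrape) -> dict:
    """Process every pending URL, saving progress after each one."""
    status = load_status(url_file, status_file)
    for url in [u for u, s in status.items() if s == "pending"]:
        try:
            result = scrape(url)  # hypothetical callable returning a dict
            # Assumption: a page counts as 'ok' only if an email was found.
            status[url] = "ok" if result.get("email") else "not_ok"
        except Exception:
            status[url] = "not_ok"
        # Write after every URL so an interrupted run can resume.
        Path(status_file).write_text(json.dumps(status, indent=2), encoding="utf-8")
    return status
```

Because the status file is rewritten after each URL, restarting the script skips everything already marked 'ok' or 'not_ok'.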

I have also configured some delays between the requests to avoid being blocked, and I have a config file where I put my proxy settings so I can run different scrapers with different proxies. I'm currently running 8 instances at the same time.
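
A rough sketch of the delay and per-instance proxy setup mentioned above, under assumptions: `polite_sleep` and `chrome_args_for` are hypothetical helpers, the 3 to 8 second range is an arbitrary placeholder, and the proxy is passed through Chrome's `--proxy-server` command-line switch.

```python
import random
import time
from typing import List, Optional


def polite_sleep(min_s: float = 3.0, max_s: float = 8.0, sleep=time.sleep) -> float:
    """Wait a random interval so requests are not evenly spaced."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def chrome_args_for(proxy: Optional[str]) -> List[str]:
    """Build Chrome command-line arguments for one scraper instance."""
    args = []
    if proxy:
        # e.g. proxy = "http://127.0.0.1:8080" (placeholder address)
        args.append(f"--proxy-server={proxy}")
    return args
```

With Selenium, each returned argument would be passed to `Options.add_argument` before creating the driver, and `polite_sleep()` would be called between page visits.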

I'd love to share my work with you, as the core of everything is your 'facebook-page-info-scraper' script and it's helping me a lot. But, again, I'm not a coder, so a chat/call would be a good way to share what I've achieved and what can be improved from there.

WhatsApp: +5491151166685 (it might seem weird, but send me a message). Cheers!