kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.28k stars, 613 forks

Scraper returns incomplete information for get_page_info #673

Open suarezjessie opened 2 years ago

suarezjessie commented 2 years ago

Scraping works fine for some pages, but for others get_page_info returns much less information, as shown below.

The code snippet below (page: atebeyandsell):

from facebook_scraper import *
from pprint import pprint

set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('atebeyandsell'))

returns the following:

{'about': 'Entrepreneur · Gaming Video Creator\n'
          'Send message\n'
          'ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.\n'
          'ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.\n'
          'ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨\n'
          '1 Video\n'
          'atebeyofficial@gmail.com\n'
          'http://instagram.com/atebeyofficial',
 'likes': 6613,
 'profile_photo': 'https://scontent.fmnl17-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX8NlL45&_nc_ht=scontent.fmnl17-3.fna&oh=00_AT_TJvBiiaShuD4E74ffQ4HYWksfET86ScTYY9KPCvps7Q&oe=6232C265',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb2007b350>}

Meanwhile, this code snippet (page: panglaofooddelivery)

from facebook_scraper import *
from pprint import pprint

set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('panglaofooddelivery'))

returns the following:

{'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb300bfc10>}

neon-ninja commented 2 years ago

These two examples output the following for me:

Requesting page from: /atebeyandsell/about/
Requesting page from: /atebeyandsell/
Ate Bey and Sell. 6,614 likes · 400 talking about this. ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.
ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.
ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨
{'about': 'About\n'
          'http://instagram.com/atebeyofficial\n'
          'Away\n'
          'Send message\n'
          'Entrepreneur · Gaming Video Creator\n'
          'See all',
 'address': None,
 'cover_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_ohc=WOM3rj6xiC0AX_ZwTlF&_nc_ht=scontent.fakl8-1.fna&oh=00_AT9Z9wLOTeyFvxmbE_0JrDjekb_XmjYyX5IiKPLB5DG-UQ&oe=6232C265',
 'followers': 7961,
 'identifier': 107313684786461,
 'image': None,
 'likes': 6614,
 'name': 'Ate Bey and Sell',
 'profile_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-6/260975528_262784969239331_4505857755717975013_n.png?stp=cp0_dst-png_p64x64&_nc_cat=105&ccb=1-5&_nc_sid=85a577&efg=eyJpIjoidCJ9&_nc_ohc=iOFpwUnlzGUAX-e0kKf&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-GWPWJqWrfR10zH7YwLtGRGC_nWRfnPI9-AuqdyMMaMg&oe=62106AC8',
 'rating': 'Entrepreneur',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dc7a50>,
 'sameAs': 'instagram.com/atebeyofficial',
 'type': 'Person',
 'url': 'https://www.facebook.com/atebeyandsell/'}
Requesting page from: /panglaofooddelivery/about/
Content Not Found
Requesting page from: /panglaofooddelivery/
Panglao FOOD Delivery, Panglao, Bohol. 211 likes · 22 talking about this. CLICK "Get Started" and "Order Now" button to Start Ordering
{'about': 'About\n'
          'Suggest edits\n'
          '6340 Panglao, Philippines\n'
          'Get Directions\n'
          'See Menu\n'
          'Rating · 5\n'
          '(3 reviews)\n'
          'ejaykylie2020@gmail.com\n'
          'See what Panglao FOOD Delivery is doing in Messenger\n'
          'Get Started\n'
          'Closed now\n'
          '·\n'
          '7:00 AM - 8:00 PM\n'
          'Closed now\n'
          '·\n'
          '7:00 AM - 8:00 PM\n'
          'Wednesday\n'
          'Thursday\n'
          'Friday\n'
          'Saturday\n'
          'Sunday\n'
          'Monday\n'
          'Tuesday\n'
          '7:00 AM - 8:00 PM\n'
          '6:30 AM - 8:00 PM\n'
          '10:30 AM - 8:00 PM\n'
          '6:00 AM - 8:00 PM\n'
          '6:30 AM - 8:00 PM\n'
          '7:00 AM - 8:00 PM\n'
          '7:00 AM - 8:00 PM\n'
          'CLICK "Get Started" and "Order Now" button to Start Ordering\n'
          'Offers free Wi-Fi\n'
          'Food delivery service\n'
          'See more\n'
          'See Less',
 'address': None,
 'followers': 219,
 'foundingDate': '2020-10-29T06:26:29-0700',
 'identifier': 107561757820720,
 'image': None,
 'likes': 211,
 'name': 'Panglao FOOD Delivery',
 'rating': '5.0 (3)',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dee430>,
 'sameAs': '<<not-applicable>>',
 'type': 'Organization',
 'url': 'https://www.facebook.com/panglaofooddelivery/'}

There must be something wrong with your cookies. Perhaps you're facing temporary bans due to excessive scraping.

suarezjessie commented 2 years ago

Oh, alright. Is there any way to get around this problem? Would using multiple cookies work, or would adding a sleep between page scrapes help?

neon-ninja commented 2 years ago

Probably. Give it a try.
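As a rough illustration of the two ideas in the question (pausing between pages and rotating cookie files), here is a minimal sketch. The function name `scrape_pages`, the `fetch` callable, and the 30-second default are assumptions made for illustration; none of them are part of facebook-scraper, and nothing here is known to avoid a ban.

```python
import itertools
import time


def scrape_pages(pages, fetch, cookie_files, delay_seconds=30.0, sleep=time.sleep):
    """Fetch each page in turn, rotating cookie files and pausing between requests.

    `fetch(page, cookie_file)` is a hypothetical callable supplied by the caller.
    With facebook-scraper it could call set_cookies(cookie_file) and then
    return get_page_info(page), as in the snippets earlier in this thread.
    """
    results = {}
    cookies = itertools.cycle(cookie_files)  # round-robin over the cookie files
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delay_seconds)  # pause between pages, but not before the first
        results[page] = fetch(page, next(cookies))
    return results
```

Passing `sleep` as a parameter only makes the pacing easy to check without actually waiting. The right delay is a guess and would need tuning against real behaviour.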

suarezjessie commented 2 years ago

Is there a way to tell whether a cookie is already banned? That would also let me estimate roughly how many posts/profiles it takes to reach the limit. Also, do you know how long the temporary ban lasts?

neon-ninja commented 2 years ago

Checking whether a key you need is missing from the result should be a good smoke test. The ban usually lasts around an hour or so.
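A minimal sketch of that smoke test, assuming the keys you care about are `name`, `likes`, `followers`, and `url` (all taken from the full output shown earlier; the helper names and the choice of keys are illustrative, not part of the library):

```python
# Keys treated as required here are an assumption; adjust to what you need.
REQUIRED_KEYS = ("name", "likes", "followers", "url")


def missing_keys(info, required=REQUIRED_KEYS):
    """Return the required keys that are absent from `info` or set to None."""
    return [key for key in required if info.get(key) is None]


def looks_incomplete(info, required=REQUIRED_KEYS):
    """True when at least one required key is missing from the result."""
    return bool(missing_keys(info, required))
```

If `looks_incomplete(get_page_info(page))` is true, the cookie may be temporarily restricted, and it is probably worth pausing before trying again.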

suarezjessie commented 2 years ago

I tried using multiple cookies, but whenever I use a different account's cookie, the cookie from the account I used previously becomes invalid. Is there a workaround for this?

neon-ninja commented 2 years ago

Clicking the "Log Out" button on Facebook invalidates those cookies. So if you're switching accounts by signing out of one account and signing into another, you're invalidating those cookies. A good workaround is to use incognito mode and close the browser afterwards, which clears the cookies from the browser without invalidating them.

aminrabinia commented 2 years ago

The username for reviews in get_page_info() sometimes contains the page title instead of the real user's name. The relevant code:

links = elem.find("a")
"username": links[0].text,

{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-
{'user_url': 'https://facebook.com/morgan.hoehn?locale2=en_US', 'username': 'Morgan Hoehn', 'profile_picture': 'https://scontent.fmcc1-1
{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-1.fna.fbcdn.net/v/t39.30808-
{'user_url': 'https://facebook.com/kim.a.swisher?locale2=en_US', 'username': 'Kim Alcini Swisher', 'profile_picture':

neon-ninja commented 2 years ago

It looks like that issue occurs if you don't pass cookies. This commit should fix it: https://github.com/kevinzg/facebook-scraper/commit/1531ba91acca8ae6ddbfcffe8a16b70c2d191aab