kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.39k stars, 628 forks

Extracting Page Review #629

Open suarezjessie opened 2 years ago

suarezjessie commented 2 years ago

Does the library support extracting reviews for pages, or even just the page's overall rating? It doesn't show up in the output of get_page_info.

neon-ninja commented 2 years ago

It does:

from pprint import pprint
from facebook_scraper import get_page_info

pprint(get_page_info("SkyTowerAKL"))

outputs:

{'about': 'About\n'
          'Corner Victoria and Federal Streets, Auckland, New Zealand 1010\n'
          'Get Directions\n'
          'Rating · 4.5\n'
          '(2.6K reviews)\n'
          '304,574 people checked in here\n'
          '09-363 6000\n'
          'skytower@skycity.co.nz\n'
          'http://www.skycityauckland.co.nz/Attractions/Skytower.html\n'
          'Closed now\n'
          '·\n'
          '10:00 AM - 6:00 PM\n'
          'Closed now\n'
          '·\n'
          '10:00 AM - 6:00 PM\n'
          'Wednesday\n'
          'Thursday\n'
          'Friday\n'
          'Saturday\n'
          'Sunday\n'
          'Monday\n'
          'Tuesday\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          'Popular Hours\n'
          [Popular Hours chart labels elided: 'MON\n' to 'SUN\n' repeated 7 times, then '9:00\n' to '23:00\n' in two-hour steps, repeated 7 times]
          "One of New Zealand's most exhilarating and spectacular tourist "
          'attractions\n'
          "A truly captivating experience awaits visitors to Auckland's Sky "
          'Tower. At 328 metres, it is the tallest man-made structure in New '
          'Zealand and offers breathtaking views for up to 80 kilometres in '
          'every direction.\n'
          '\n'
          'Travel up in the glass-fronted lifts to one of the three '
          'spectacular viewing platforms, or for more thrills and excitement, '
          'SkyWalk round the pergola at 192 metres up or SkyJump off the '
          'Tower!\n'
          '\n'
          'Relax with a coffee and light refreshments at Sky Lounge or dine at '
          "Orbit - Auckland's only 360-degree revolving restaurant.\n"
          '\n'
          "Sky Tower is one of New Zealand's most exhilarating and spectacular "
          'tourist attractions, you will be amazed at what you can see and do '
          'under one roof!\n'
          'Price Range · $$\n'
          'Landmark & Historical Place\n'
          '·\n'
          'Restaurant\n'
          'See more\n'
          'See Less',
 'checkins': 304574,
 'likes': 68922,
 'people_talking_about_this': 612}

Note the 'Rating · 4.5\n' and '(2.6K reviews)\n' lines in the about field.
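If you only need the overall rating, a minimal sketch for pulling it out of that about text might look like this. parse_rating is a hypothetical helper, not part of facebook_scraper, and its regular expressions assume the exact 'Rating · 4.5' and '(2.6K reviews)' wording shown above, which Facebook may localize or change.

```python
import re
from typing import Optional, Tuple

# Multipliers for Facebook's abbreviated counts such as "2.6K".
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_rating(about: str) -> Tuple[Optional[float], Optional[int]]:
    """Hypothetical helper (not part of facebook_scraper): extract the star
    rating and an approximate review count from get_page_info's 'about' text.

    Assumes the 'Rating · 4.5' and '(2.6K reviews)' lines look like the
    output above; returns None for any part that is missing.
    """
    rating_match = re.search(r"Rating\s*·\s*(\d+(?:\.\d+)?)", about)
    count_match = re.search(r"\(([\d.,]+)([KM]?) reviews?\)", about)

    rating = float(rating_match.group(1)) if rating_match else None
    count = None
    if count_match:
        number = float(count_match.group(1).replace(",", ""))
        # "2.6K" is already rounded by Facebook, so this count is approximate.
        count = round(number * _SUFFIXES[count_match.group(2)])
    return rating, count


about = "About\nRating · 4.5\n(2.6K reviews)\n304,574 people checked in here\n"
print(parse_rating(about))  # (4.5, 2600)
```

On an about text that lacks both lines, such as the chambajuice one further down this thread, it simply returns (None, None).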

suarezjessie commented 2 years ago

Oh cool, thanks for this! However, I think it doesn't handle some cases, such as this one: here's a Facebook page with 3 reviews, but they don't appear in the About text.

[screenshot]

But the resulting about field looks like this:

About\n
Suggest edits\n
1121 B Labores Street Pandacan, 1011 Manila, Philippines\n
Get Directions\n
84 people checked in here\n
0998 963 3587\n
Send message\n
Open now\n
·\n
9 AM - 9:30 PM\n
Open now\n
·\n
9 AM - 9:30 PM\n
Monday\n
Tuesday\n
Wednesday\n
Thursday\n
Friday\n
Saturday\n
Sunday\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
Fresh, delicious, yummy, refreshing and affordable shake and juices only from Chamba Juice and Shake!!!\n
Price Range · $\n
Smoothie & Juice Bar\n
Products\n
smoothies, milktea, juies\n
See more\n
See Less

Would a separate feature be needed to extract the Reviews tab?

neon-ninja commented 2 years ago

I see, it looks like chambajuice doesn't have an About page. This commit (https://github.com/kevinzg/facebook-scraper/commit/a516dfabff4b5937ef99ea25c84e463473a29e3d) should make get_page_info extract the rating under a new key called rating. There's no need to open a separate issue for extracting reviews; we can reuse this one.
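The exact type stored under the new rating key isn't shown here, and later in this thread a page without reviews comes back with 'rating': 'Entrepreneur'. A small defensive sketch, assuming the value may be a number, a numeric string, or unrelated text, could coerce it before use. numeric_rating is a hypothetical helper, not a library function.

```python
from typing import Any, Mapping, Optional


def numeric_rating(page_info: Mapping[str, Any]) -> Optional[float]:
    """Hypothetical helper: return page_info['rating'] as a float, or None
    when the key is missing or holds non-numeric text."""
    value = page_info.get("rating")
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # e.g. a category label such as 'Entrepreneur' picked up instead of a score
        return None


print(numeric_rating({"rating": "4.5"}))           # 4.5
print(numeric_rating({"rating": "Entrepreneur"}))  # None
```

With the real library you would call it as numeric_rating(get_page_info("chambajuice")).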

neon-ninja commented 2 years ago

This commit (https://github.com/kevinzg/facebook-scraper/commit/e362c522dd500c3c91ffb858c6044fca3d4b4d9a) should make it possible to extract reviews. Sample usage:

from pprint import pprint
from facebook_scraper import get_page_info

for review in get_page_info("chambajuice")["reviews"]:
    pprint(review)

outputs:

{'post_url': 'https://facebook.com/story.php?story_fbid=844190382691206&id=100013007553035&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-1/cp0/e15/q65/p40x40/176057649_1213006852476222_7349829092521007297_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=mQwiEkVN55cAX-txl-s&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_dqKC3Yhu2jYV9Pf4HJhJmn0yjOMoobEoajX5k4rpWfg&oe=6216E02B',
 'recommends': True,
 'text': 'good taste ang milktea. creamy',
 'time': datetime.datetime(2019, 12, 31, 7, 31, 42),
 'timestamp': 1577730702,
 'user_url': 'https://facebook.com/app.bennok?locale2=en_US',
 'username': 'Boy Montaos'}
{'post_url': 'https://facebook.com/story.php?story_fbid=4077043325658195&id=100000577028543&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/252319237_5083768481652336_441345146184154296_n.jpg?_nc_cat=103&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=tr5R8QAt6-QAX-fEtyM&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-P6GENcwW8sFQv1v1rnFmUbCeZYPwfUx0zXi9sK7bNiQ&oe=61F3BBC3',
 'recommends': True,
 'text': 'Super Affordable and yummy. napaka bilis pa nang service and '
         'delivery. 😊👍',
 'time': datetime.datetime(2020, 12, 9, 2, 27, 39),
 'timestamp': 1607434059,
 'user_url': 'https://facebook.com/hannahniah.lim?locale2=en_US',
 'username': 'Hananiah Fermin Lim'}
{'post_url': 'https://facebook.com/story.php?story_fbid=3178356272178102&id=100000112820909&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/218824795_6398471983499832_3335334123648518092_n.jpg?_nc_cat=104&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=sJ5901n2oWwAX8qVkQO&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_VL2g9-9jqNkGWNfJOZJnxX9ejqgBsGMrFNdZ-wI_yCg&oe=61F38F86',
 'recommends': True,
 'text': 'ok naman. patamisin lang ng konti yung pearl 😊',
 'time': datetime.datetime(2019, 6, 4, 0, 53, 25),
 'timestamp': 1559566405,
 'user_url': 'https://facebook.com/yzhanyzhi?locale2=en_US',
 'username': 'Yazmine C J Bautista'}
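Because reviews is a generator, reviews are fetched lazily as you iterate over it. The hypothetical helpers below (not part of facebook_scraper) materialize a bounded number of them with itertools.islice and compute the share whose 'recommends' key is True, relying only on the keys visible in the output above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List, Optional


def collect_reviews(reviews: Iterable[Dict[str, Any]],
                    limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Materialize up to `limit` reviews from a lazy iterable such as
    get_page_info(...)["reviews"]; limit=None takes all of them."""
    return list(islice(reviews, limit))


def recommend_ratio(reviews: List[Dict[str, Any]]) -> Optional[float]:
    """Fraction of reviews with 'recommends' set to True, or None if empty."""
    if not reviews:
        return None
    return sum(1 for r in reviews if r.get("recommends")) / len(reviews)
```

A typical call would be reviews = collect_reviews(get_page_info("chambajuice")["reviews"], limit=50); capping the count keeps a page with thousands of reviews from triggering thousands of requests.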
suarezjessie commented 2 years ago

This page profile has a reviews generator object, but iterating over it throws a Content Not Found error.

Here is the code I used to scrape the page with get_page_info:

from pprint import pprint

from facebook_scraper import get_page_info, set_cookies, set_user_agent

set_cookies("fb_cookie.txt")  # path to an exported cookies file
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
profile = get_page_info('atebeyandsell')
pprint(profile)

Here is the output:

{'about': 'About\n'
          'http://instagram.com/atebeyofficial\n'
          'Send message\n'
          'Entrepreneur · Gaming Video Creator\n'
          'See all',
 'address': None,
 'followers': 7964,
 'identifier': 107313684786461,
 'image': None,
 'likes': 6616,
 'name': 'Ate Bey and Sell',
 'profile_photo': 'https://scontent.fmnl4-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX9Ct3Y1&_nc_ht=scontent.fmnl4-3.fna&oh=00_AT9WN2G_WwPbs8fQFtp1ho1EK9kQdfE9EK6q-WaB5ixntQ&oe=6232C265',
 'rating': 'Entrepreneur',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7f9cb1346430>,
 'sameAs': 'instagram.com/atebeyofficial',
 'type': 'Person',
 'url': 'https://www.facebook.com/atebeyandsell/'}

The output shows a generator object under the reviews key. However, when I try to iterate over it with the code below:

for i in profile['reviews']:
    print(i)

it throws the following error:

NotFound                                  Traceback (most recent call last)
/var/folders/25/k79djfcj737dwxvhtkr192zr8p8x5p/T/ipykernel_15908/4199099289.py in <module>
----> 1 for i in profile['reviews']:
      2     print(i)

~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get_page_reviews(self, page, **kwargs)
    521         while more_url:
    522             logger.debug(f"Fetching {more_url}")
--> 523             response = self.get(more_url)
    524             if response.text.startswith("for (;;);"):
    525                 prefix_length = len('for (;;);')

~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get(self, url, **kwargs)
    805             if title:
    806                 if title.text.lower() in not_found_titles:
--> 807                     raise exceptions.NotFound(title.text)
    808                 elif title.text.lower() == "error":
    809                     raise exceptions.UnexpectedResponse("Your request couldn't be processed")

NotFound: Content Not Found
neon-ninja commented 2 years ago

The reviews aren't accessible at https://m.facebook.com/pg/atebeyandsell/reviews/ either. This page must have reviews disabled.
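This also explains why the generator object shows up at all: Python generators don't run until iterated, so get_page_info can return the reviews generator without fetching anything, and the NotFound only surfaces on the first iteration (as the traceback shows). A minimal sketch of iterating defensively is below; iter_until_error is a hypothetical helper, and the NotFound class here is a local stand-in for the library's exception.

```python
import logging
from typing import Iterable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def iter_until_error(items: Iterable[T],
                     errors: Tuple[Type[BaseException], ...]) -> Iterator[T]:
    """Yield items from a (possibly lazy) iterable, stopping quietly if one
    of the given exception types is raised while producing the next item."""
    try:
        for item in items:
            yield item
    except errors as exc:
        # Nothing is fetched until iteration starts, so this is where a
        # page whose reviews can't be loaded actually fails.
        logger.warning("Stopped early: %r", exc)


class NotFound(Exception):
    """Stand-in for facebook_scraper's NotFound, for this demo only."""


def fake_reviews():
    # Mimics a generator that yields one review, then hits a missing page.
    yield {"text": "first"}
    raise NotFound("Content Not Found")


reviews = fake_reviews()  # no error yet: the generator body hasn't run
print(list(iter_until_error(reviews, (NotFound,))))  # [{'text': 'first'}]
```

With the real library you would pass its own exception class, e.g. iter_until_error(profile['reviews'], (NotFound,)) after from facebook_scraper.exceptions import NotFound; that import path is inferred from the traceback above rather than verified.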

aminrabinia commented 2 years ago

The problem might be with m.facebook.com: the mobile version can't open some URLs.

neon-ninja commented 2 years ago

The reviews aren't accessible at https://www.facebook.com/atebeyandsell/reviews either
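To repeat the manual check from this thread for another page, a tiny hypothetical helper can build the two reviews URLs tried above (mobile and desktop) for any page name, so you can open them in a browser and see whether the reviews tab is reachable at all. Which URL facebook_scraper itself requests isn't stated here, so treat this only as a diagnostic aid.

```python
from typing import Dict
from urllib.parse import quote


def review_urls(page: str) -> Dict[str, str]:
    """Hypothetical helper: build the mobile and desktop reviews URLs that
    were checked by hand in this thread for the given page name."""
    slug = quote(page, safe="")  # percent-encode anything unusual in the name
    return {
        "mobile": f"https://m.facebook.com/pg/{slug}/reviews/",
        "desktop": f"https://www.facebook.com/{slug}/reviews",
    }


for label, url in review_urls("atebeyandsell").items():
    print(label, url)
```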