moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

Unable to get posts from page and group #7

Closed: mjfoo21 closed this issue 6 months ago

mjfoo21 commented 7 months ago

I have attempted to extract posts from Facebook pages and groups but to no avail. The for loops return no results.

The JSON cookies were exported with the Get cookies.txt LOCALLY browser extension.

### To extract from pages

from facebook_scraper import get_posts

for post in get_posts('nintendo', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/nintendo?v=timeline", pages=3, cookies='facebook_cookies.json'):
    print(post['text'][:50])
    print(post)

# This fails with error "facebook_scraper.exceptions.LoginRequired: A login (cookies) is required to see this page"
# for post in get_posts('nintendo', base_url="https://mbasic.facebook.com", 
#         start_url="https://mbasic.facebook.com/nintendo?v=timeline", pages=1,
#         cookies='fb_json_cookies.json'):
#     print(post['text'][:50])

### To extract from groups

for post in get_posts(group='702649679892269', pages=1):
    print(post)
for post in get_posts(group='702649679892269', pages=1, cookies="fb_json_cookies.json"):
    print(post)

Any ideas on how to resolve this are appreciated.

moda20 commented 7 months ago

@mjfoo21 Always add the start_url argument when using get_posts, and the base_url argument when not using the post_urls argument. As for nintendo, depending on your region there is a redirect from 'nintendo' to, for example, 'nintendoAmerica', which this repo doesn't handle yet. Cookies are essential now, since mbasic doesn't let you see any posts without logging in.
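A minimal sketch of that advice, assuming the mbasic URL patterns used elsewhere in this thread (`/<page>?v=timeline` for pages, `/groups/<id>?v=timeline` for groups). The helper `build_get_posts_kwargs` is hypothetical and not part of facebook-scraper; it only assembles keyword arguments so that `base_url`, `start_url` and `cookies` are always passed together. It also assumes the first parameter of `get_posts` is named `account`, as in upstream facebook-scraper. The page name has to be the one Facebook actually serves in your region (e.g. `nintendoAmerica`), since the redirect is not handled.

```python
from typing import Optional

MBASIC = "https://mbasic.facebook.com"


def build_get_posts_kwargs(
    cookies: str,
    page: Optional[str] = None,
    group: Optional[str] = None,
    pages: int = 3,
    base_url: str = MBASIC,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for get_posts().

    Exactly one of `page` or `group` must be given. The URL patterns are
    the ones used in this thread, not a documented API.
    """
    if (page is None) == (group is None):
        raise ValueError("pass exactly one of page= or group=")
    if page is not None:
        # Assumption: get_posts names its first parameter `account`.
        target = {"account": page}
        start_url = f"{base_url}/{page}?v=timeline"
    else:
        target = {"group": group}
        start_url = f"{base_url}/groups/{group}?v=timeline"
    return {
        **target,
        "base_url": base_url,
        "start_url": start_url,
        "cookies": cookies,  # mbasic shows no posts without a logged-in session
        "pages": pages,
    }
```

Usage would then look like `get_posts(**build_get_posts_kwargs("cookies.txt", page="nintendoAmerica"))`.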

mjfoo21 commented 7 months ago

Thanks @moda20 for the quick reply.

I have modified the code accordingly but still get no posts. Passing the cookies argument gives me an error in the 3rd for loop but not in the 2nd one, which is strange. I have also tried using Netscape cookies rather than JSON.

for post in get_posts('nintendoAmerica', base_url="https://mbasic.facebook.com",
                      start_url="https://mbasic.facebook.com/nintendoAmerica?v=timeline",
                      pages=3, cookies='facebook_cookies.json'):
    print(post['text'][:50])
    print(post)

for post in get_posts(group='702649679892269',
                      base_url="https://mbasic.facebook.com",
                      start_url="https://mbasic.facebook.com/702649679892269?v=timeline",
                      pages=3):
    print(post)

# This fails with error "facebook_scraper.exceptions.LoginRequired: A login (cookies) is required to see this page"
for post in get_posts(group='702649679892269',
                      base_url="https://mbasic.facebook.com",
                      start_url="https://mbasic.facebook.com/702649679892269?v=timeline",
                      pages=3, cookies="fb_json_cookies.json"):
    print(post)

moda20 commented 7 months ago

@mjfoo21 Please export your cookies in Netscape format as a .txt file; that is the only format I have used. Also, I only updated page scraping on top of the original repo, not group scraping. Please try to figure out a fix for groups until I get to it. I'll use your example for my testing.
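If the cookies were already exported as JSON, a small script along these lines can rewrite them in Netscape cookies.txt format. It is a sketch that assumes the JSON is a list of objects with `domain`, `name`, `value`, `path`, `secure` and an optional `expirationDate` (Unix seconds), which is the shape EditThisCookie exports; other extensions may use different keys. The function names `json_cookies_to_netscape` and `check_netscape_file` are hypothetical. The second function checks the result by loading it with the standard library's `http.cookiejar.MozillaCookieJar`, which reads this format.

```python
import http.cookiejar
import json
from pathlib import Path


def json_cookies_to_netscape(json_path: str, txt_path: str) -> int:
    """Rewrite a JSON cookie export as a Netscape cookies.txt file.

    Assumption: the JSON is a list of objects with "domain", "name", "value",
    "path", "secure" and an optional "expirationDate" (Unix seconds), the
    shape EditThisCookie exports. Returns the number of cookies written.
    """
    cookies = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # MozillaCookieJar expects a "# Netscape HTTP Cookie File" header line.
    lines = ["# Netscape HTTP Cookie File"]
    for c in cookies:
        domain = c["domain"]
        # The second field must be TRUE exactly when the domain starts with ".",
        # otherwise MozillaCookieJar refuses to load the file.
        subdomains = "TRUE" if domain.startswith(".") else "FALSE"
        secure = "TRUE" if c.get("secure") else "FALSE"
        # Session cookies have no expiry; an empty field marks them as such.
        expiry = c.get("expirationDate")
        expires = str(int(expiry)) if expiry is not None else ""
        fields = [domain, subdomains, c.get("path", "/"), secure,
                  expires, c["name"], c["value"]]
        lines.append("\t".join(fields))
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(cookies)


def check_netscape_file(txt_path: str) -> list:
    """Load the file with the standard library and return the cookie names."""
    jar = http.cookiejar.MozillaCookieJar()
    # ignore_discard=True keeps session cookies; expired cookies are still dropped.
    jar.load(txt_path, ignore_discard=True)
    return sorted(cookie.name for cookie in jar)
```

For this thread that would be `json_cookies_to_netscape('fb_json_cookies.json', 'cookies.txt')`, followed by passing `cookies='cookies.txt'` to `get_posts`.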

micjgam commented 6 months ago

Same here, although I'm not getting login errors (I pulled the cookies with the EditThisCookie extension). What I get instead is an empty list when scraping a group I am a member of.

from facebook_scraper import get_posts
posts = list(get_posts(group=971411739549035, base_url='https://mbasic.facebook.com', start_url='https://mbasic.facebook.com/groups/971411739549035?v=timeline', cookies='.cooke.json', pages=3))
post_list = []
for post in posts:
    post_list.append(post['text'])
print(post_list)

Result: [ ]

moda20 commented 6 months ago

@micjgam Please add these lines at the start of your script to enable the library's built-in debug messages and see what the problem is:

import logging
logging.basicConfig(level=logging.DEBUG) 

Another thing: export your cookies in Netscape format to a .txt file, so that we are on the same page.
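The `logging.basicConfig(level=logging.DEBUG)` call above turns on debug output for every library in the process, including HTTP clients, which can drown out the scraper's own messages. A narrower setup is sketched below. It assumes the library's modules log through loggers named under the `facebook_scraper` package (the usual `logging.getLogger(__name__)` convention); if that does not hold for this fork, the two-line version above still works. The function name `enable_scraper_debug` is hypothetical.

```python
import logging


def enable_scraper_debug(package: str = "facebook_scraper") -> logging.Logger:
    """Show DEBUG messages from one package while other libraries stay at WARNING.

    Assumes the package logs via loggers named "<package>" or "<package>.<module>".
    """
    # Root handler prints to stderr; loggers without their own level inherit WARNING.
    # Note: basicConfig does nothing if the root logger already has handlers.
    logging.basicConfig(level=logging.WARNING,
                        format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    pkg_logger = logging.getLogger(package)
    # Child loggers such as "<package>.<module>" inherit this DEBUG level,
    # and their records propagate to the root handler configured above.
    pkg_logger.setLevel(logging.DEBUG)
    return pkg_logger
```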

ryanbuckner commented 6 months ago

I was having the same problem: the loop was returning nothing. When I enabled logging, I started getting results.

moda20 commented 6 months ago

@ryanbuckner The logging only prints the logs to the console; it doesn't affect the scraping itself.

ryanbuckner commented 6 months ago

Thanks. Is there a way to print just the posts without any other info?

moda20 commented 6 months ago

@ryanbuckner Just pick the 'text' key, or 'full_text' if present, and you will get only the text.
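A minimal sketch of that answer. The posts yielded by `get_posts` are dictionaries, so the text is read with `.get()`: `full_text` is preferred when present and `text` is the fallback, following the comment above. Posts whose text is missing or empty are skipped rather than printed as `None`. The helper name `post_texts` is hypothetical, and the sample dictionaries below are made-up inputs, not real scraper output.

```python
from typing import Iterable, Iterator, Mapping, Optional


def post_texts(posts: Iterable[Mapping]) -> Iterator[str]:
    """Yield only the text of each post, preferring 'full_text' over 'text'."""
    for post in posts:
        text: Optional[str] = post.get("full_text") or post.get("text")
        if text:  # skip posts with no text at all (None or "")
            yield text


if __name__ == "__main__":
    # Illustrative input only; real posts come from facebook_scraper.get_posts(...)
    sample = [
        {"post_id": "1", "text": "Short text", "full_text": "Short text, expanded"},
        {"post_id": "2", "text": "Only text"},
        {"post_id": "3", "text": None},
    ]
    for text in post_texts(sample):
        print(text)
```

Because `post_texts` is a generator, it can wrap the scraper directly: `for text in post_texts(get_posts(...)): print(text)`.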