kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.46k stars 634 forks

Cookies get only 4 posts #720

Open doobybug opened 2 years ago

doobybug commented 2 years ago

Hi, I am trying to scrape a couple of pages, but once I include the cookies file, some pages do not return more than 4 posts. You can see the code below. For example, if I use the page Paul-Mark-Supermarket-147867532048848, only 4 posts are returned, while timesofmalta works fine. Any idea why this happens?

`import csv
import string
import time

from facebook_scraper import exceptions, get_posts

# Note: search_page_persistor is not defined in this snippet.
with open('aom_fb_3.csv', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Post_ID", "Time", "Text", "Likes", "Shares", "Live", "Image", "Video"])

    while True:
        try:
            for post_idx, post in enumerate(get_posts(
                "Paul-Mark-Supermarket-147867532048848",
                cookies='cookies.txt',
                page_limit=None,  # try to get all pages and then decide where to stop
                start_url=search_page_persistor.get_current_search_page(),
                request_url_callback=search_page_persistor.set_search_page,
                posts=200,
                timeout=120
            )):
                # Record only whether the post has images or a video
                images = post['images'] is not None
                video = post['video'] is not None
                text = post['text'] or ''  # guard against posts without text
                # Keep only printable ASCII characters and Maltese letters
                maltese_fonts = ['ż', 'ħ', 'ġ', 'ċ', 'Ż', 'Ħ', 'Ġ', 'Ċ']
                final_str = ''.join(
                    char for char in text
                    if char in string.printable or char in maltese_fonts
                )
                print(post['time'])
                row = [post['post_id'], post['time'], final_str, post['likes'], post['shares'], post['is_live'], images, video]
                writer.writerow(row)

            # The with block closes the file, so no explicit close() is needed
            print("Finished!")
            break
        except exceptions.TemporarilyBanned:
            print("Temporarily banned, sleeping for 1m")
            time.sleep(60)`

neon-ninja commented 2 years ago

This page works fine for me. The code:

from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.json")
posts = list(get_posts('Paul-Mark-Supermarket-147867532048848', pages=2, options={"allow_extra_requests": False, "posts_per_page": 100}))
print(len(posts))

outputs 101. Do you get the same result? Try enable_logging() - do you get any errors in the log?
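
One way to narrow this down might be to run the same request with and without the cookies file and compare the counts. The following is only a minimal sketch: `compare_post_counts`, `fetch`, and `run_against_facebook` are hypothetical names introduced here, not part of facebook-scraper. The keyword arguments (`pages`, `options`, `cookies`) are the ones already used in this thread, and the sketch assumes cookies may persist in the library's session once set, which is why the cookie-less request runs first.

```python
from typing import Callable, Dict, Iterable


def compare_post_counts(
    fetch: Callable[..., Iterable[dict]],
    account: str,
    cookies_file: str,
    pages: int = 2,
) -> Dict[str, int]:
    """Count posts returned without and with cookies (hypothetical helper)."""
    # Options copied from the reply above: no extra requests, large pages.
    options = {"allow_extra_requests": False, "posts_per_page": 100}

    # Run the cookie-less request first, on the assumption that cookies
    # may stay attached to the scraper's session after they are set.
    without_cookies = sum(
        1 for _ in fetch(account, pages=pages, options=dict(options))
    )
    with_cookies = sum(
        1
        for _ in fetch(
            account, pages=pages, options=dict(options), cookies=cookies_file
        )
    )
    return {"without_cookies": without_cookies, "with_cookies": with_cookies}


def run_against_facebook() -> None:
    """Not called automatically: needs facebook_scraper and a cookies file."""
    from facebook_scraper import enable_logging, get_posts

    enable_logging()  # show the scraper's log so request errors are visible
    counts = compare_post_counts(
        get_posts, "Paul-Mark-Supermarket-147867532048848", "cookies.txt"
    )
    print(counts)
```

If the two counts differ sharply for one page but not for another, that would point at the cookies (or the account they belong to) rather than at the page itself, and the log output should show which requests came back short.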