kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

reactors no longer being returned #692

Open curiousier-george opened 2 years ago

curiousier-george commented 2 years ago

Just today, reactors stopped being returned for me. The following program exhibits the problem.

from facebook_scraper import get_posts, set_user_agent
from pprint import pprint
import sys

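cookie_file = 'facebook_cookies.txt'  # cookies exported from a logged-in browser session

set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

post_ids = sys.argv[1:]  # post IDs passed on the command line

for post_id in post_ids:
    # Fetch a single post and ask the scraper to include its reactors
    post = next(get_posts(post_urls=[post_id], cookies=cookie_file,
                          options={'allow_extra_requests': False, 'reactors': True}))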
    pprint(post)

When I invoke this as

python reactors.py 10158741881073601

the reactors field returned is None, even though the post does have reactors. It was working until today.

neon-ninja commented 2 years ago

I merged https://github.com/kevinzg/facebook-scraper/pull/707 into the master branch, and that fixed reactor extraction for this post on my end. Give the latest master branch a try and see how it goes.

curiousier-george commented 2 years ago

Hmm, no, that didn't work for me. What's really weird, though, is that I had already made the changes from #707 in my own copy of facebook_scraper, and that did work for me, at least for links and names, though not for types.

neon-ninja commented 2 years ago

Maybe you still had an old version of the library. Try pip uninstall facebook-scraper twice before running pip install git+https://github.com/kevinzg/facebook-scraper.git
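If an older copy lingers somewhere on the path, Python may keep importing it instead of the fresh install. A minimal sketch for checking which file a module would actually be loaded from, using the standard `importlib.util.find_spec`; the helper name `module_origin` is hypothetical and not part of facebook-scraper:

```python
import importlib.util
from typing import Optional


def module_origin(module_name: str) -> Optional[str]:
    """Return the file Python would import `module_name` from, or None if it isn't importable."""
    spec = importlib.util.find_spec(module_name)
    # find_spec returns None when no installed package provides the module
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # After reinstalling, this should point at the copy you just installed
    print(module_origin("facebook_scraper"))
```

If the printed path is not the one pip just wrote to, an older copy is shadowing the new install.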

ben31406 commented 2 years ago

Sorry, I've run into the same problem.

I ran pip uninstall facebook-scraper twice and then pip install git+https://github.com/kevinzg/facebook-scraper.git, then copied the code George (who opened this issue) posted and made a few changes:

from facebook_scraper import get_posts, set_user_agent
from pprint import pprint

cookies = MY_COOKIES  # placeholder for your own cookies (a dict, or a path to a cookies file)

set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

post_ids = ["10158741881073601"]

for post_id in post_ids:
    post = next(
        get_posts(
            post_urls=[post_id],
            cookies=cookies,
            options={
                'allow_extra_requests': False,
                'reactors': True
            }
        )
    )
    pprint(post)

and got the following output (truncated):

'page_id': None,
 'post_id': '10158741881073601',
 'post_text': 'Penne Smith Sandbeck',
 'post_url': 'https://m.facebook.com/10158741881073601',
 'reaction_count': 14,
 'reactions': {'care': 3, 'like': 8, 'love': 1, 'sad': 2},
 'reactors': [],
 'shared_post_id': None,
 'shared_post_url': None,
 'shared_text': None,

The reactors field is an empty list, even though the post clearly has reactors (reaction_count is 14).

ben31406 commented 2 years ago

I'm not sure whether this is the cause, but an exception seems to be raised at this line:

k = str(demjson.decode(sigil.attrs.get("data-store"))["reactionType"])

I added a breakpoint() before it and inspected demjson.decode(sigil.attrs.get("data-store")). It returned {'reactionID': 478547315650144}, which doesn't contain the key reactionType.
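A defensive version of that lookup would check which key is present before indexing. The sketch below is hypothetical and is not the fix from the commit linked in the next comment: it swaps in the standard `json` module for `demjson` (assuming the `data-store` attribute is valid JSON), and the `id_to_type` table mapping numeric reaction IDs to names must be supplied by the caller, since no real IDs are filled in here.

```python
import json
from typing import Mapping, Optional


def reaction_type(data_store: Optional[str], id_to_type: Mapping[int, str]) -> Optional[str]:
    """Return a reaction name from a sigil's data-store attribute, or None if it can't be determined."""
    if not data_store:
        return None
    data = json.loads(data_store)
    if "reactionType" in data:
        # Markup the original line expects: the type is given directly
        return str(data["reactionType"])
    if "reactionID" in data:
        # Markup seen in this issue: only a numeric ID, resolved via the caller's table
        return id_to_type.get(data["reactionID"])
    return None
```

With the value from this issue, `reaction_type('{"reactionID": 478547315650144}', {})` returns None instead of raising a KeyError.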

neon-ninja commented 2 years ago

I see - try https://github.com/kevinzg/facebook-scraper/commit/5539ec467223286d952c5af5c19d89a8cffedb17

With this commit I get:

'reaction_count': 14,
 'reactions': {'care': 3, 'like': 8, 'love': 1, 'sad': 2},
 'reactors': [{'link': 'https://facebook.com/lynnswisher.spears?fref=pb',
               'name': 'Lynn Swisher Spears',
               'type': 'care'},
              {'link': 'https://facebook.com/profile.php?id=100001509791215&fref=pb',
               'name': 'D.j. Bost',
               'type': 'like'},
              {'link': 'https://facebook.com/audra.halemaddox?fref=pb',
               'name': 'Audra Hale-Maddox',
               'type': 'care'},
              {'link': 'https://facebook.com/lin.stogner?fref=pb',
               'name': 'Lin Stogner',
               'type': 'like'},
              {'link': 'https://facebook.com/pam.morris.73?fref=pb',
               'name': 'Pam Morris',
               'type': 'sad'},
              {'link': 'https://facebook.com/shane.petersen.507?fref=pb',
               'name': 'Shane Petersen',
               'type': 'like'},
              {'link': 'https://facebook.com/jeroen.vandenhurk?fref=pb',
               'name': 'Jeroen van den Hurk',
               'type': 'sad'},
              {'link': 'https://facebook.com/judy.e.woodall?fref=pb',
               'name': 'Judy Edwards Woodall',
               'type': 'like'},
              {'link': 'https://facebook.com/kari.tgeorge?fref=pb',
               'name': 'Kari Turcogeorge',
               'type': 'love'},
              {'link': 'https://facebook.com/susan.r.briley?fref=pb',
               'name': 'Susan Reesman Briley',
               'type': 'like'},
              {'link': 'https://facebook.com/hutson.nick?fref=pb',
               'name': 'Nick Hutson',
               'type': 'like'},
              {'link': 'https://facebook.com/jeffrey.harris.1441?fref=pb',
               'name': 'Jeffrey Harris',
               'type': 'care'},
              {'link': 'https://facebook.com/holden.richards?fref=pb',
               'name': 'Holden Richards',
               'type': 'like'},
              {'link': 'https://facebook.com/darrell.e.cook?fref=pb',
               'name': 'Darrell E. Cook',
               'type': 'like'}],

curiousier-george commented 2 years ago

I apologize for my ignorance, but will pip install git+https://github.com/kevinzg/facebook-scraper.git install that commit?

neon-ninja commented 2 years ago

It should do, yes

neon-ninja commented 2 years ago

I've also just pushed a new version (0.2.55) to PyPI, so pip install -U facebook-scraper should now work too.
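To confirm the upgrade actually took effect, a minimal sketch using the standard `importlib.metadata` module. `has_at_least` and `parse_release` are hypothetical helper names, and the parser only compares the leading numeric parts of a version string, a simplification rather than full PEP 440 ordering:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '0.2.55' (or '0.2.55.dev0') into (0, 2, 55), keeping leading numeric parts only."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(m.group()))
    return tuple(parts)


def has_at_least(dist_name: str, minimum: str) -> bool:
    """True if `dist_name` is installed at version `minimum` or later."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print(has_at_least("facebook-scraper", "0.2.55"))
```

Comparing tuples rather than strings matters here: as strings, "0.2.6" sorts after "0.2.55", but as tuples (0, 2, 6) < (0, 2, 55).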

curiousier-george commented 2 years ago

Yes, works great! Thank you!

ben31406 commented 2 years ago

Thank you so much! It works great for that post, but not for some others, such as 5561190327250419. In some cases it returns only one or two reactors, or even none.
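One quick way to spot posts like this is to compare the per-type counts in `reactions` with a tally of the `reactors` list. A minimal sketch working on the post dict returned by `get_posts`, using the field names shown in the outputs in this thread; `reactor_shortfall` is a hypothetical helper. A shortfall flags a post worth inspecting, but doesn't by itself prove a scraper bug.

```python
from collections import Counter
from typing import Dict


def reactor_shortfall(post: dict) -> Dict[str, int]:
    """Per reaction type, how many reactors the summary counts but the reactors list lacks."""
    expected = post.get("reactions") or {}
    # Tally the 'type' of each scraped reactor; reactors may be None or []
    found = Counter(r.get("type") for r in (post.get("reactors") or []))
    return {kind: n - found[kind] for kind, n in expected.items() if n > found[kind]}
```

For the output posted above for 10158741881073601, the reactors list matches the summary exactly (3 care, 8 like, 1 love, 2 sad), so the function returns an empty dict.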

neon-ninja commented 2 years ago

I see - try https://github.com/kevinzg/facebook-scraper/commit/c41e14e1c8271ae82d2e981d64bf8cd21db08a85

ben31406 commented 2 years ago

Yes! It works great now. Thank you so much!

ben31406 commented 2 years ago

Sorry, I've run into another problem with reactors. On some specific Facebook fan pages I can't get the correct post_id or reactors, although it works for other pages. Here is my test code:

from facebook_scraper import get_posts
from pprint import pprint

cookies = {  # replace each XXX with your own session cookie value
    "wd": "XXX",
    "datr": "XXX",
    "sb": "XXX",
    "c_user": "XXX",
    "xs": "XXX",
    "fr": "XXX",
}

posts = get_posts(
    post_urls=["https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462"],
    options={
        "allow_extra_requests": False,
        "comments": "generator",
        "reactors": True,
        "reactions": True,
        "comment_reactors": False,
    },
    cookies=cookies,
)
post = next(posts)
pprint(post)

Here is part of the output. The post_id seems to come from the first comment rather than the post itself, and the reactions and reactors fields have the same problem.

'page_id': '1536864699976440',
 'post_id': '524560755699898_524611979028109',
 'post_text': '同學、學長、學妹傳來的照片\n'
              '阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
              '\n'
              'IPC 還是「海景第一排」呢!\n'
              '和新馬的朋友分享~',
 'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
 'reaction_count': 1,
 'reactions': {'like': 1},
 'reactors': [{'link': 'https://facebook.com/icudoctor?fref=pb',
               'name': 'Icu醫生陳志金',
               'type': 'like'}],
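A mismatch like this can be caught automatically by comparing post_id with the story_fbid query parameter of post_url. A minimal sketch using the standard `urllib.parse` module; both helpers are hypothetical, and the check only applies to story.php-style URLs that carry story_fbid:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def story_fbid_from_url(url: str) -> Optional[str]:
    """Extract story_fbid from a story.php URL, or None if the URL has no such parameter."""
    values = parse_qs(urlparse(url).query).get("story_fbid")
    return values[0] if values else None


def post_id_matches_url(post: dict) -> bool:
    """True if the scraped post_id equals the story_fbid in post_url."""
    expected = story_fbid_from_url(post.get("post_url") or "")
    # str() so an integer post_id compares equal to the string from the URL
    return expected is not None and str(post.get("post_id")) == expected
```

For the output above, post_id '524560755699898_524611979028109' does not match story_fbid 524560755699898, so the function returns False.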

I found some discussions about cookies. Here is the output after I added "noscript": "1" to my cookies:

'page_id': '1536864699976440',
 'post_id': '524560755699898',
 'post_text': '同學、學長、學妹傳來的照片\n'
              '阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
              '\n'
              'IPC 還是「海景第一排」呢!\n'
              '和新馬的朋友分享~',
 'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
 'reaction_count': None,
 'reactions': None,
 'reactors': None,

The post_id is correct now, but the reaction_count, reactions and reactors fields are all None.

neon-ninja commented 2 years ago

Here's the output I get with your test code:

'page_id': '1536864699976440',
 'post_id': 524560755699898,
 'post_text': '同學、學長、學妹傳來的照片\n'
              '阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
              '\n'
              'IPC 還是「海景第一排」呢!\n'
              '和新馬的朋友分享~',
 'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
 'reaction_count': 568,
 'reactions': {'care': 1, 'like': 563, 'love': 2, 'wow': 2},
 'reactors': [{'link': 'https://facebook.com/profile.php?id=100080093550026&fref=pb',
               'name': 'Leo Hsu',
               'type': 'like'},
              {'link': 'https://facebook.com/profile.php?id=100077404213696&fref=pb',
               'name': '林幸君',
               'type': 'like'},

Try updating to the latest master branch, and try setting your Facebook language to English. Also try:

from facebook_scraper import _scraper
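with open("524560755699898.html", "w", encoding="utf-8") as f:  # UTF-8 so the Chinese text is saved correctly
    f.write(_scraper.get("524560755699898").html.html)  # raw HTML returned for this post

and upload the resulting HTML file.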