kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.38k stars, 627 forks

Make available a get_reactors(post_id) function #739

Open carlcotner opened 2 years ago

carlcotner commented 2 years ago

Hi, @kevinzg. If I'm understanding correctly, in the use case where one extracts reactors only after determining that a given post has new Likes, it would save half the requests if there were a separate get_reactors(post_id) function that didn't have to make two requests for a given post_id. (Essentially the functionality of the code starting at line 791 of the file extractors.py.)

I would be happy to try doing this myself if you could give me pointers on what needs to be done to reuse the existing code while respecting its object-oriented structure. I don't understand how everything is interrelated well enough to avoid hacking it all up.

neon-ninja commented 2 years ago

Half the requests? Requesting the post and its reaction count is just one request, whereas requesting all reactors can easily take hundreds of requests. At that point, one extra request is negligible.

What you're describing is already possible though, if you set "reactors": "generator". You would request the post, check the reaction count, and only iterate through the generator if your condition is met. See https://github.com/kevinzg/facebook-scraper/issues/504#issuecomment-937351065.
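A minimal sketch of that check-then-iterate pattern, using a stand-in post dict so it runs without network access. It assumes the post exposes "reaction_count" and "reactors" keys; the helper name reactors_if_changed and the fake_reactors generator are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Iterator, List


def reactors_if_changed(post: Dict[str, Any], last_seen_count: int) -> List[Dict[str, Any]]:
    """Consume post["reactors"] only when the reaction count has grown."""
    # Assumption: the post dict carries "reaction_count" and "reactors".
    count = post.get("reaction_count") or 0
    if count <= last_seen_count:
        return []  # the generator is left untouched, so nothing is fetched
    return list(post["reactors"])


fetched: List[str] = []  # records each simulated reactor fetch


def fake_reactors() -> Iterator[Dict[str, str]]:
    # Stand-in for the lazy generator obtained with "reactors": "generator".
    for name in ("Alice", "Bob"):
        fetched.append(name)
        yield {"name": name, "type": "like"}


post = {"post_id": "1", "reaction_count": 2, "reactors": fake_reactors()}

# Count unchanged since the last check: the generator is never advanced.
unchanged = reactors_if_changed(post, last_seen_count=2)
fetched_after_unchanged = list(fetched)

# Count has grown: now the generator is consumed.
changed = reactors_if_changed(post, last_seen_count=1)
```

With real data, the post dict would come from iterating get_posts(...) with options={"reactors": "generator"} instead of the stand-in above.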

https://github.com/kevinzg/facebook-scraper/commit/9f184edf4fd726a15d8b33ae9dbd47d670092c9a should make it possible to extract reactors without calling get_posts. With this commit:

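from pprint import pprint
from facebook_scraper import get_reactors, set_cookies

set_cookies("cookies.json")
# next() takes only the first reactor from the iterator
pprint(next(get_reactors(5746385992055254)))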

outputs:

{'link': 'https://facebook.com/profile.php?id=100071427936816&fref=pb',
 'name': 'Doug Cockle',
 'type': 'like'}

carlcotner commented 2 years ago

Interesting, thanks, I didn't know about that. Should

posts = get_posts(username, cookies=cookie_file, extra_info=True,
                  options={'allow_extra_requests': False, 'reactions': True,
                           'comments': 'generator', 'reactors': 'generator',
                           'comment_reactors': False})

be a drop-in replacement for

posts = get_posts(username, cookies=cookie_file, extra_info=True,
                  options={'page_limit': None, 'allow_extra_requests': False, 'HQ_images': False})

?

That is, should I be able to replace the second statement with the first in a working program without changing its behavior? I sometimes get an error after the swap. (I'm a bit confused about what exactly all the options do.)

neon-ninja commented 2 years ago

It depends on what you do with the result. A generator differs from a list in a few minor ways (it can be iterated only once and supports neither len() nor indexing), but both are iterable.
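A short sketch of the differences most likely to cause an error after such a swap. It uses plain Python only; the numbers() generator is a hypothetical stand-in for a lazily produced result such as comments or reactors.

```python
from typing import Iterator


def numbers() -> Iterator[int]:
    # Stand-in for a lazily produced result.
    yield from (1, 2, 3)


as_list = [1, 2, 3]
as_gen = numbers()

# A list supports len(); calling len() on a generator raises TypeError.
list_len = len(as_list)
try:
    len(as_gen)
    gen_has_len = True
except TypeError:
    gen_has_len = False

# A list can be iterated repeatedly; a generator is exhausted after one pass.
first_pass = list(as_gen)
second_pass = list(as_gen)

# An empty list is falsy, but a generator object is truthy even when it
# would yield nothing, so "if post['comments']:" no longer tests for emptiness.
empty_gen_is_truthy = bool(x for x in [])
```

Code that calls len(), indexes into the result, or loops over it twice works with a list but needs list(...) around the generator first.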

carlcotner commented 2 years ago

9f184ed should make it possible to extract reactors without calling get_posts. With this commit:

from facebook_scraper import *
set_cookies("cookies.json")
pprint(next(get_reactors(5746385992055254)))

This works great! Thanks!

I looked at the definition of get_reactors(), and it looks very clean. I wish I understood Python and the overall design better so that I could see how it all fits together. 😊