Open carlcotner opened 2 years ago
Half the requests? Requesting the post and reaction count is just one request, whereas requesting all reactors can easily be hundreds of requests. At that point, one extra request is negligible.
What you're describing is already possible, though, if you set "reactors": "generator". You would request the post, check the reaction count, and only iterate through the generator if your condition is met. See https://github.com/kevinzg/facebook-scraper/issues/504#issuecomment-937351065.
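That conditional-iteration pattern might look roughly like the sketch below. It uses a stand-in post dict rather than a real call to get_posts, and the field names ("reaction_count", "reactors") and the saved-count logic are illustrative assumptions, not the library's guaranteed API:

```python
# Sketch: fetch the post once, check the reaction count, and only consume
# the reactors generator when the count has actually changed.
# `fake_post` stands in for a post dict returned by get_posts(...) with
# options={"reactors": "generator"} (field names are assumptions).

def make_reactors():
    # In the real library, this generator would issue the extra HTTP
    # requests lazily, one page of reactors at a time.
    for name in ["Alice", "Bob", "Carol"]:
        yield {"name": name, "type": "like"}

fake_post = {"post_id": 123, "reaction_count": 3, "reactors": make_reactors()}

last_seen_count = 2  # reaction count recorded on a previous run

if fake_post["reaction_count"] > last_seen_count:
    # Only now do we pay for the extra requests.
    reactors = list(fake_post["reactors"])
else:
    reactors = []

print(len(reactors))
```

The key point is that building the generator costs nothing; the requests only happen when you iterate it.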
https://github.com/kevinzg/facebook-scraper/commit/9f184edf4fd726a15d8b33ae9dbd47d670092c9a should make reactor extraction possible without calling get_posts. With this commit:
from facebook_scraper import *
from pprint import pprint
set_cookies("cookies.json")
pprint(next(get_reactors(5746385992055254)))
outputs:
{'link': 'https://facebook.com/profile.php?id=100071427936816&fref=pb',
'name': 'Doug Cockle',
'type': 'like'}
Interesting, thanks; I didn't know about that. Should
posts = get_posts(username, cookies=cookie_file, extra_info=True,
options={ 'allow_extra_requests': False, 'reactions': True,
'comments': 'generator', 'reactors': 'generator',
'comment_reactors': False })
be a drop-in replacement for
posts = get_posts(username, cookies=cookie_file, extra_info=True,
options={'page_limit': None, 'allow_extra_requests': False, 'HQ_images': False})
?
Meaning, should I be able to replace the second statement with the first in a working program without changing the program's behavior? I'm sometimes getting an error after the swap. (I'm a bit confused about exactly what all the options do.)
It depends on what you do with the result. A generator has a few minor differences from a list, but both are iterable.
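The two differences most likely to cause an error after the swap: a generator is single-pass, and it has no len(). A minimal plain-Python illustration, independent of facebook_scraper:

```python
# A list vs. a generator over the same items.
reactors_list = [{"name": "Alice"}, {"name": "Bob"}]
reactors_gen = (r for r in [{"name": "Alice"}, {"name": "Bob"}])

# A list can be iterated repeatedly and supports len():
assert len(reactors_list) == 2
assert [r["name"] for r in reactors_list] == ["Alice", "Bob"]
assert [r["name"] for r in reactors_list] == ["Alice", "Bob"]  # second pass OK

# A generator is exhausted by the first pass:
first_pass = [r["name"] for r in reactors_gen]
second_pass = [r["name"] for r in reactors_gen]
# len(reactors_gen) would raise TypeError

print(first_pass, second_pass)
```

So if your program iterates a post's reactors twice, or calls len() on them, switching to "reactors": "generator" will break it; wrapping the generator in list() at the point of use restores list behavior.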
> 9f184ed should make reactor extraction possible without calling get_posts. With this commit:
> from facebook_scraper import *
> set_cookies("cookies.json")
> pprint(next(get_reactors(5746385992055254)))
This works great! Thanks!
I looked at the definition of get_reactors(), and it looks very clean. I wish I understood Python and the overall structure of your design better, so that I knew how it all fits together structurally. 😊
Hi, @kevinzg. If I'm understanding correctly, in the use case where one only extracts reactors after determining that there have been new Likes on a given post, it would save half the requests if there were a separate get_reactors(post_id) function that didn't have to make two requests given a post_id. (Essentially the functionality of the code starting at line 791 of the file extractors.py.)

I would be happy to try doing this myself if you could give me pointers on what needs to be done to reuse the existing code while respecting its object-oriented structure, because I don't understand how everything is interrelated well enough to avoid hacking it all up.