Open lullu57 opened 3 months ago
@lullu57 were you able to find a solution to this? I'm not sure what the issue is here.
I wasn't able to discover what's causing the issue, unfortunately. It could be something related to the cookies, but I have tried everything as described.
I don't know much about this, but I don't think the problem is the cookies, since the response does contain a post: "Save on select games featuring Mario and friends. Offer ends 3/16"
Maybe it's an issue with the HTML parser in the library?
I tried to debug inside the repo, and it seems that even though the posts are there, it does not recognise them and so does not iterate over them, so I think that could be the issue. I built a pyppeteer script that gets the page information I need (it only works with pages), and I'm working with that. For posts, I found this other repo, which is working for me: https://github.com/shaikhsajid1111/facebook_page_scraper
@lullu57 I'm pretty sure Facebook is rolling out a new DOM update for mbasic to confuse scrapers like this repo. For me this is still not an issue, so it would be great if you (or @iTrooz) could check the HTML you get back for posts and see whether the `<article>` tag or `<div role="article">` is representing the individual posts. That's how the scraper finds posts.
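To check the DOM yourself, a minimal sketch like this (my own check, not the library's code) counts the two post containers mentioned above in a saved mbasic HTML dump. The inline `html` string is a stand-in; in practice you would read the page you actually fetched.

```python
import re

# Stand-in for a saved mbasic page; replace with the HTML you fetched,
# e.g. html = open("page.html", encoding="utf-8").read()
html = """
<html><body>
  <article>post one</article>
  <div role="article">post two</div>
</body></html>
"""

article_tags = re.findall(r"<article\b", html)
role_articles = re.findall(r'<div\b[^>]*role="article"', html)

# If both counts are 0 on a real page dump, the DOM has changed and the
# scraper's selectors no longer match anything.
print(len(article_tags), len(role_articles))
```

A regex is enough here because we only need to know whether the markers exist at all, not to parse the full tree.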
I also have a fork of that repo: https://github.com/moda20/facebook_page_scraper. It gets all images in high resolution, plus other useful unique IDs. I'm using it now, though it has some smaller issues still to be resolved. However, I still use this repo for full-text extraction and sometimes for high-res images.
I don't get posts either. My relevant code looks like this:
```python
for group in GROUP_NAMES:
    for post in get_posts(group=group, pages=10,
                          options={"posts_per_page": 50, "allow_extra_requests": False}):
        post_text = post.get('post_text')
        if not post_text:
            continue
        if any(word in post_text for word in INTERESTS) and not any(word in post_text for word in IGNORE):
            post_url = post['post_url'].replace("https://m.", "https://", 1)
            if post_url in prev_urls:
                print(f"Skipped URL {post_url}")
                continue
            data['date'].append(post['time'])
            data['link'].append(post_url)
            data['info'].append(post_text[:200])
            data['username'].append(post['username'])
            data['#comments'].append(post['comments'])
            data['#likes'].append(post['likes'])
print("Finished parsing Facebook results")
if not data['date']:
    print("No data for file")
    exit()
```
"No data for file" is printed every time, although the groups I added are public.
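One thing worth ruling out: "No data for file" can mean either that `get_posts` returned nothing, or that every post was dropped by the `INTERESTS`/`IGNORE` filters. A sketch like this (my refactor of the filtering step above, with hypothetical keyword lists) lets you verify the filter logic on a hand-made post without hitting Facebook at all:

```python
# Hypothetical keyword lists; substitute your real INTERESTS/IGNORE values.
INTERESTS = ["mario"]
IGNORE = ["spam"]

def keep_post(post_text):
    """Return True if a post passes the interest/ignore filters."""
    if not post_text:
        return False
    text = post_text.lower()
    return (any(w in text for w in INTERESTS)
            and not any(w in text for w in IGNORE))

# A post text we know exists in the response, per the earlier comment.
sample = "Save on select games featuring Mario and friends."
print(keep_post(sample))  # True: contains an interest keyword, no ignore words
```

If this returns `True` for a post you know is in the raw response but your loop still appends nothing, the problem is in the scraper's iteration (posts never yielded), not in your filtering.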
I cannot understand why FB is such a jerk, closing all these routes. Why not let developers pay for their data? Then everyone would be happy.
Hi, I'm reopening the issue: I updated to the most recent version and get_posts is still not working. Below is the code and terminal output:
headers.json has been set as described in issue #22, and the cookies were extracted in JSON format using the Get cookies.txt LOCALLY extension while logged in on the site.
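As a sanity check on the exported cookies, a small sketch like this (assuming the browser extension exported a JSON list of cookie objects, which is its usual format) converts them into the plain name-to-value dict that most Python HTTP clients accept, so you can confirm the session cookies (`c_user`, `xs`) are actually present:

```python
import json

# Stand-in for json.load(open("cookies.json")); the field names below
# match the usual browser-extension export format.
exported = [
    {"name": "c_user", "value": "123", "domain": ".facebook.com"},
    {"name": "xs", "value": "abc", "domain": ".facebook.com"},
]

cookie_jar = {c["name"]: c["value"] for c in exported}

# If c_user or xs is missing, the export was made while logged out and
# Facebook will serve the anonymous page, which contains no posts.
print(sorted(cookie_jar))
```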