Open maorkavod opened 3 years ago
What group are you having this problem with?
print(len(list(get_posts(group=117507531664134, cookies="cookies.txt", options={"allow_extra_requests": False}))))
returns 357, so this group is working fine for me.
@neon-ninja
I am trying to crawl this Facebook group:
@neon-ninja This is happening to me in every group now :-/ I will try to provide a full debug log here.
Try without cookies? Or try joining the group with your scraping account.
My localhost appears to be working fine. When I try to run it from my server, there is a problem.
Error message : INFO:facebook_scraper.page_iterators:Page parser did not find next page URL
I use 100 paid proxy servers that work well and are fast, and I switch the proxy address on every iteration. When the debug log shows the message above, the code just exits the function and returns zero posts. Next, I will investigate what is happening on the server.
@neon-ninja It worked today on several groups. I've changed my IP address and user agent, and tried both with and without cookies. I'm still getting the same error, and it seems the code just exits after the "warning" in the debug output.
Would it be possible for you to try crawling this page? https://www.facebook.com/groups/287564448778602/
@neon-ninja
Hi, now I can reproduce it on my localhost as well. In the debug output, I can see all posts crawled as follows:
Unfortunately, the function simply returns an object of this weird type , with no posts.
I can see the error in your screenshot, you're temporarily banned. get_posts is supposed to return a generator, this is normal.
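To illustrate the point about generators: a generator object produces nothing until it is iterated, so printing it only shows the object itself. The sketch below uses a hypothetical stand-in, `fake_get_posts`, instead of the real `get_posts`, so that it runs without network access. The same `list(...)` or `for` pattern should apply to the real function, as in the `len(list(get_posts(...)))` call earlier in this thread.

```python
import types


def fake_get_posts():
    """Hypothetical stand-in for get_posts: yields post dicts lazily."""
    for post_id in ("1", "2", "3"):
        yield {"post_id": post_id, "text": f"post {post_id}"}


result = fake_get_posts()
# Nothing has been produced yet; `result` is just a generator object.
is_generator = isinstance(result, types.GeneratorType)

# Iterating (here via list) is what actually produces the posts.
posts = list(result)
print(is_generator, len(posts))
```

If the list comes back empty with the real scraper, the generator itself is behaving normally and the cause lies elsewhere, such as the temporary ban mentioned above.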
@neon-ninja
I found that the URL the scraper package requests to get image hrefs keeps getting blocked. I will try to find an alternative method to get the image href.
The temporary ban is caused by excessive scraping. Either scrape less or scrape more slowly. Alternatively, just disable extra requests, and then the scraper won't even try to get the high quality image.
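One way to scrape more slowly is to pause between items pulled from the generator. This is only a rough sketch: `throttled` is a hypothetical helper, not part of facebook_scraper, and the delay value is an arbitrary assumption rather than a known safe rate. Because the wrapped generator is lazy, the pause happens before the next item (and any request needed to produce it) is fetched.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items from `items`, pausing before asking it for the next one."""
    for item in items:
        yield item
        # Runs when the consumer requests another item, i.e. before the
        # wrapped iterable is asked to produce (or fail to produce) it.
        sleep(delay_seconds)


# Assumed usage with the real scraper (not executed here):
#   for post in throttled(get_posts(group=..., options={...}), delay_seconds=5):
#       handle(post)
```

The `sleep` parameter exists only so the pause can be swapped out, for example in tests.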
@neon-ninja
I need only the text and image, as well as the post id. If I pass these options:
options = { "comments": False, "reactors": False, "reactions": False, "allow_extra_requests": False, }
Will I still get the properties I need?
You would get low quality images, and longer text posts might be cut off
I passed the following options to the get_posts function:
options = { "comments": False, "reactors": False, "reactions": False, "allow_extra_requests": False, }
I can't seem to get the image, everything else works fine. Here's the group URL: https://m.facebook.com/groups/Pishpeshuk.Nadlan
Would you be able to test it and see whether you receive any images? Could the cause be Facebook's temporary block on my requests rather than a different HTML issue?
Yes, it works fine for me. This code:
pprint(next(get_posts(group="Pishpeshuk.Nadlan", options={"allow_extra_requests": False})))
returns
{'available': True,
'comments': 0,
'comments_full': None,
'factcheck': None,
'image': None,
'image_lowquality': 'https://scontent.fakl1-2.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/105610023_3682686048416784_7502958421092527351_n.jpg?_nc_cat=109&ccb=1-3&_nc_sid=ca434c&_nc_ohc=GcadebOLb5UAX9IjgIs&_nc_ht=scontent.fakl1-2.fna&tp=14&oh=b2e5736d4f5d51be28780a439874160c&oe=60DDE0F1',
'images': None,
'images_description': None,
'images_lowquality': ['https://scontent.fakl1-2.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/105610023_3682686048416784_7502958421092527351_n.jpg?_nc_cat=109&ccb=1-3&_nc_sid=ca434c&_nc_ohc=GcadebOLb5UAX9IjgIs&_nc_ht=scontent.fakl1-2.fna&tp=14&oh=b2e5736d4f5d51be28780a439874160c&oe=60DDE0F1'],
'images_lowquality_description': ['No photo description available.'],
'is_live': False,
'likes': 25,
'link': None,
'post_id': '3009648419116144',
'post_text': 'ברוכים הבאים לפשפשוק דירות-להשכרה/מכירה , בקבוצה ניתן לפרסם '
'דירות למכירה/להשכרה, סאבלט ושותפים בכל הארץ, דירות גן, '
"פנטהאוזים, דירות יוקרה, מחסנים, בתים, וילות , קוטג'ים ונכסים "
'עסקיים.\n'
'\n'
'*מתווכים שימו לב - מתווכים מורשים לפרסם מודעה אחת ביום בלבד, '
'במידה ומתווך יפרסם מעל מודעה אחת - הוא יוסר מהלוח.\n'
'\n'
'*אל תחסכו בפרטים - כתבו האם מדובר במכירה/השכרה, סוג הנכס, אזור '
'הנכס, מצב הנכס ועוד פרטים שיוכלו לעזור לרוכשים.\n'
'\n'
'*אנחנו נגד ספאם - פרסומים שלא קשורים לרוח הקבוצה, יוסרו והגולש '
'יורחק לצמיתות.\n'
'\n'
'*יש להקפיד על פרסום תמונות + מחיר למען נוחיות הגולשים .\n'
'\n'
'גלישה נעימה ופשפוש מהנה לכולם,\n'
'\n'
'פשפשוק דירות להשכרה/מכירה',
'post_url': 'https://m.facebook.com/groups/Pishpeshuk.Nadlan/permalink/3009648419116144/',
'reaction_count': None,
'reactions': None,
'reactors': None,
'shared_post_id': None,
'shared_post_url': None,
'shared_text': '',
'shared_time': None,
'shared_user_id': None,
'shared_username': None,
'shares': 1,
'text': 'ברוכים הבאים לפשפשוק דירות-להשכרה/מכירה , בקבוצה ניתן לפרסם דירות '
'למכירה/להשכרה, סאבלט ושותפים בכל הארץ, דירות גן, פנטהאוזים, דירות '
"יוקרה, מחסנים, בתים, וילות , קוטג'ים ונכסים עסקיים.\n"
'\n'
'*מתווכים שימו לב - מתווכים מורשים לפרסם מודעה אחת ביום בלבד, במידה '
'ומתווך יפרסם מעל מודעה אחת - הוא יוסר מהלוח.\n'
'\n'
'*אל תחסכו בפרטים - כתבו האם מדובר במכירה/השכרה, סוג הנכס, אזור הנכס, '
'מצב הנכס ועוד פרטים שיוכלו לעזור לרוכשים.\n'
'\n'
'*אנחנו נגד ספאם - פרסומים שלא קשורים לרוח הקבוצה, יוסרו והגולש יורחק '
'לצמיתות.\n'
'\n'
'*יש להקפיד על פרסום תמונות + מחיר למען נוחיות הגולשים .\n'
'\n'
'גלישה נעימה ופשפוש מהנה לכולם,\n'
'\n'
'פשפשוק דירות להשכרה/מכירה',
'time': datetime.datetime(2020, 6, 26, 6, 11, 47),
'user_id': '407003609318394',
'user_url': 'https://facebook.com/pishpeshuk.co.il/?refid=18&_ft_=top_level_post_id.3009648419116144%3Acontent_owner_id_new.407003609318394%3Apage_id.407003609318394%3Astory_location.6%3Astory_attachment_style.photo%3Atds_flgs.3%3Aott.AX9SGtYoZmErvghw%3Apage_insights.%7B%22407003609318394%22%3A%7B%22page_id%22%3A407003609318394%2C%22page_id_type%22%3A%22page%22%2C%22actor_id%22%3A407003609318394%2C%22dm%22%3A%7B%22isShare%22%3A0%2C%22originalPostOwnerID%22%3A0%7D%2C%22psn%22%3A%22EntGroupMallPostCreationStory%22%2C%22post_context%22%3A%7B%22object_fbtype%22%3A657%2C%22publish_time%22%3A1593108707%2C%22story_name%22%3A%22EntGroupMallPostCreationStory%22%2C%22story_fbid%22%3A%5B3009648419116144%5D%7D%2C%22role%22%3A1%2C%22sl%22%3A6%2C%22targets%22%3A%5B%7B%22actor_id%22%3A407003609318394%2C%22page_id%22%3A407003609318394%2C%22post_id%22%3A3009648419116144%2C%22role%22%3A1%2C%22share_id%22%3A0%7D%5D%7D%2C%22303803599700653%22%3A%7B%22page_id%22%3A303803599700653%2C%22page_id_type%22%3A%22group%22%2C%22actor_id%22%3A407003609318394%2C%22dm%22%3A%7B%22isShare%22%3A0%2C%22originalPostOwnerID%22%3A0%7D%2C%22psn%22%3A%22EntGroupMallPostCreationStory%22%2C%22post_context%22%3A%7B%22object_fbtype%22%3A657%2C%22publish_time%22%3A1593108707%2C%22story_name%22%3A%22EntGroupMallPostCreationStory%22%2C%22story_fbid%22%3A%5B3009648419116144%5D%7D%2C%22role%22%3A1%2C%22sl%22%3A6%7D%7D&__tn__=C-R',
'username': 'פשפשוק',
'video': None,
'video_duration_seconds': None,
'video_height': None,
'video_id': None,
'video_quality': None,
'video_size_MB': None,
'video_thumbnail': None,
'video_watches': None,
'video_width': None,
'w3_fb_url': None}
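Given the output above, where `image` is `None` and `image_lowquality` holds a URL when extra requests are disabled, a small helper can pull out just the post id, text, and an image URL with a fallback. This is a minimal sketch: `pick_fields` is a hypothetical name, and it assumes only the dictionary keys visible in the output above.

```python
from typing import Any, Dict, Optional


def pick_fields(post: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Keep post id, text, and the best available image URL."""
    # Prefer the high quality image; fall back to the low quality one.
    image = post.get("image") or post.get("image_lowquality")
    # Fall back to post_text if text is missing or empty.
    text = post.get("text") or post.get("post_text")
    return {"post_id": post.get("post_id"), "text": text, "image": image}


sample = {
    "post_id": "3009648419116144",
    "text": "example text",
    "image": None,
    "image_lowquality": "https://example.invalid/low.jpg",
}
picked = pick_fields(sample)
print(picked)
```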
The property works great! I appreciate your help. Compared to paid services such as proxycrawel, your Facebook scraper works better :)
Hi, the PyPI package for Facebook Scraper has been updated to version 0.23.43. Since then, I have not been able to scrape any group. The script just stops when I see this message in debug mode:
INFO:facebook_scraper.page_iterators:Page parser did not find next page URL
What could be the cause of this? I use 100 proxy servers and Facebook cookies. Everything went well yesterday.
I appreciate your help.