moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

post_id = None #5

Closed streamcon closed 9 months ago

streamcon commented 10 months ago

I can't get post_id when parsing

My code:

from facebook_scraper import get_posts, set_user_agent

user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
set_user_agent(user_agent)
for post in get_posts('100038659142270', cookies='cookie2.json'):
    print(post)
    print(post['text'][:50])

output:

{'post_id': None, 'text': 'У меня информация для родителей прекрасного города Мюнхена и всех, кому до него рукой подать\nУже в это воскресенье у вас там будет семинар с Еленой Журавлевой "Что нам делать с этой домашкой?", конечно, не только про домашку, а вообще про роль родителей в отношениях ребенка со школой, мотивацию и все то, про что вы всегда спрашиваете:)\nЛена про это… More рассказывает офигенно и очень умеет сделать так, чтобы родителей поотпустило, а договариваться с детьми стало легче.\nВ общем, кому актуально - сделайте подарок своей нервной системе и своим отношениям с детьми\nСсылка в первом комментарии', 'post_text': 'У меня информация для родителей прекрасного города Мюнхена и всех, кому до него рукой подать\nУже в это воскресенье у вас там будет семинар с Еленой Журавлевой "Что нам делать с этой домашкой?", конечно, не только про домашку, а вообще про роль родителей в отношениях ребенка со школой, мотивацию и все то, про что вы всегда спрашиваете:)\nЛена про это… More рассказывает офигенно и очень умеет сделать так, чтобы родителей поотпустило, а договариваться с детьми стало легче.\nВ общем, кому актуально - сделайте подарок своей нервной системе и своим отношениям с детьми\nСсылка в первом комментарии', 'shared_text': '', 'original_text': 'У меня информация для родителей прекрасного города Мюнхена и всех, кому до него рукой подать\nУже в это воскресенье у вас там будет семинар с Еленой Журавлевой "Что нам делать с этой домашкой?", конечно, не только про домашку, а вообще про роль родителей в отношениях ребенка со школой, мотивацию и все то, про что вы всегда спрашиваете:)\nЛена про это… More рассказывает офигенно и очень умеет сделать так, чтобы родителей поотпустило, а договариваться с детьми стало легче.\nВ общем, кому актуально - сделайте подарок своей нервной системе и своим отношениям с детьми\nСсылка в первом комментарии', 'time': datetime.datetime(2023, 11, 22, 16, 13), 'timestamp': None, 'image': None, 'image_lowquality': None, 'images': 
[], 'images_description': [], 'images_lowquality': [], 'images_lowquality_description': [], 'video': None, 'video_duration_seconds': None, 'video_height': None, 'video_id': None, 'video_quality': None, 'video_size_MB': None, 'video_thumbnail': None, 'video_watches': None, 'video_width': None, 'likes': 517, 'comments': 6, 'shares': 19, 'post_url': 'https://facebook.com/story.php?story_fbid=pfbid037k8TVCCTbJM32kjHh2K6iWvdprSgmFjvxik8KogUx6UXfPVMYPiGQqfoS6Y5L1Ufl&id=100038659142270', 'link': None, 'links': [{'link': '/story.php?story_fbid=pfbid037k8TVCCTbJM32kjHh2K6iWvdprSgmFjvxik8KogUx6UXfPVMYPiGQqfoS6Y5L1Ufl&id=100038659142270&eav=AfYbBWQwOZPvO9oMASU4RqiPm00e-Zc64d2iVDj6qkX_wuXAa9EjL8Hbe1ZtWqRad2k&refid=17&paipv=0', 'text': 'More'}, {'link': '/story.php?story_fbid=pfbid037k8TVCCTbJM32kjHh2K6iWvdprSgmFjvxik8KogUx6UXfPVMYPiGQqfoS6Y5L1Ufl&id=100038659142270&eav=AfYbBWQwOZPvO9oMASU4RqiPm00e-Zc64d2iVDj6qkX_wuXAa9EjL8Hbe1ZtWqRad2k&m_entstream_source=timeline&refid=17&_ft_=encrypted_tracking_data.0AY-HW9xzbWptir5dWvRKhRNOjmGw9R42jBChOUVE-YjmHP3CmJLVD79ZyvGaoEEtQ_g9jwZv2CpcBMywwTcZJ6M96wxRm9B88q_moTHR36f_gsE-ns82yFvalMbnDDv1HfAervDsIjTetQwu9ywBAoTyPAbI65HoQORFOsYBCaXfyDS-CK7Uej7dKCqbz7Qfu8xdYDRDAiCJMrEFzR_fiAqZVxMcg3iKKFQPxw66c16qDXOVwPzvhL62ZEOHDRooV7HcN9v8ZvzdCTVNOTC-Rz0fowsshMrNhVbN67QYRV6rLgAML18WEWWtpMAq55uJWszh0zaL_vE6-fTK6EonSeqKox8glqlL1FCWNto0i3bscSN9mtvMwf_g8oSNm6UhoocUFMd7QuhUhJpRk1WM1YO8Bt76E9QGzl7cLo5ge8mZfaKDiNf5oKVAffk8ABIK7kMBaQVj9OWTmYCChek7e60-ICO12TGe-7qYuwrXGQXVmR55FZ1GiFt0TnoKorGvf3bVkCjIQWX75Lj1hdZXZIKfkayV33S0Rz-h5rFgnEZkSs1ozWLqmzZjtx8DIhzWbwNSBhbG_Rbqg7XnNCThDtZdf9oF5r7yKrUDhAFS74-3HTLPuN7GvV2ivyU3MSYQWc_fAu0AOEEd&__tn__=%2As%2As-R&paipv=0', 'text': ''}], 'user_id': None, 'username': 'Людмила Петрановская', 'user_url': 
'https://facebook.com/lv.petranovskaya?lst=61553879472338%3A100038659142270%3A1702119234&eav=AfZbXyHoRHuMJQAou_VCGC4gY_1dqYi6vgNRNosqJSLTAXXa2bwyjg4UK84d5Ka7Yqg&refid=17&_ft_=encrypted_tracking_data.0AY-HW9xzbWptir5dWvRKhRNOjmGw9R42jBChOUVE-YjmHP3CmJLVD79ZyvGaoEEtQ_g9jwZv2CpcBMywwTcZJ6M96wxRm9B88q_moTHR36f_gsE-ns82yFvalMbnDDv1HfAervDsIjTetQwu9ywBAoTyPAbI65HoQORFOsYBCaXfyDS-CK7Uej7dKCqbz7Qfu8xdYDRDAiCJMrEFzR_fiAqZVxMcg3iKKFQPxw66c16qDXOVwPzvhL62ZEOHDRooV7HcN9v8ZvzdCTVNOTC-Rz0fowsshMrNhVbN67QYRV6rLgAML18WEWWtpMAq55uJWszh0zaL_vE6-fTK6EonSeqKox8glqlL1FCWNto0i3bscSN9mtvMwf_g8oSNm6UhoocUFMd7QuhUhJpRk1WM1YO8Bt76E9QGzl7cLo5ge8mZfaKDiNf5oKVAffk8ABIK7kMBaQVj9OWTmYCChek7e60-ICO12TGe-7qYuwrXGQXVmR55FZ1GiFt0TnoKorGvf3bVkCjIQWX75Lj1hdZXZIKfkayV33S0Rz-h5rFgnEZkSs1ozWLqmzZjtx8DIhzWbwNSBhbG_Rbqg7XnNCThDtZdf9oF5r7yKrUDhAFS74-3HTLPuN7GvV2ivyU3MSYQWc_fAu0AOEEd&__tn__=C-R&paipv=0', 'is_live': False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_user_url': None, 'shared_post_url': None, 'available': True, 'comments_full': None, 'reactors': None, 'w3_fb_url': None, 'reactions': None, 'reaction_count': 517, 'with': None, 'page_id': None, 'sharers': None, 'is_truncated_text': 'true', 'full_post_url': 'https://mbasic.facebook.com/story.php?story_fbid=pfbid037k8TVCCTbJM32kjHh2K6iWvdprSgmFjvxik8KogUx6UXfPVMYPiGQqfoS6Y5L1Ufl&id=100038659142270&eav=AfYbBWQwOZPvO9oMASU4RqiPm00e-Zc64d2iVDj6qkX_wuXAa9EjL8Hbe1ZtWqRad2k&refid=17&paipv=0', 'translated_text': 'I have information for the parents of the beautiful city of Munich and everyone to whom I can help\nAlready this Sunday you will have a seminar there with Elena Zhuravleva "What should we do with this homework?" 
", of course, not only about homework, but generally about the role of parents in the child\'s relationship with school, motivation and everything you always ask about :)\nLena talks about it awesomely and is very good at making it so that parents will be released and it became easier to negotiate with children.\nGenerally, to whom it may concern - make a gift to your nervous system and your relationship with children\nLink is in first comment', 'translated_post_text': 'I have information for the parents of the beautiful city of Munich and everyone to whom I can help\nAlready this Sunday you will have a seminar there with Elena Zhuravleva "What should we do with this homework?" ", of course, not only about homework, but generally about the role of parents in the child\'s relationship with school, motivation and everything you always ask about :)\nLena talks about it awesomely and is very good at making it so that parents will be released and it became easier to negotiate with children.\nGenerally, to whom it may concern - make a gift to your nervous system and your relationship with children\nLink is in first comment', 'translated_shared_text': '', 'image_id': None, 'image_ids': [], 'was_live': False}
У меня информация для родителей прекрасного города

moda20 commented 10 months ago

@streamcon The app uses m.facebook.com by default, which no longer works; we now use mbasic for everything. Please call get_posts like this, paying attention to the base_url and start_url arguments:

from facebook_scraper import get_posts

for post in get_posts('nintendo', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/nintendo?v=timeline", pages=1):
    print(post['text'][:50])

streamcon commented 10 months ago

Or add the following to extract_post_id in extractors.py:

re.findall(r"share_id\":([\d:\"]*)", self.element.attrs["data-store"])
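
To illustrate what the regular expression above captures, here is a minimal sketch run against made-up data-store strings. The sample payloads and the find_share_ids helper are assumptions for illustration only; the real attribute value may look different, and this is not the library's own code.

```python
import re
from typing import List

# Hypothetical data-store attribute value; the real payload may differ.
sample_data_store = '{"share_id":1234567890,"linkdata":"example"}'


def find_share_ids(data_store: str) -> List[str]:
    """Return candidate share_id values with surrounding quotes removed."""
    # Same pattern as in the comment above: digits, colons and quotes
    # following the literal text share_id":
    matches = re.findall(r"share_id\":([\d:\"]*)", data_store)
    # Strip quotes and drop empty captures (the * quantifier allows them).
    return [m.strip('"') for m in matches if m.strip('"')]


print(find_share_ids(sample_data_store))
```

Because the character class also accepts quotes and colons, a quoted value such as "123:456" would be captured with its quotes, which is why the sketch strips them.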

moda20 commented 10 months ago

@streamcon That only works for m.facebook, and we don't know exactly how it interacts with other cases such as groups, single posts and so on. The easiest way to get the repo working again is to use the mbasic arguments. Besides, the Facebook website changes depending on the cookies you are using, but that doesn't seem to be the case for mbasic, so I think we should stick with it for now.
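
The mbasic arguments mentioned in this thread could be wrapped in a small helper so that base_url and start_url always match the account name. This is only a sketch: mbasic_kwargs is a hypothetical helper, not part of facebook_scraper, and the URL layout is taken from the example earlier in the thread.

```python
MBASIC_BASE_URL = "https://mbasic.facebook.com"


def mbasic_kwargs(account: str) -> dict:
    """Build the base_url/start_url keyword arguments for an account.

    Hypothetical helper; the ?v=timeline URL form is copied from the
    example in this thread and may change if Facebook changes mbasic.
    """
    return {
        "base_url": MBASIC_BASE_URL,
        "start_url": f"{MBASIC_BASE_URL}/{account}?v=timeline",
    }


# Intended use (requires facebook_scraper and network access):
#   from facebook_scraper import get_posts
#   for post in get_posts("nintendo", pages=1, **mbasic_kwargs("nintendo")):
#       print(post["text"][:50])
print(mbasic_kwargs("nintendo"))
```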

streamcon commented 10 months ago

But with this code:

from facebook_scraper import get_posts, set_user_agent

set_user_agent("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

for post in get_posts('nintendo', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/nintendo?v=timeline", pages=3, cookies='cookie2.json'):
    print(post['text'][:50])
    print(post)

output:

Thank you to all for enjoying The Legend of Zelda:
{'post_id': None, 'text': 'Thank you to all for enjoying The Legend of Zelda: Tears of the Kingdom and voting for it at The Game Awards this year!', 'post_text': 'Thank you to all for enjoying The Legend of Zelda: Tears of the Kingdom and voting for it at The Game Awards this year!', 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 12, 8, 9, 32), 'timestamp': None, 'image': 'https://scontent-arn2-1.xx.fbcdn.net/v/t39.30808-6/406464103_739497268205878_6182010635183516327_n.jpg?stp=cp0_dst-jpg_e15_fr_q65&_nc_cat=104&ccb=1-7&_nc_sid=ab7367&efg=eyJpIjoidCJ9&_nc_ohc=n-lrDah-bq8AX-7FHsB&_nc_ht=scontent-arn2-1.xx&oh=00_AfDbrpaMe8Y6DOF-JO8BpZMv0n28I2R9Nro7y4TShKmLTw&oe=657A70BE&manual_redirect=1', 'image_lowquality': 'https://scontent-arn2-1.xx.fbcdn.net/v/t39.30808-6/406464103_739497268205878_6182010635183516327_n.jpg?stp=cp0_dst-jpg_e15_q65_s1080x2048&_nc_cat=104&ccb=1-7&_nc_sid=ab7367&efg=eyJpIjoiYiJ9&_nc_ohc=n-lrDah-bq8AX-7FHsB&_nc_ht=scontent-arn2-1.xx&oh=00_AfDvtioHQrAozlXs7B6he1imqokRmJfftRXN1ncIH0Q7XQ&oe=657A70BE', 'images': ['https://scontent-arn2-1.xx.fbcdn.net/v/t39.30808-6/406464103_739497268205878_6182010635183516327_n.jpg?stp=cp0_dst-jpg_e15_fr_q65&_nc_cat=104&ccb=1-7&_nc_sid=ab7367&efg=eyJpIjoidCJ9&_nc_ohc=n-lrDah-bq8AX-7FHsB&_nc_ht=scontent-arn2-1.xx&oh=00_AfDbrpaMe8Y6DOF-JO8BpZMv0n28I2R9Nro7y4TShKmLTw&oe=657A70BE&manual_redirect=1'], 'images_description': ["May be a graphic of \u200etext that says '\u200eTHEGAME GAME AWARDS WINNER BEST ACTION/ADVENTURE GAME ZELDA ۔ KINGDOM ÛHELEED ELDA TEARS\u200e'\u200e"], 'images_lowquality': ['https://scontent-arn2-1.xx.fbcdn.net/v/t39.30808-6/406464103_739497268205878_6182010635183516327_n.jpg?stp=cp0_dst-jpg_e15_q65_s1080x2048&_nc_cat=104&ccb=1-7&_nc_sid=ab7367&efg=eyJpIjoiYiJ9&_nc_ohc=n-lrDah-bq8AX-7FHsB&_nc_ht=scontent-arn2-1.xx&oh=00_AfDvtioHQrAozlXs7B6he1imqokRmJfftRXN1ncIH0Q7XQ&oe=657A70BE'], 'images_lowquality_description': ["May be a graphic of \u200etext that says '\u200eTHEGAME GAME 
AWARDS WINNER BEST ACTION/ADVENTURE GAME ZELDA ۔ KINGDOM ÛHELEED ELDA TEARS\u200e'\u200e"], 'video': None, 'video_duration_seconds': None, 'video_height': None, 'video_id': None, 'video_quality': None, 'video_size_MB': None, 'video_thumbnail': None, 'video_watches': None, 'video_width': None, 'likes': 10000, 'comments': 563, 'shares': 1600, 'post_url': 'https://facebook.com/story.php?story_fbid=pfbid02WVPVEeJ7jDRMr4WwSSMMb7PRqNfqqBJZs4bB4aBJ3XvgMiTxrsHKZVTaBqukkJ8Ll&id=100064368354094', 'link': None, 'links': [{'link': '/story.php?story_fbid=pfbid02WVPVEeJ7jDRMr4WwSSMMb7PRqNfqqBJZs4bB4aBJ3XvgMiTxrsHKZVTaBqukkJ8Ll&id=100064368354094&eav=AfaFP0sp3WXHigA2Xieq7BQRoCEIqyLRMXWpQRoCfxAevayQPgZCN95cnUlonILq7_c&m_entstream_source=timeline&refid=17&paipv=0', 'text': ''}, {'link': 'https://mbasic.facebook.com/photo.php?fbid=739497278205877&id=100064368354094&set=a.604604228361850&eav=AfbOXv5LyoT2YNADccGFlF2NO0mw-g-5D5xLOYto3MKnM00-P2sDakgyz1cYmlZ4r8c&paipv=0&source=48&refid=17', 'text': ''}], 'user_id': None, 'username': 'Nintendo of America', 'user_url': 'https://facebook.com/NintendoAmerica/?lst=61553879472338%3A100064368354094%3A1702151109&eav=AfZnk4h6lYaEnb7oa3qnX-3rgBwgbXkiWxWlUyk2m-bjAsGyQZfZLSNaguhIqY6QBZ0&refid=17&paipv=0', 'is_live': False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_user_url': None, 'shared_post_url': None, 'available': True, 'comments_full': None, 'reactors': None, 'w3_fb_url': None, 'reactions': None, 'reaction_count': 10000, 'with': None, 'page_id': None, 'sharers': None, 'translated_text': '', 'image_id': '739497278205877', 'image_ids': ['739497278205877'], 'was_live': False}
The Super Smash Bros. amiibo of Kingdom Hearts’ So
{'post_id': None, 'text': 'The Super Smash Bros. amiibo of Kingdom Hearts’ Sora will be released on February 16th 2024!', 'post_text': 'The Super Smash Bros. amiibo of Kingdom Hearts’ Sora will be released on February 16th 2024!', 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 12, 6, 17, 7), 'timestamp': None, 'image': None, 'image_lowquality': 'https://scontent-arn2-1.xx.fbcdn.net/v/t15.5256-10/408311684_322940160667403_2458471000675836479_n.jpg?stp=cp0_dst-jpg_e15_p720x720_q65&_nc_cat=104&ccb=1-7&_nc_sid=f3b36a&efg=eyJpIjoiYiJ9&_nc_ohc=N71w-AFx96UAX_aKd6Z&tn=7WMAHMjd06AW0FGV&_nc_ht=scontent-arn2-1.xx&oh=00_AfBYaih_YakmXASj3Ad_g-eYNfMDEogie2Vkj60hhTmv_g&oe=65792611', 'images': [], 'images_description': [], 'images_lowquality': ['https://scontent-arn2-1.xx.fbcdn.net/v/t15.5256-10/408311684_322940160667403_2458471000675836479_n.jpg?stp=cp0_dst-jpg_e15_p720x720_q65&_nc_cat=104&ccb=1-7&_nc_sid=f3b36a&efg=eyJpIjoiYiJ9&_nc_ohc=N71w-AFx96UAX_aKd6Z&tn=7WMAHMjd06AW0FGV&_nc_ht=scontent-arn2-1.xx&oh=00_AfBYaih_YakmXASj3Ad_g-eYNfMDEogie2Vkj60hhTmv_g&oe=65792611'], 'images_lowquality_description': [None], 'video': 'https://scontent-arn2-1.xx.fbcdn.net/v/t42.1790-2/407540787_382017990845112_4737937475628955811_n.mp4?_nc_cat=105&ccb=1-7&_nc_sid=55d0d3&efg=eyJybHIiOjY2MiwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic3ZlX3NkIn0%3D&_nc_ohc=xB5Xt9_W-8oAX_OY3Te&_nc_rml=0&_nc_ht=scontent-arn2-1.xx&oh=00_AfCsAoRLuk244UQGHqrlANW-kAlY4kJf1smqBgrLeiiAFg&oe=65793E2D', 'video_duration_seconds': None, 'video_height': None, 'video_id': '709312260916437', 'video_quality': None, 'video_size_MB': None, 'video_thumbnail': None, 'video_watches': None, 'video_width': None, 'likes': 2600, 'comments': 279, 'shares': 754, 'post_url': 'https://facebook.com/story.php?story_fbid=pfbid01GEJp4CX1PGoqXNZponB2wFqQviS86NS7dwHzRbb8Zhghu1CxYJKt3XoKbWSachpl&id=100064368354094', 'link': None, 'links': [{'link': 
'/story.php?story_fbid=pfbid01GEJp4CX1PGoqXNZponB2wFqQviS86NS7dwHzRbb8Zhghu1CxYJKt3XoKbWSachpl&id=100064368354094&eav=Afb2UR5DHWHROPOE9ku0ltfM7c4hmYogVKFN_EEGRrctG47_ut8XIHLz26f5BTavCL4&m_entstream_source=timeline&refid=17&paipv=0', 'text': ''}], 'user_id': None, 'username': 'Nintendo of America', 'user_url': 'https://facebook.com/NintendoAmerica/?lst=61553879472338%3A100064368354094%3A1702151109&eav=AfZnk4h6lYaEnb7oa3qnX-3rgBwgbXkiWxWlUyk2m-bjAsGyQZfZLSNaguhIqY6QBZ0&refid=17&paipv=0', 'is_live': False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_user_url': None, 'shared_post_url': None, 'available': True, 'comments_full': None, 'reactors': None, 'w3_fb_url': None, 'reactions': None, 'reaction_count': 2600, 'with': None, 'page_id': None, 'sharers': None, 'translated_text': '', 'image_id': None, 'image_ids': [], 'was_live': False}

post_id is still None.
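
As a possible stopgap while post_id comes back as None, the story_fbid query parameter could be read from post_url, which is populated in the output above. This is a sketch under assumptions: fallback_post_id is a hypothetical helper, and the pfbid... value it returns is an opaque identifier from the URL, not necessarily the same ID the library would normally report.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def fallback_post_id(post: dict) -> Optional[str]:
    """Return post_id if set, else the story_fbid taken from post_url."""
    if post.get("post_id"):
        return post["post_id"]
    url = post.get("post_url") or ""
    # parse_qs maps each query parameter to a list of its values.
    values = parse_qs(urlparse(url).query).get("story_fbid")
    return values[0] if values else None


# Trimmed-down post dict using the post_url from the output above.
example_post = {
    "post_id": None,
    "post_url": "https://facebook.com/story.php?story_fbid=pfbid02WVPVEeJ7jDRMr4WwSSMMb7PRqNfqqBJZs4bB4aBJ3XvgMiTxrsHKZVTaBqukkJ8Ll&id=100064368354094",
}
print(fallback_post_id(example_post))
```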

moda20 commented 10 months ago

@streamcon I tried it just now and it worked, but I did find an issue: using 'nintendo' automatically redirects to NintendoAmerica, which might not return the right result. Try again with 'NintendoAmerica' instead of just 'nintendo'. If that doesn't help, I will ask you to get new cookies.

chelishchev commented 10 months ago

@moda20 I ran the example:

    for post in scraper.get_posts('NintendoAmerica', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/NintendoAmerica?v=timeline", pages=1):
        print(post)

And I didn't get anything. I'm using the latest version from the repo.

moda20 commented 9 months ago

@chelishchev Please check your cookies and try other pages to see if it's still an issue.

moda20 commented 9 months ago

@chelishchev I am closing this issue as it seems to be fixed.