kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.29k stars, 616 forks

How do I get all images in a post? Currently only 4 images are returned #161

Closed: kawewutchu closed this issue 3 years ago

kawewutchu commented 3 years ago

@kevinzg Can I help you fix this? Also, is it possible to download a video from a Facebook post?

neon-ninja commented 3 years ago

Can you provide a link to the problematic post? Also note that you can use youtube-dl to download videos, see https://github.com/kevinzg/facebook-scraper/blob/master/README.md
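youtube-dl is the route the maintainer recommends for videos. When a media field already holds a directly downloadable URL, a plain HTTP download can be enough. Below is a minimal sketch of that route, assuming the post dict has the shape printed later in this thread (an `images` list and a `video` field, either of which may be empty or `None`) and that those URLs can be fetched without extra headers. `media_urls` and `download_media` are hypothetical helpers, not part of facebook-scraper.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def media_urls(post):
    """Collect image URLs and, if set, the video URL from a post dict.

    Assumption: the dict has 'images' (list or None) and 'video' (URL or None),
    as in the get_posts() output shown in this thread.
    """
    urls = list(post.get("images") or [])
    if post.get("video"):
        urls.append(post["video"])
    return urls


def download_media(post, out_dir):
    """Save every media URL of `post` into `out_dir`; return the local paths.

    Hypothetical helper using plain urllib; it only works for URLs that are
    directly downloadable. youtube-dl remains the documented route for videos.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(media_urls(post)):
        # Use the file-name part of the URL path (e.g. '..._n.jpg'), prefixed
        # with the index so repeated file names do not overwrite each other.
        name = posixpath.basename(urllib.parse.urlparse(url).path) or "media"
        dest = os.path.join(out_dir, f"{i:02d}_{name}")
        urllib.request.urlretrieve(url, dest)
        paths.append(dest)
    return paths
```

Usage would be along the lines of `download_media(posts[0], "downloads")` after fetching a post with `get_posts`.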

gaoyunzhi commented 3 years ago

I ran into this issue too: only the first 4 images are in post['images'].

Example url: https://www.facebook.com/transarmy/posts/308391180656067

Thank you for making this project!

neon-ninja commented 3 years ago

@gaoyunzhi Thanks - this commit should fix the problem - https://github.com/kevinzg/facebook-scraper/commit/7f2305efe67c43b62accd4dfe177e64abd92051e

Sample code:

import pprint
from facebook_scraper import get_posts

# Fetch a single post by URL and print every field the scraper returns
posts = list(get_posts(post_urls=["https://www.facebook.com/transarmy/posts/308391180656067"]))
pprint.pprint(posts)

Output:

[{'available': True,
  'comments': 0,
  'comments_full': None,
  'factcheck': None,
  'image': 'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179356904_308390967322755_4634698283808243419_n.jpg?_nc_cat=103&ccb=1-3&_nc_sid=110474&_nc_ohc=9E4Q0VqpZPUAX_WcZWg&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=008457c8e356305f25f41250d5e6d840&oe=60AE0393',
  'image_lowquality': 'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p160x160/179356904_308390967322755_4634698283808243419_n.jpg?_nc_cat=103&ccb=1-3&_nc_sid=110474&_nc_ohc=9E4Q0VqpZPUAX_WcZWg&_nc_ht=scontent.fakl1-3.fna&tp=3&oh=b53d51bdbb81fb5abf492a5c8e25f3aa&oe=60B01FD1',
  'images': ['https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179356904_308390967322755_4634698283808243419_n.jpg?_nc_cat=103&ccb=1-3&_nc_sid=110474&_nc_ohc=9E4Q0VqpZPUAX_WcZWg&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=008457c8e356305f25f41250d5e6d840&oe=60AE0393',
             'https://scontent.fakl1-2.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179517508_308390997322752_5350238678322121013_n.jpg?_nc_cat=106&ccb=1-3&_nc_sid=110474&_nc_ohc=_OUXdJcYTssAX_t6DDy&_nc_ht=scontent.fakl1-2.fna&tp=14&oh=c9d8a0c7ecff655d08e72da888433435&oe=60B19056',
             'https://scontent.fakl1-2.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179204069_308391017322750_1591319436636517060_n.jpg?_nc_cat=101&ccb=1-3&_nc_sid=110474&_nc_ohc=pVR8RxiNgJsAX8IQraQ&_nc_ht=scontent.fakl1-2.fna&tp=14&oh=f3d0163cbce3a063c0baf9bacf62aed8&oe=60AED418',
             'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179874180_308391050656080_2923872542835464768_n.jpg?_nc_cat=108&ccb=1-3&_nc_sid=110474&_nc_ohc=PzzeFzWXk-0AX-E1mzS&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=86c3e9a15615133f3ec6fae9ed718c8f&oe=60B1617E',
             'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179542228_308391077322744_4234729016546928035_n.jpg?_nc_cat=103&ccb=1-3&_nc_sid=110474&_nc_ohc=-j8AwEUxlsIAX9xzDs0&_nc_oc=AQn-uQOo7i9fDM76xybKdx4j-B8eFuP5XD6Pdqq0FDbUt-ji3xfoCFWFJd8HDQsyYco&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=2126a56201186b3d60ed1eccdfa255e6&oe=60AF5472',
             'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179988558_308391097322742_1154086016853607266_n.jpg?_nc_cat=105&ccb=1-3&_nc_sid=110474&_nc_ohc=lPpC1jRlcMsAX894oUC&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=fc3a26f4f8055e5f3d2e55b0b54d1afe&oe=60ADB70E',
             'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/178417600_308391130656072_8351145921343114682_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=110474&_nc_ohc=5Ymw3X4y8ZoAX_tctZc&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=f4c20529b10c88ab09e4c0d51953d543&oe=60B0EE45',
             'https://scontent.fakl1-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/179356904_308390967322755_4634698283808243419_n.jpg?_nc_cat=103&ccb=1-3&_nc_sid=110474&_nc_ohc=9E4Q0VqpZPUAX_WcZWg&_nc_ht=scontent.fakl1-3.fna&tp=14&oh=008457c8e356305f25f41250d5e6d840&oe=60AE0393'],
  'is_live': False,
  'likes': 0,
  'link': None,
  'post_id': '308391180656067',
  'post_text': 'Selfcarevisuals @justgirlproject',
  'post_url': 'https://facebook.com/story.php?story_fbid=308391180656067&id=100179361477251',
  'reactors': None,
  'shared_post_id': None,
  'shared_post_url': None,
  'shared_text': '',
  'shared_time': None,
  'shared_user_id': None,
  'shared_username': None,
  'shares': 0,
  'text': 'Selfcarevisuals @justgirlproject',
  'time': datetime.datetime(2021, 4, 28, 23, 40, 22),
  'user_id': '100179361477251',
  'username': 'Trans Army',
  'video': None,
  'video_id': None,
  'video_thumbnail': None,
  'w3_fb_url': None}]
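In the output above, the last URL in `images` repeats the first one, so the 8 entries hold only 7 distinct photos. A minimal sketch for dropping such repeats on the caller's side while keeping the original order, assuming (not confirmed in this thread) that the file-name part of a CDN URL identifies the photo even when the query string differs. `media_key` and `dedupe_images` are hypothetical helpers.

```python
import posixpath
from urllib.parse import urlparse


def media_key(url):
    """Return the file-name part of a CDN URL, e.g. '179356904_..._n.jpg'.

    Assumption: the same photo keeps the same file name even if the
    query string (signature, expiry) differs between requests.
    """
    return posixpath.basename(urlparse(url).path)


def dedupe_images(urls):
    """Drop repeated images, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for url in urls:
        key = media_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Keying on the file name rather than the full URL is the design choice here: signed query parameters such as `oh` and `oe` may differ for the same photo, which would defeat a plain string comparison.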
gaoyunzhi commented 3 years ago

Thanks for the update! We now get more photos, but the additional photos seem to be wrong.

Example post: https://www.facebook.com/transarmy/posts/311640790331106

Example result:

{
  'post_id': '311640790331106',
  'text': '@katylpacino',
  'post_text': '@katylpacino',
  'shared_text': '',
  'time': datetime.datetime(2021, 5, 3, 7, 30, 7),
  'image': 'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182192252_311640423664476_4943039830878416363_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=DXp6T2YDyDYAX8x6r3c&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=0c8ee0e1b7a2b8c517c0c96be25ba27e&oe=60B4EABB',
  'image_lowquality': 'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/s320x320/182192252_311640423664476_4943039830878416363_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=DXp6T2YDyDYAX8x6r3c&_nc_ht=scontent-sjc3-1.xx&tp=9&oh=3c7bcfc8ccc8bf82cfc81c30cae43748&oe=60B73FC0',
  'images': [
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182192252_311640423664476_4943039830878416363_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=DXp6T2YDyDYAX8x6r3c&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=0c8ee0e1b7a2b8c517c0c96be25ba27e&oe=60B4EABB',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182151209_311640486997803_5543981058982086722_n.jpg?_nc_cat=104&ccb=1-3&_nc_sid=110474&_nc_ohc=Ad6lYK3b5wMAX8AB5Ih&_nc_oc=AQkOKKDc_E5wdOg_GfCCo7HCJPsrCtMPXBCVzV9x0Wt1b_IYnZt1Dc30Yse2dekxHTE&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=1df37ebeded9d767137a5b0e16161810&oe=60B712DE',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182266479_311640516997800_1816134662371228066_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=110474&_nc_ohc=6zjy-aSu_0UAX_lGieN&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=1866029cf03338b5f085c0af966f1f38&oe=60B78DE4',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182023920_311640553664463_2357805205459150479_n.jpg?_nc_cat=102&ccb=1-3&_nc_sid=110474&_nc_ohc=qzhKPyWes0IAX9gCiZ6&_nc_oc=AQnOF2H-r8kxTyklP9b0zhAhSrgF_wzID9Hj43udU52_bbJRlyrAVbhJU5F3QihV6qI&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=64c790b7a1cb5720f8e9f3ce925ceb86&oe=60B6C0CD',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182266479_311640516997800_1816134662371228066_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=110474&_nc_ohc=6zjy-aSu_0UAX_lGieN&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=1866029cf03338b5f085c0af966f1f38&oe=60B78DE4',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182151209_311640486997803_5543981058982086722_n.jpg?_nc_cat=104&ccb=1-3&_nc_sid=110474&_nc_ohc=Ad6lYK3b5wMAX8AB5Ih&_nc_oc=AQkOKKDc_E5wdOg_GfCCo7HCJPsrCtMPXBCVzV9x0Wt1b_IYnZt1Dc30Yse2dekxHTE&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=1df37ebeded9d767137a5b0e16161810&oe=60B712DE',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182192252_311640423664476_4943039830878416363_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=DXp6T2YDyDYAX8x6r3c&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=0c8ee0e1b7a2b8c517c0c96be25ba27e&oe=60B4EABB',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/182249958_311615340333651_5187974675392747973_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=67gEevg231sAX9h1pWM&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=8f7dc214c3759fa8e8bc22fdf1ba7fd2&oe=60B668DB',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/181819438_311614363667082_5336170164909086666_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=Huh_LGnnmbQAX_-24nh&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=592f975585670506b2300edc6c03c370&oe=60B76A4D',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/181951279_311598480335337_11845063156394839_n.jpg?_nc_cat=108&ccb=1-3&_nc_sid=110474&_nc_ohc=uI316FzqjVAAX9_VONf&_nc_ht=scontent-sjc3-1.xx&tp=14&oh=e860f219264e2452d7e412e71fcb47e2&oe=60B4970A'
  ],
  'images_lowquality': [
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/s320x320/182192252_311640423664476_4943039830878416363_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=110474&_nc_ohc=DXp6T2YDyDYAX8x6r3c&_nc_ht=scontent-sjc3-1.xx&tp=9&oh=3c7bcfc8ccc8bf82cfc81c30cae43748&oe=60B73FC0',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p110x80/182151209_311640486997803_5543981058982086722_n.jpg?_nc_cat=104&ccb=1-3&_nc_sid=110474&_nc_ohc=Ad6lYK3b5wMAX8AB5Ih&_nc_oc=AQkOKKDc_E5wdOg_GfCCo7HCJPsrCtMPXBCVzV9x0Wt1b_IYnZt1Dc30Yse2dekxHTE&_nc_ht=scontent-sjc3-1.xx&tp=3&oh=7445c360b07422c3475373d8413bdb0a&oe=60B5527A',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p110x80/182266479_311640516997800_1816134662371228066_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=110474&_nc_ohc=6zjy-aSu_0UAX_lGieN&_nc_ht=scontent-sjc3-1.xx&tp=3&oh=71e211541a321abbe6d0e061bbdb3c5e&oe=60B71E48',
    'https://scontent-sjc3-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p110x80/182023920_311640553664463_2357805205459150479_n.jpg?_nc_cat=102&ccb=1-3&_nc_sid=110474&_nc_ohc=qzhKPyWes0IAX9gCiZ6&_nc_oc=AQnOF2H-r8kxTyklP9b0zhAhSrgF_wzID9Hj43udU52_bbJRlyrAVbhJU5F3QihV6qI&_nc_ht=scontent-sjc3-1.xx&tp=3&oh=357489bf3b776f2cf01675de823a1ecf&oe=60B44EE9'
  ],
  'video': None,
  'video_thumbnail': None,
  'video_id': None,
  'likes': 0,
  'comments': 0,
  'shares': 209,
  'post_url': 'https://facebook.com/story.php?story_fbid=311640790331106&id=100179361477251',
  'link': None,
  'user_id': '100179361477251',
  'username': 'Trans Army',
  'user_url': 'https://facebook.com/transarmy/?__tn__=C-R',
  'is_live': False,
  'factcheck': None,
  'shared_post_id': None,
  'shared_time': None,
  'shared_user_id': None,
  'shared_username': None,
  'shared_post_url': None,
  'available': True,
  'comments_full': None,
  'reactors': None,
  'w3_fb_url': None
}
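One way to narrow down which entries are suspect is to compare `images` against `images_lowquality` by file name. For the result above, counting by hand gives 10 entries in `images` but only 7 distinct file names, 3 of them repeated, and 3 entries with no low-quality counterpart. A minimal sketch of that check follows; `audit_images` is a hypothetical helper, and a missing low-quality counterpart is only a hint that an entry may be wrong, not proof.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def media_key(url):
    # File-name part of the URL path, ignoring the query string.
    return posixpath.basename(urlparse(url).path)


def audit_images(post):
    """Summarise post['images'] against post['images_lowquality'].

    Returns the total and distinct counts, the repeated file names, and the
    file names with no low-quality counterpart (in first-seen order).
    """
    keys = [media_key(u) for u in post.get("images") or []]
    low_keys = {media_key(u) for u in post.get("images_lowquality") or []}
    counts = Counter(keys)
    repeated = sorted(k for k, n in counts.items() if n > 1)
    # dict.fromkeys keeps first-seen order while removing duplicates
    unmatched = [k for k in dict.fromkeys(keys) if k not in low_keys]
    return {
        "total": len(keys),
        "distinct": len(counts),
        "repeated": repeated,
        "no_lowquality": unmatched,
    }
```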
neon-ninja commented 3 years ago

Surfacing the image alt text might help with debugging this, and would also be a useful feature in its own right.

neon-ninja commented 3 years ago

@gaoyunzhi https://github.com/kevinzg/facebook-scraper/commit/871c762764e611a4f2dfc2345f79d88f81a84987 should fix this; give it a try:

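from facebook_scraper import get_posts

posts = list(get_posts(post_urls=["https://www.facebook.com/transarmy/posts/311640790331106"]))
# Print the alt text Facebook provides for each image in the post
for desc in posts[0]["images_description"]:
    print(desc)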

Output:

May be an image of 1 person and text that says "HOWTOCTIVTE HOW TO CULTIVATE GENDER NEUTRAL SPACE FOR YOUR CHILDREN @katylpacino"
May be an image of one or more people and text that says "FIRST.... LET'S GET RID OF GENDER REVEAL PARTIES THESE PARTIES REFLECT HOW WE PERCEIVE GENDER ON A BINARY @katylpacino"
May be an image of one or more people and text that says "SECOND.... LET'S STOP GENDERING COLORS PINK IS NOT FOR GIRLS AND BLUE IS NOT FOR BOYS @katylpacino"
May be an image of one or more people and text that says "THIRD... LET'S STOP GENDERING TOYS BOYS CAN PLAY WITH DOLLS AND GIRLS CAN PLAY WITH ΤΟΥ TRUCKS @katylpacino @katylp"
May be an image of one or more people and text that says "FOURTH... USE NON- BINARY LANGUAGE YOUR KIDS PICK UP ON THE LANGUAGE YOU USE @katylpacino"
May be an image of 1 person and text that says "FIFTH... TALK ABOUT GENDER HAVE CONVERSATIONS WITH YOUR KIDS AND DISCUSS THEIR IDENTITY AND THE SPECTRUM OF GENDER. DO THIS FREQUENTLY @katylpacino"
May be an image of one or more people and text that says "SIXTH... BE FLEXIBLE WITH THEIR GENDER EXPRESSION AS KIDS LEARN ABOUT GENDER AND LEARN ABOUT THEMSEVLES IN THE WORLD, THEY MIGHT CHANGE HOW THEY IDENTIFY. THIS IS OKAY! BE FLEXIBLE WITH THEM AND RESPECT THEIR GENDER EXPRESSION @katylpacino"
May be an image of one or more people and text that says "REMEMBER... NO MATTER WHAT YOU DO, SOCIAL DISCOURSES EXIST YOU CAN DO ALL THE RIGHT THINGS AND YOUR KIDS WILL STILL LEARN ABOUT GENDER BINARY @katylpacino"
May be an image of one or more people and text that says "BUT... THE GOAL SHOULD BE TO CREATE A SAFE SPACE FOR YOUR CHILD THE MOST IMPORTANT THING FOR YOUR CHILD IS TO BE SAFE IN THEIR HOME AND KNOW THEY CAN TRUST THEIR PARENT/GUARDIAN @katylpacino"
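Since the alt text was suggested as a debugging aid, one way to use it is to line each description up with its image URL, so a wrongly collected image stands out by its description. A minimal sketch, assuming `images` and `images_description` are parallel lists (the thread prints only the descriptions, so this pairing is an assumption); `describe_images` is a hypothetical helper.

```python
def describe_images(post):
    """Pair each image URL with its alt text.

    Assumes post['images'] and post['images_description'] are parallel lists;
    raises ValueError if their lengths differ, which would itself hint that
    images were mis-scraped.
    """
    images = post.get("images") or []
    descriptions = post.get("images_description") or []
    if len(images) != len(descriptions):
        raise ValueError(
            f"{len(images)} images but {len(descriptions)} descriptions"
        )
    return list(zip(images, descriptions))
```

For example, `for url, desc in describe_images(posts[0]): print(desc, url)` prints each description next to the URL it is assumed to belong to.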
enaserianhanzaei commented 3 years ago

@neon-ninja

Hi, thanks for the amazing library. However, I still have problems scraping image posts with more than 4 or 5 images (where you have to click to open all the images). Do you have any idea how to fix that?

For example, for this URL: https://www.facebook.com/groups/1166154663436469, it only returns 5 images.

neon-ninja commented 3 years ago

Duplicate of the comment at https://github.com/kevinzg/facebook-scraper/issues/231#issuecomment-845871874