shaikhsajid1111 / facebook_page_scraper

Scrapes the front end of Facebook pages with no limitations and provides a feature to turn the data into structured JSON or CSV
https://pypi.org/project/facebook-page-scraper/
MIT License
211 stars, 62 forks

Get Comments content #2

Open marcomameli1992 opened 3 years ago

marcomameli1992 commented 3 years ago

Hello, is it possible to get the content of a post's comments? Your code currently returns only the number of comments (I had to change the list index from 0 to 1 in the method that gets this number), but I would like to know whether the text and reactions of each comment can be extracted from a post. Do you have any advice?

shaikhsajid1111 commented 3 years ago

Do you want the comments' replies in the output as well?

If you don't need the reply text, you can make a simple request to the post's URL. The response contains a JavaScript object inside a <script> tag, with one object per comment, and it has almost all the data you mentioned.

I'm pasting an object for a single comment below.

{
    "node": {
        "id": "Y29tbWVudDo4MjcwMzgzMzQzNzQ3NDNfODg0Nzk0ODQ1MjY1NzU4",
        "legacy_fbid": "884794845265758",
        "author": {
            "__typename": "User",
            "id": "100006874397694",
            "name": "S\u00ea\u00f1haj Aziz Senhajii",
            "__isActor": "User",
            "profile_picture_depth_0": {
                "uri": "https:\/\/scontent-bom1-1.xx.fbcdn.net\/v\/t1.0-1\/cp0\/c5.0.32.32a\/p32x32\/145176629_2803474619891657_7543331979447625897_o.jpg?_nc_cat=101&ccb=2&_nc_sid=7206a8&_nc_ohc=QZQMvNTFI0kAX-gLZfj&_nc_ht=scontent-bom1-1.xx&tp=27&oh=ed36f326be715270d169e14b2df81123&oe=603F00BC"
            },
            "profile_picture_depth_1": {
                "uri": "https:\/\/scontent-bom1-1.xx.fbcdn.net\/v\/t1.0-1\/cp0\/c4.0.24.24a\/p24x24\/145176629_2803474619891657_7543331979447625897_o.jpg?_nc_cat=101&ccb=2&_nc_sid=7206a8&_nc_ohc=QZQMvNTFI0kAX-gLZfj&_nc_ht=scontent-bom1-1.xx&tp=27&oh=45788eb7e6835ac99445d2d1e1705acf&oe=60401989"
            },
            "gender": "MALE",
            "__isEntity": "User",
            "url": "https:\/\/www.facebook.com\/people\/S\u0025C3\u0025AA\u0025C3\u0025B1haj-Aziz-Senhajii\/100006874397694",
            "work_info": null,
            "is_verified": false,
            "short_name": "Senhajii"
        },
        "is_author_weak_reference": false,
        "created_time": 1587155108,
        "spam_display_mode": "none",
        "attachments": [],
        "comment_menu_tooltip": null,
        "should_show_comment_menu": false,
        "private_reply_context": null,
        "feedback": {
            "id": "ZmVlZGJhY2s6ODI3MDM4MzM0Mzc0NzQzXzg4NDc5NDg0NTI2NTc1OA==",
            "page_private_reply": null,
            "viewer_actor": null,
            "viewer_feedback_reaction_info": null,
            "supported_reactions": [
                {
                    "key": 1
                },
                {
                    "key": 2
                },
                {
                    "key": 4
                },
                {
                    "key": 3
                },
                {
                    "key": 7
                },
                {
                    "key": 8
                }
            ],
            "associated_video": null,
            "top_reactions": {
                "edges": [
                    {
                        "reaction_count": 4,
                        "node": {
                            "key": 1,
                            "id": "1635855486666999",
                            "reaction_type": "LIKE"
                        }
                    },
                    {
                        "reaction_count": 1,
                        "node": {
                            "key": 2,
                            "id": "1678524932434102",
                            "reaction_type": "LOVE"
                        }
                    }
                ]
            },
            "reactors": {
                "count": 5,
                "is_empty": false,
                "count_reduced": "\u096b"
            },
            "can_viewer_comment": false,
            "can_viewer_react": false,
            "comment_composer_placeholder": "\u092a\u094d\u0930\u0924\u094d\u0924\u094d\u092f\u0941\u0924\u094d\u0924\u0930 \u0932\u093f\u0939\u093e...",
            "public_conversations_context": {
                "comment_vote_ui_version": "NONE"
            },
            "should_show_top_reactions": true,
            "ask_me_anything_feedback_metadata": null,
            "comment_count": {
                "total_count": 6
            },
            "toplevel_comment_count": {
                "count": 6
            },
            "threading_config": {
                "__typename": "NoThreadingFeedbackConfig"
            },
            "can_viewer_pin_live_comments": false,
            "latest_pinned_comment_event": null,
            "work_answer_event_action_links_comment_renderer": null,
            "subscription_target_id": "884794845265758",
            "display_comments": {
                "highlighted_comments": [],
                "comment_order": "RANKED_REPLIES",
                "expanded_sub_reply_parents": [],
                "is_initially_expanded": false,
                "page_size": 50,
                "reply_comment_order": "RANKED_REPLIES",
                "should_render_composer_preemptively": false,
                "after_count": 6,
                "before_count": 0,
                "count": 6,
                "edges": [],
                "page_info": {
                    "end_cursor": null,
                    "has_next_page": true,
                    "has_previous_page": false,
                    "start_cursor": null
                }
            },
            "associated_group": null
        },
        "upvote_downvote_total": 0,
        "viewer_comment_vote_state": "NONE",
        "work_ama_answer_status": null,
        "page_admin_actor_info": null,
        "is_author_banned_by_content_owner": false,
        "can_viewer_upvote_downvote": false,
        "comment_parent": null,
        "edit_history": {
            "count": 0
        },
        "parent_feedback": {
            "can_viewer_ban_user": false,
            "can_viewer_comment": false,
            "viewer_acts_as_page": null,
            "id": "ZmVlZGJhY2s6ODI3MDM4MzM0Mzc0NzQz",
            "share_fbid": "827038334374743",
            "political_figure_data": null
        },
        "ban_action": "BAN",
        "preferred_body": {
            "__typename": "TextWithEntities",
            "translation_type": "ORIGINAL",
            "delight_ranges": [],
            "image_ranges": [],
            "inline_style_ranges": [],
            "aggregated_ranges": [],
            "ranges": [],
            "color_ranges": [],
            "text": "In have problem in my whatsapp business "
        },
        "translatability_for_viewer": {
            "source_dialect_name": "\u0907\u0902\u0917\u094d\u0930\u091c\u0940"
        },
        "translation_available_for_viewer": false,
        "url": "https:\/\/www.facebook.com\/Whatsappforbusiness\/posts\/827038334374743?comment_id=884794845265758",
        "is_hidden_by_content_owner": null,
        "if_viewer_can_share": null,
        "body_renderer": {
            "__typename": "TextWithEntities",
            "delight_ranges": [],
            "image_ranges": [],
            "inline_style_ranges": [],
            "aggregated_ranges": [],
            "ranges": [],
            "color_ranges": [],
            "text": "In have problem in my whatsapp business ",
            "__module_operation_CometUFICommentBody_comment": {
                "__dr": "CometUFICommentBodyTextWithEntities_textWithEntities$normalization.graphql"
            },
            "__module_component_CometUFICommentBody_comment": {
                "__dr": "CometUFICommentBodyTextWithEntities.react"
            }
        },
        "timestamp_in_video": null,
        "written_while_video_was_live": false,
        "group_comment_info": null,
        "has_constituent_badge": false,
        "can_see_constituent_badge_upsell": false,
        "legacy_token": "827038334374743_884794845265758",
        "question_and_answer_type": null,
        "is_author_original_poster": false,
        "is_viewer_comment_poster": false,
        "is_author_bot": false,
        "is_author_non_coworker": false,
        "author_user_signals_renderer": null,
        "author_badge_renderers": [],
        "identity_badges_web": [],
        "can_show_multiple_identity_badges": false,
        "earned_identity_badges_web": [],
        "can_viewer_disable_preview": false,
        "inline_survey_config": null,
        "attached_story": null,
        "work_answered_event_comment_renderer": null,
        "elevated_comment_data": null,
        "body": {
            "text": "In have problem in my whatsapp business ",
            "ranges": []
        },
        "is_markdown_enabled": false,
        "reply_parent_comment": null,
        "threading_depth": 0,
        "__typename": "Comment"
    },
    "cursor": "AQHR8HGfro-bIrLKBXuaovTl2aow5mWfhIyavvxB3qQwAOyV0sKqR_YsgvBBjw_gOjUmSVU1YCHcKHrGGGQBX6PTyA"
}
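
As a minimal sketch of reading such an object in Python, assuming the field layout shown above (Facebook can change it at any time): the helper name `summarize_comment` and the output keys are hypothetical, `top_reactions` may list only the most common reaction types, and treating `comment_count.total_count` as the number of replies is an assumption.

```python
from datetime import datetime, timezone

# Trimmed copy of the comment object above; only the fields read below are kept.
sample_edge = {
    "node": {
        "legacy_fbid": "884794845265758",
        "author": {"id": "100006874397694", "name": "S\u00ea\u00f1haj Aziz Senhajii"},
        "created_time": 1587155108,
        "feedback": {
            "top_reactions": {
                "edges": [
                    {"reaction_count": 4, "node": {"reaction_type": "LIKE"}},
                    {"reaction_count": 1, "node": {"reaction_type": "LOVE"}},
                ]
            },
            "reactors": {"count": 5},
            "comment_count": {"total_count": 6},
        },
        "body": {"text": "In have problem in my whatsapp business ", "ranges": []},
    }
}


def summarize_comment(edge):
    """Flatten one comment edge into a plain dictionary (hypothetical helper)."""
    node = edge["node"]
    feedback = node.get("feedback") or {}
    top_reactions = feedback.get("top_reactions") or {}
    # Map each listed reaction type to its count, e.g. "LIKE" to 4.
    reactions = {
        item["node"]["reaction_type"]: item["reaction_count"]
        for item in top_reactions.get("edges", [])
    }
    created = datetime.fromtimestamp(node["created_time"], tz=timezone.utc)
    return {
        "comment_id": node.get("legacy_fbid"),
        "author": (node.get("author") or {}).get("name"),
        "text": (node.get("body") or {}).get("text"),
        "created_at": created.isoformat(),  # created_time is a Unix timestamp
        "reactions": reactions,
        "reaction_total": (feedback.get("reactors") or {}).get("count"),
        # Assumption: the comment's own comment_count is its number of replies.
        "reply_count": (feedback.get("comment_count") or {}).get("total_count"),
    }


summary = summarize_comment(sample_edge)
print(summary["author"], "wrote:", summary["text"])
print("reactions:", summary["reactions"])
```

The `or {}` guards are there because many fields in the object are `null` for some comments, so a missing `feedback` or `author` yields `None` values instead of an exception.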


A simple request/response approach may not work if a post has more than 100 or so comments, because the remaining comments are hidden (they load only on events) and are injected using JavaScript. In that case you can use Selenium: loop, clicking the element that says "View More Comments" for as long as it is present. Once you reach the bottom of the page, use driver.page_source to get the entire page source and extract the comment data from it.
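
One way to find the comment data in the page source, whether it comes from a plain response body or from driver.page_source, is sketched below. It assumes the comment objects sit inside <script> tags as valid JSON and carry `"__typename": "Comment"` as in the object pasted above; the real page layout may differ. All function names are hypothetical, and only the standard library is used. With BeautifulSoup, `soup.find_all("script")` could take the place of the regular expression.

```python
import json
import re

SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def iter_json_objects(text):
    """Yield every JSON object that can be decoded starting at a '{' in text."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            value, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON here (plain JavaScript); try the next brace.
            pos = text.find("{", pos + 1)
            continue
        yield value
        pos = text.find("{", end)


def iter_comment_nodes(value):
    """Recursively yield every dict whose __typename is "Comment"."""
    if isinstance(value, dict):
        if value.get("__typename") == "Comment":
            yield value
        for child in value.values():
            yield from iter_comment_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_comment_nodes(child)


def comments_from_html(html):
    """Collect all comment nodes embedded in the page's <script> tags."""
    comments = []
    for script_body in SCRIPT_RE.findall(html):
        for value in iter_json_objects(script_body):
            comments.extend(iter_comment_nodes(value))
    return comments


# Tiny stand-in for a real page, just to show the call.
demo_html = (
    '<script>requireLazy(["x"], function(){});</script>'
    '<script type="application/json">{"edges": [{"node": '
    '{"__typename": "Comment", "body": {"text": "hello"}}}]}</script>'
)
for comment in comments_from_html(demo_html):
    print(comment["body"]["text"])
```

Scanning from every opening brace is slow on very large scripts, but it avoids having to know the exact JavaScript wrapper around the data.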


If you want replies as well, you will probably have to open the post's URL with Selenium and then:

  1. Close the "Sign Up" modal.
  2. Scroll down.
  3. Check whether the comment has replies; if it does, keep clicking the replies button for as long as a "View More" control is shown.
  4. Extract the data from that comment and store it in some data structure (I used Python dictionaries in my project).
  5. Repeat steps 2-4 for as long as there are comments left.
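
The click-until-gone part of these steps can be sketched as below. This is only an outline under assumptions: the XPath for the "View More Comments" control is a guess at the markup, the function name and its parameters are hypothetical, and closing the "Sign Up" modal and scrolling are left out. The driver is expected to be a Selenium WebDriver already pointed at the post.

```python
import time

# Assumption: the control can be matched by its visible text. The real markup
# may differ and changes over time.
VIEW_MORE_XPATH = "//*[contains(text(), 'View More Comments')]"


def expand_comments(driver, xpath=VIEW_MORE_XPATH, max_clicks=200, pause=1.0):
    """Click the "view more" control until it disappears, then return the page source."""
    clicks = 0
    while clicks < max_clicks:
        # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break  # nothing left to expand
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the injected comments time to load
    return driver.page_source
```

With Selenium installed, the driver could be created with `webdriver.Chrome()` and pointed at the post with `driver.get(...)` before calling `expand_comments(driver)`. The `max_clicks` cap keeps the loop from running forever if the control never goes away.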

There are other ways as well.

marcomameli1992 commented 3 years ago

Hello, I'm sorry, but I don't understand how to get the information out of the JavaScript with Beautiful Soup. Could you give me some example code, or point me to something I can learn from? Thank you.

shaikhsajid1111 commented 3 years ago

Are you using plain requests or Selenium? Do you need the replies as well?

valerio1805 commented 3 years ago

Hi, I'm also interested in understanding how to get the comments from a post. Could you publish the code you used to obtain the object in your second comment? I'm making a simple request to the post's URL and then using BeautifulSoup; can you help me get the comments of a particular post?

shaikhsajid1111 commented 3 years ago

@valerio1805 If you make a request and extract the comments from that response, you'll only get a few of them.

For example, suppose a post has 1000 comments and you send a request. You will get only about the first 10 comments, without their replies, because the other 990 comments were never loaded; loading them depends on JavaScript's onClick() event.

Facebook uses a framework similar to React.js (only they know exactly which one) that injects content via JavaScript, which makes it extremely hard to scrape without browser interaction. So BeautifulSoup on its own won't get you all the comments; you have to use a browser automation tool such as Puppeteer (in JavaScript) or Selenium (in Python).

valerio1805 commented 3 years ago

Thanks, I will try Selenium to extract the comments from a post.