kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

How can I return an XML file #369

Closed: bullride closed this issue 3 years ago

bullride commented 3 years ago

Hi there

I'm new to Python but already in love with it. Thanks to the maintainers for this very handy code. Everything works for me, but how can I display an XML result in the browser?

neon-ninja commented 3 years ago

Why XML?

bullride commented 3 years ago

Hi @neon-ninja

It doesn't have to be XML. It can be anything, JSON included, but I can't seem to figure out how to display the result in a browser. I only see the result in the terminal / console.

Sorry man, I'm very new to Python and can't think straight.

Thanks

neon-ninja commented 3 years ago

Here's how you can write posts to JSON:

from facebook_scraper import *
import json

with open("Nintendo.json", "w") as f:
    posts = list(get_posts("Nintendo", pages=2, options={"allow_extra_requests": False}))
    json.dump(posts, f, indent=4, default=str)

You can then open the resulting Nintendo.json file with your preferred text editor or browser. The pprint library might also be useful.
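If XML is still wanted, here is a minimal sketch of one way to turn the list of post dicts into an XML string with the standard library's xml.etree.ElementTree. The helper name posts_to_xml and the element names posts, post and item are made up for illustration; they are not part of facebook_scraper. It assumes every dict key is a valid XML element name, which holds for the keys shown in this thread.

```python
import xml.etree.ElementTree as ET


def posts_to_xml(posts):
    """Convert a list of post dicts into an XML string (hypothetical helper)."""
    root = ET.Element("posts")
    for post in posts:
        post_el = ET.SubElement(root, "post")
        for key, value in post.items():
            # Assumption: each key is a valid XML element name.
            child = ET.SubElement(post_el, key)
            if isinstance(value, (list, tuple)):
                # Each list entry becomes its own <item> element.
                for item in value:
                    ET.SubElement(child, "item").text = str(item)
            elif value is not None:
                # None values are left as empty elements.
                child.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file such as Nintendo.xml and opening that file in a browser should show the tree, since most browsers render plain XML files.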

bullride commented 3 years ago

Hi @neon-ninja

This is what I get when I run this code. I saved the cookies as per the README, using the EditmyCookie Google Chrome extension. I've got all the cookies saved in the file. Or should I only save the c_user cookie and the xs cookie?

from facebook_scraper import *
import json

with open("Nintendo.json", "w") as f:
    posts = list(get_posts("Nintendo", pages=2, cookies="cookies.txt", options={"allow_extra_requests": False}))
    json.dump(posts, f, indent=4, default=str)

Traceback (most recent call last):
  File "C:/Users/NARI-SSA/PycharmProjects/Feedback_App/app.py", line 5, in <module>
    posts = list(get_posts("Nintendo", pages=2, cookies="cookies.txt", options={"allow_extra_requests": False}))
  File "C:\Users\NARI-SSA\PycharmProjects\Feedback_App\facebook_scraper\__init__.py", line 131, in get_posts
    set_cookies(cookies)
  File "C:\Users\NARI-SSA\PycharmProjects\Feedback_App\facebook_scraper\__init__.py", line 24, in set_cookies
    cookies = parse_cookie_file(cookies)
  File "C:\Users\NARI-SSA\PycharmProjects\Feedback_App\facebook_scraper\utils.py", line 242, in parse_cookie_file
    domain, _, path, secure, expires, name, value = line.split('\t')
ValueError: not enough values to unpack (expected 7, got 1)

Process finished with exit code 1

neon-ninja commented 3 years ago

Your cookies aren't in Netscape format. If you've exported cookies in JSON format, make sure the file has a .json extension instead of .txt.
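As a rough sketch of the difference: the traceback shows the parser splitting each line on tabs and expecting 7 fields, which is the Netscape layout. The helper below guesses which format a cookie file is in before you pass it along. The name detect_cookie_format is hypothetical and not part of facebook_scraper, and treating every line that starts with # as a comment is a simplification.

```python
import json


def detect_cookie_format(text):
    """Guess whether cookie file contents are JSON or Netscape style.

    Hypothetical helper for illustration; not part of facebook_scraper.
    """
    try:
        json.loads(text)
        return "json"
    except ValueError:
        # Not valid JSON, so fall through to the Netscape check.
        pass
    # Netscape cookie lines have 7 tab-separated fields:
    # domain, flag, path, secure, expires, name, value.
    # Simplification: lines starting with "#" are treated as comments.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    if lines and all(len(line.split("\t")) == 7 for line in lines):
        return "netscape"
    return "unknown"
```

If this returns "json" for a file named cookies.txt, renaming the file to cookies.json is the fix described above.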

neon-ninja commented 3 years ago

https://github.com/kevinzg/facebook-scraper/commit/c8e8f907956183a73204e1aeb77f6c23d6a6a1c1 should make it possible to pass JSON cookies without a .json extension.

bullride commented 3 years ago

I've changed it to cookies.json and got this response:

sys:1: UserWarning: A low page limit (<=2) might return no results, try increasing the limit
Process finished with exit code 0

[
    {
        "post_id": "4244168652334222",
        "text": "Get ready for 2-player adventures! Connect LEGO Mario and LEGO Luigi, defeat enemies, take on challenges as a team, share rewards and earn extra coins by playing together! Available August 1st.\n\nExplore the LEGO Super Mario collection: http://ninten.do/6180nKBUE",
        "post_text": "Get ready for 2-player adventures! Connect LEGO Mario and LEGO Luigi, defeat enemies, take on challenges as a team, share rewards and earn extra coins by playing together! Available August 1st.\n\nExplore the LEGO Super Mario collection: http://ninten.do/6180nKBUE",
        "shared_text": "",
        "time": "2021-06-22 04:01:00",
        "image": null,
        "image_lowquality": null,
        "images": null,
        "images_description": null,
        "images_lowquality": [],
        "images_lowquality_description": [],
        "video": "https://video-cpt1-1.xx.fbcdn.net/v/t42.1790-2/205402340_493906205010618_6809493299846516972_n.mp4?_nc_cat=105&ccb=1-3&_nc_sid=985c63&efg=eyJybHIiOjUwNCwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic3ZlX3NkIn0%3D&_nc_ohc=XN0zacrQQygAX-6ggST&_nc_rml=0&_nc_ht=video-cpt1-1.xx&oh=301d9ea401bae79f49e830833414a1d5&oe=60D554FD",
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": "156333709821754",
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": "https://scontent-cpt1-1.xx.fbcdn.net/v/t15.5256-10/cp0/e15/q65/s1080x2048/189910994_156334449821680_3369509636692629041_n.jpg?_nc_cat=106&ccb=1-3&_nc_sid=ccf8b3&efg=eyJpIjoidCJ9&_nc_ohc=xWorh5SD_X8AX_R7rTv&_nc_ht=scontent-cpt1-1.xx&tp=9&oh=740205f206fc4efa0a1fc971e5b497ef&oe=60D99E69",
        "video_watches": null,
        "video_width": null,
        "likes": 1700,
        "comments": 218,
        "shares": 375,
        "post_url": "https://facebook.com/Nintendo/posts/4244168652334222",
        "link": "http://ninten.do/6180nKBUE?fbclid=IwAR2tbYamcFN-RspWeN02rSLetlMQcnF41VgekxYLF3QEGFBB1PGtS0y3JiQ",
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    },
    {
        "post_id": "4239326416151779",
        "text": "Wishing blue skies and smooth greens for all the fathers out there! Happy Father's Day!",
        "post_text": "Wishing blue skies and smooth greens for all the fathers out there! Happy Father's Day!",
        "shared_text": "",
        "time": "2021-06-20 09:00:00",
        "image": null,
        "image_lowquality": null,
        "images": null,
        "images_description": null,
        "images_lowquality": [],
        "images_lowquality_description": [],
        "video": "https://video-cpt1-1.xx.fbcdn.net/v/t42.1790-2/204898679_157640729632322_6271991302142756485_n.mp4?_nc_cat=106&ccb=1-3&_nc_sid=985c63&efg=eyJybHIiOjU3NCwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic3ZlX3NkIn0%3D&_nc_ohc=yDMMnxsW1FwAX_H2bYl&tn=bjIflVJg-63MwZyv&_nc_rml=0&_nc_ht=video-cpt1-1.xx&oh=212d43afd886dddc4a5f52befe543913&oe=60D54967",
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": "1575720116152707",
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": "https://scontent-cpt1-1.xx.fbcdn.net/v/t15.5256-10/cp0/e15/q65/s1080x2048/198795149_1575720546152664_3333391678012831209_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=ccf8b3&efg=eyJpIjoidCJ9&_nc_ohc=HZOkReIYmRwAX8V339m&tn=bjIflVJg-63MwZyv&_nc_ht=scontent-cpt1-1.xx&tp=9&oh=6de46a0c7394ba5e9a3d4f0507e8f92a&oe=60DA261F",
        "video_watches": null,
        "video_width": null,
        "likes": 1700,
        "comments": 123,
        "shares": 123,
        "post_url": "https://facebook.com/Nintendo/posts/4239326416151779",
        "link": null,
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    },
    {
        "post_id": "4230186287065792",
        "text": "",
        "post_text": "",
        "shared_text": null,
        "time": "2021-06-17 01:20:00",
        "image": null,
        "image_lowquality": "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/202131416_4230186293732458_4313263390076405616_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=dd9801&efg=eyJpIjoidCJ9&_nc_ohc=jJV4Y76zMf0AX-5Vjb-&tn=bjIflVJg-63MwZyv&_nc_ht=scontent-cpt1-1.xx&tp=14&oh=a5621f62d5f71c10e59233f63565ef48&oe=60DA7E8A",
        "images": null,
        "images_description": null,
        "images_lowquality": [
            "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/202131416_4230186293732458_4313263390076405616_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=dd9801&efg=eyJpIjoidCJ9&_nc_ohc=jJV4Y76zMf0AX-5Vjb-&tn=bjIflVJg-63MwZyv&_nc_ht=scontent-cpt1-1.xx&tp=14&oh=a5621f62d5f71c10e59233f63565ef48&oe=60DA7E8A"
        ],
        "images_lowquality_description": [
            "May be a cartoon of text that says '1 NINTENDO SWITCH. MARIOGOLF SUPERRUSH RUSH SUPER Available June 25th Mario Golf Nintendo EVERTONE Nintendo CAMELOT are rademarks Nintendo 2021 Nintendo. Mild Cartoon Violence SRB'"
        ],
        "video": null,
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": null,
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": null,
        "video_watches": null,
        "video_width": null,
        "likes": 1100,
        "comments": 114,
        "shares": 40,
        "post_url": "https://facebook.com/Nintendo/posts/4230186287065792",
        "link": null,
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    },
    {
        "post_id": "4226613690756385",
        "text": "That\u2019s a wrap on Nintendo at E3 2021! Thanks to all who tuned in to Nintendo Direct and Nintendo Treehouse: Live.\n\nToday, we showed a sampling of what\u2019s coming to Nintendo Switch, and we look forward to sharing more games with you all in the future.\n\nhttp://ninten.do/6184n1Qx0",
        "post_text": "That\u2019s a wrap on Nintendo at E3 2021! Thanks to all who tuned in to Nintendo Direct and Nintendo Treehouse: Live.\n\nToday, we showed a sampling of what\u2019s coming to Nintendo Switch, and we look forward to sharing more games with you all in the future.\n\nhttp://ninten.do/6184n1Qx0",
        "shared_text": "",
        "time": "2021-06-15 18:00:00",
        "image": null,
        "image_lowquality": "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-0/p640x640/201162472_4226613694089718_2653394856173505657_n.png?_nc_cat=105&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=Gkn_yowNpj8AX_2_Hgs&_nc_ht=scontent-cpt1-1.xx&tp=30&oh=edee9fd7ae622e6457eae27e8a2de77a&oe=60D9CB10",
        "images": null,
        "images_description": null,
        "images_lowquality": [
            "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-0/p640x640/201162472_4226613694089718_2653394856173505657_n.png?_nc_cat=105&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=Gkn_yowNpj8AX_2_Hgs&_nc_ht=scontent-cpt1-1.xx&tp=30&oh=edee9fd7ae622e6457eae27e8a2de77a&oe=60D9CB10"
        ],
        "images_lowquality_description": [
            "May be an image of text that says 'Nintendo E3 2021 Nintendo Direct E3 2021 + Nintendo TREEHOUSE LIVE E3 2021 Thank you fo watching'"
        ],
        "video": null,
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": null,
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": null,
        "video_watches": null,
        "video_width": null,
        "likes": 2200,
        "comments": 605,
        "shares": 75,
        "post_url": "https://facebook.com/Nintendo/posts/4226613690756385",
        "link": "http://ninten.do/6184n1Qx0?fbclid=IwAR12m_-sllYRKiYEndz_2OnvgNteOPdmttNY-ia-RUcBStIYCn2fPvwi8kI",
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    },
    {
        "post_id": "4226242920793462",
        "text": "Nintendo of America President Doug Bowser joins Nintendo Power Podcast to discuss all the big E3 2021 Nintendo news! Tune in for talk on Metroid Dread, Advance Wars 1+2: Re-Boot Camp, the sequel to The Legend of Zelda: Breath of the Wild & more!\n\nAvailable now: http://ninten.do/6012VlOoQ",
        "post_text": "Nintendo of America President Doug Bowser joins Nintendo Power Podcast to discuss all the big E3 2021 Nintendo news! Tune in for talk on Metroid Dread, Advance Wars 1+2: Re-Boot Camp, the sequel to The Legend of Zelda: Breath of the Wild & more!\n\nAvailable now: http://ninten.do/6012VlOoQ",
        "shared_text": "",
        "time": "2021-06-15 14:25:00",
        "image": null,
        "image_lowquality": "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p640x640/201067933_4226242924126795_2700063604734238274_n.jpg?_nc_cat=105&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=tULClGGKhjEAX9A72bv&_nc_ht=scontent-cpt1-1.xx&tp=3&oh=6648ec150a48bc16b6740ccc4fd588b1&oe=60DAC53F",
        "images": null,
        "images_description": null,
        "images_lowquality": [
            "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-0/cp0/e15/q65/p640x640/201067933_4226242924126795_2700063604734238274_n.jpg?_nc_cat=105&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=tULClGGKhjEAX9A72bv&_nc_ht=scontent-cpt1-1.xx&tp=3&oh=6648ec150a48bc16b6740ccc4fd588b1&oe=60DAC53F"
        ],
        "images_lowquality_description": [
            "May be an image of text"
        ],
        "video": null,
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": null,
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": null,
        "video_watches": null,
        "video_width": null,
        "likes": 474,
        "comments": 79,
        "shares": 30,
        "post_url": "https://facebook.com/Nintendo/posts/4226242920793462",
        "link": "http://ninten.do/6012VlOoQ?fbclid=IwAR0BrS39Ex1U9ZWWQ2Co5cQ4SdQ_CGVZXzLmb2rs72Le0KjMJFe-CbtGmgY",
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    },
    {
        "post_id": "4225498680867886",
        "text": "Nintendo at E3 2021 starts in just 30 minutes! Sit back, relax, and enjoy the show!\n\nhttp://ninten.do/6182nEhVO",
        "post_text": "Nintendo at E3 2021 starts in just 30 minutes! Sit back, relax, and enjoy the show!\n\nhttp://ninten.do/6182nEhVO",
        "shared_text": "",
        "time": "2021-06-15 08:30:00",
        "image": null,
        "image_lowquality": "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/201016655_4225498690867885_4450198546391941772_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=f5qxakeuuEwAX9cHG2u&_nc_ht=scontent-cpt1-1.xx&tp=14&oh=40a63116924ec5fface1d5a3206f9111&oe=60DABBF9",
        "images": null,
        "images_description": null,
        "images_lowquality": [
            "https://scontent-cpt1-1.xx.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/201016655_4225498690867885_4450198546391941772_n.jpg?_nc_cat=107&ccb=1-3&_nc_sid=2d5d41&efg=eyJpIjoidCJ9&_nc_ohc=f5qxakeuuEwAX9cHG2u&_nc_ht=scontent-cpt1-1.xx&tp=14&oh=40a63116924ec5fface1d5a3206f9111&oe=60DABBF9"
        ],
        "images_lowquality_description": [
            "May be an image of text that says 'Nintendo E3 2021 Nintendo Direct E3 2021 Nintendo TREEHOUSE H\u2014 LIVE E3 2021 nintendo.com/ Livestreaming June 15 9:00am PST 12:00pm EST'"
        ],
        "video": null,
        "video_duration_seconds": null,
        "video_height": null,
        "video_id": null,
        "video_quality": null,
        "video_size_MB": null,
        "video_thumbnail": null,
        "video_watches": null,
        "video_width": null,
        "likes": 2600,
        "comments": 679,
        "shares": 475,
        "post_url": "https://facebook.com/Nintendo/posts/4225498680867886",
        "link": "http://ninten.do/6182nEhVO?fbclid=IwAR0LCJEaZweoETbVmUm_nQVrS750hahMbNwIplIkrDewExZZl4-KHle4bK0",
        "user_id": "119240841493711",
        "username": "Nintendo",
        "user_url": "https://facebook.com/Nintendo/?__tn__=C-R",
        "is_live": false,
        "factcheck": null,
        "shared_post_id": null,
        "shared_time": null,
        "shared_user_id": null,
        "shared_username": null,
        "shared_post_url": null,
        "available": true,
        "comments_full": null,
        "reactors": null,
        "w3_fb_url": null,
        "reactions": null,
        "reaction_count": null
    }
]

I suppose that works now - Thanks a lot!

bullride commented 3 years ago

Hi @neon-ninja

How should my script look if I only want to dump the following fields to JSON? I tried this:

    with open(filetosave, "w") as f:
        for post in get_posts(useracc, pages=2, cookies="cookies.json", options={"allow_extra_requests": False}):
            post = (post.get["post_id"], post.get["time"], post.get["post_text"], post.get["image"], post.get["images"], post.get["username"], post.get["post_url"])
            json.dump(post, f, indent=4, default=str)

It's giving me this error:


post = (post.get["post_id"],
TypeError: 'builtin_function_or_method' object is not subscriptable

I'd also like to include only the fields whose value is not null / None. Any assistance would be highly appreciated.

neon-ninja commented 3 years ago

post.get is a function, so you should call it with round brackets () instead of square ones [], e.g. post.get("post_id"). You can easily filter out keys that have a value of None. Here's a one-line dict comprehension to do that:

post = {k: v for k, v in post.items() if v is not None}
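Putting both points together, here is a minimal sketch. The helper names select_fields and dump_posts and the WANTED_KEYS list are made up for illustration; the field names are the ones from the attempt above. It also collects the posts into one list and calls json.dump once, because calling json.dump per post inside the loop writes several JSON documents back to back, which will not parse as a single JSON file.

```python
import json

# Field names taken from the attempt earlier in this thread.
WANTED_KEYS = ["post_id", "time", "post_text", "image", "images", "username", "post_url"]


def select_fields(post, keys=WANTED_KEYS):
    """Keep only the wanted keys whose value is not None (hypothetical helper)."""
    return {k: post.get(k) for k in keys if post.get(k) is not None}


def dump_posts(posts, path):
    """Write the trimmed posts to `path` as a single JSON array."""
    trimmed = [select_fields(post) for post in posts]
    with open(path, "w") as f:
        # default=str converts values json cannot serialize, such as datetimes.
        json.dump(trimmed, f, indent=4, default=str)
    return trimmed
```

With the variables from the script above, the call would be something like dump_posts(get_posts(useracc, pages=2, cookies="cookies.json", options={"allow_extra_requests": False}), filetosave).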