moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
77 stars 28 forks

how can we create this ./mbasicHeaders.json file #22

Open Sarun1001 opened 9 months ago

Sarun1001 commented 9 months ago

Can you please provide a sample Python script that works? I tested using this code and it doesn't return anything:

from facebook_scraper import get_posts, _scraper
import json

for post in get_posts('nintendo', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/nintendo?v=timeline", pages=3, cookies='trollmcookies.txt'):
    try:
        print(post['text'][:20])
        print(post)
    except Exception as e:
        print(f"Error processing post: {e}")

moda20 commented 9 months ago

@Sarun1001 The mbasic headers file is a JSON file that you get from the browser developer tools. The steps are in the README, but here they are in more detail:

  1. Open an mbasic Facebook page in your browser.
  2. Open the developer tools (Option+Command+I) and go to the Network tab.
  3. In the same dev tools panel, open the responsive mode and choose Samsung Galaxy S20 Ultra as your device.
  4. Refresh the page you opened at first (Command+R).
  5. Filter the request list by "Doc", right-click the first page loaded, then click Copy -> Copy as cURL.
  6. Go to https://curlconverter.com/ and paste. From there you can choose Python as the output and get the headers in a JSON format.
  7. Copy the headers into a file in your repo, which you then inject into your scraper instance.
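
Steps 6 and 7 could be sketched roughly as below, assuming you picked the Python output on curlconverter and pasted its `headers` dict into a script. The helper name `save_headers` is hypothetical, and leaving out the `cookie` header is a precaution added here, not something the library is known to require. The default file name comes from the issue title.

```python
import json
from pathlib import Path


def save_headers(headers: dict, path: str = "./mbasicHeaders.json") -> dict:
    """Write a headers dict to a JSON file, leaving out any cookie header.

    `save_headers` is an illustrative helper, not part of facebook_scraper.
    """
    # Header names are case-insensitive, so compare them in lower case.
    cleaned = {k: v for k, v in headers.items() if k.lower() != "cookie"}
    # json.dumps emits double-quoted keys and values, which json.load can read back.
    Path(path).write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    return cleaned
```

Calling `save_headers(headers)` with the dict copied from curlconverter writes `./mbasicHeaders.json`. The file can then be loaded with `json.load` and assigned to `_scraper.mbasic_headers`, as done later in this thread.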

Sarun1001 commented 9 months ago

Thanks for sharing the steps.

I successfully scraped posts, but after a few minutes I got a temporary block:

File "/home/ubuntu/.local/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py", line 944, in get
    raise exceptions.TemporarilyBanned(title.text)
facebook_scraper.exceptions.TemporarilyBanned: You’re Temporarily Blocked

Is there any suggestion to avoid this ban? Also, what now: do I need a new IP or a new FB account to continue using this? NB: before getting banned, I ran the script without a cookie file.

Jowawis99 commented 9 months ago

@Sarun1001, do you have an example of how your headers turned out? It doesn't work for me even though I created the JSON file correctly.

moda20 commented 9 months ago

@Sarun1001 I can't really help you there; just don't use it a lot in rapid succession.
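
One way to avoid hitting the site in rapid succession is to pause between items as they are consumed. The sketch below is a generic wrapper, not part of facebook_scraper. The name `throttled` and the delay values are assumptions chosen for illustration, and nothing here guarantees that a block is avoided.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 5.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield each item, then wait a random interval before asking for the next."""
    for item in items:
        yield item
        # Pause after handing out each item, so the underlying iterator is not
        # asked for more data until the delay has passed.
        sleep(random.uniform(min_delay, max_delay))
```

Wrapping the call as `for post in throttled(get_posts(...)):` only slows down requests if `get_posts` produces posts lazily, which is assumed here.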

Jowawis99 commented 9 months ago

@moda20 Copying as cURL gives several headers, and I don't know which ones you use. Can you help me, please?

Sarun1001 commented 9 months ago

@Jowawis99 Reload the page at step 5 -> select the first item in the 'Name' column -> right-click -> Copy -> Copy as cURL -> paste into curlconverter.com -> select JSON -> there you can see an object called header

asheseux16 commented 7 months ago

I did everything above, but it returns nothing. My header looks like this:

{
'authority': 'mbasic.facebook.com',
 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
 'accept-language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
 'cache-control': 'max-age=0',
 'cookie': 'sb=rbAJZVCvEh8Q_09qlyWHTUMS; datr=rrAJZV9gFAn919FtNTDrVruN; c_user=100002607319714; ps_n=0; ps_l=0; dpr=1.5; wd=1280x559; presence=C%7B%22t3%22%3A%5B%5D%2C%22utc3%22%3A1709127707853%2C%22v%22%3A1%7D; xs=116%3ANBx7p68qbfFI0Q%3A2%3A1700836402%3A-1%3A11322%3A%3AAcXWxB1htPS7UjKQ6jWw7RJDJtOSzlOdKRXDk9Uy0tA; fr=1u3FkUasnpLa34xOQ.AWUs6AgQrE7dzqtOlIHC6gUhzlo.Bl30HX..AAA.0.0.Bl30HX.AWXvjSaIynQ; m_page_voice=100002607319714',
 'dpr': '1.5',
 'sec-ch-prefers-color-scheme': 'light',
 'sec-ch-ua': '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
 'sec-ch-ua-full-version-list': '"Not A(Brand";v="99.0.0.0", "Google Chrome";v="121.0.6167.185", "Chromium";v="121.0.6167.185"',
 'sec-ch-ua-mobile': '?1',
 'sec-ch-ua-model': '"SM-G981B"',
 'sec-ch-ua-platform': '"Android"',
 'sec-ch-ua-platform-version': '"13"',
 'sec-fetch-dest': 'document',
 'sec-fetch-mode': 'navigate',
 'sec-fetch-site': 'none',
 'sec-fetch-user': '?1',
 'upgrade-insecure-requests': '1',
 'user-agent': 'Mozilla/5.0 (Linux; Android 13; SM-G981B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Mobile Safari/537.36',
 'viewport-width': '210'
}

and the code is:

import json
from facebook_scraper import get_posts, _scraper

# Load the saved headers and attach them to the scraper instance
with open('./nintendo.json', 'r') as file:
    _scraper.mbasic_headers = json.load(file)

for post in get_posts('NintendoAmerica', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/NintendoAmerica?v=timeline", pages=10):
    print(post['text'][:50])

Could someone please tell me what's wrong?
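
One thing worth checking, offered as an assumption rather than a diagnosis: the header block above uses single quotes, which is Python dict syntax, while `json.load` only accepts double-quoted JSON. If the file on disk was saved in that form, a small conversion like the sketch below might help. The function name `python_dict_text_to_json` is hypothetical.

```python
import ast
import json


def python_dict_text_to_json(text: str) -> str:
    """Convert the text of a Python dict literal into a JSON string."""
    # ast.literal_eval parses literals only and does not execute code.
    data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected a dict literal")
    # json.dumps writes double-quoted keys and values.
    return json.dumps(data, indent=2)
```

The returned string can be written to the headers file and read back with `json.load`.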

moda20 commented 7 months ago

@asheseux16 The mbasic headers don't affect the response, only the quality of the images. Please open another issue with your error. For a start, try enabling logging to see the library's responses:

import logging
logging.basicConfig(level=logging.DEBUG)

Drzhivago264 commented 5 months ago

It is a very bad idea to post cookies on the Internet. You should change your Facebook password now.