moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
60 stars, 23 forks

Failing to get the time information from the time node #32

Closed: shovon26 closed this issue 3 months ago

shovon26 commented 5 months ago

When I scrape data from a Facebook page, the time information from the time node comes back empty. Here is a sample of the JSON output in which the time retrieval failed (both "time" and "timestamp" are null).

{
    "post_id": "936518901176820",
    "text": "We invite you to join us as #NASARemembers those who lost their lives in the pursuit of exploration. Our annual Day of Remembrance falls on Jan. 25 this year.\n\nLearn how we commit to safety while exploring the cosmos with our new NASA+ playlist: https://\ngo.nasa.gov/\n3Sdg7KB",
    "post_text": "We invite you to join us as #NASARemembers those who lost their lives in the pursuit of exploration. Our annual Day of Remembrance falls on Jan. 25 this year.\n\nLearn how we commit to safety while exploring the cosmos with our new NASA+ playlist: https://\ngo.nasa.gov/\n3Sdg7KB",
    "shared_text": "",
    "original_text": null,
    "time": null,
    "timestamp": null,
    //remaining node
}

Here is the Python script I am using to scrape the post data.

import json

# Assumed import: the original snippet did not show where get_posts comes from.
from facebook_scraper import get_posts

file_path = "temp_post.json"
all_posts = []
BASE_URL = "https://mbasic.facebook.com"
START_URL = "https://mbasic.facebook.com/NASA?v=timeline"
PAGE_URL = "NASA"

# Keep only the first two posts, then stop iterating.
for cnt, post in enumerate(get_posts(PAGE_URL, base_url=BASE_URL, start_url=START_URL, pages=10, cookies="cookies.txt", options={"posts_per_page": 10, 'comments': True})):
    if cnt < 2:
        all_posts.append(post)
    else:
        break

# Write the collected posts to disk as indented JSON.
with open(file_path, 'w') as file:
    json.dump(all_posts, file, indent=4)
    file.write('\n')

What is the problem? Is there an error in my script? With this scraping method I am also not getting the number of followers and likes for the Facebook page. Is it possible to retrieve those? @moda20

moda20 commented 3 months ago

@shovon26 the time node should be accurate again, except that the value is now on the time attribute and not on timestamp.
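
For anyone reading the post dictionaries after this change, here is a minimal sketch of how the time could be picked up from time first, with timestamp as a fallback. It assumes that time holds a Python datetime when present and that timestamp, if present, is a Unix epoch in seconds. The helper name resolve_post_time, the resolved_time key and the sample posts are hypothetical and not part of the library.

```python
import json
from datetime import datetime, timezone


def resolve_post_time(post):
    """Return a datetime for a post dict, or None if no time data is present."""
    value = post.get("time")
    if isinstance(value, datetime):
        # Preferred source: the "time" attribute.
        return value
    ts = post.get("timestamp")
    if ts is not None:
        # Fallback: interpret "timestamp" as epoch seconds (assumption).
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    return None


# Hypothetical sample posts standing in for scraper output.
posts = [
    {"post_id": "1", "time": datetime(2024, 1, 25, 12, 0), "timestamp": None},
    {"post_id": "2", "time": None, "timestamp": None},
]

for post in posts:
    post["resolved_time"] = resolve_post_time(post)

# default=str turns datetime objects into strings; without it json.dumps
# raises TypeError as soon as a post carries a datetime value.
serialized = json.dumps(posts, indent=4, default=str)
```

Note that once time is populated with a datetime, a plain json.dump call like the one in the script above would need default=str (or a similar converter) to serialize the posts.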