moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
88 stars, 28 forks

Get post problem with post['time'] is None and text is not full text #3

Closed igs-willyliao closed 11 months ago

igs-willyliao commented 11 months ago

Thanks for providing a solution for get_posts returning an empty list []. The approach from https://github.com/kevinzg/facebook-scraper/issues/1070#issuecomment-1837606566 runs well for me. However, I often get posts whose 'time' key is None. This did not seem to happen with the old method.

Another problem: I only get part of the post text, not the full text.

moda20 commented 11 months ago

@WillyLiaoIGS Yes, I am aware that the post text is truncated: mbasic doesn't return the text in its entirety unless you request the specific post, which increases the request count. I am working on resolving this. As for the post time, I think it is returned fine, at least for page posts.

basheerpaliyathu commented 11 months ago

I get a value for post['time'] every time, but no value for post['timestamp']. I am scraping only pages.

moda20 commented 11 months ago

@basheerpaliyathu So they are both equal, but one is an int and the other is a string?
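If post['timestamp'] is missing while post['time'] is present, one possible workaround is to derive the former from the latter. This is only a minimal sketch: it assumes post['time'] is a Python datetime (which the thread does not confirm), and the helper name ensure_timestamp is hypothetical, not part of the library.

```python
from datetime import datetime, timezone


def ensure_timestamp(post):
    """Return a copy of `post` with 'timestamp' filled in from 'time' when possible.

    Assumption: 'time' is a datetime object. A naive datetime is interpreted
    in the local timezone by datetime.timestamp().
    """
    fixed = dict(post)
    when = fixed.get("time")
    if fixed.get("timestamp") is None and isinstance(when, datetime):
        fixed["timestamp"] = int(when.timestamp())
    return fixed


# Hypothetical post dict that only mimics the shape discussed in this thread.
example = {"time": datetime(1970, 1, 2, tzinfo=timezone.utc), "timestamp": None}
print(ensure_timestamp(example)["timestamp"])
```

Posts that already carry a timestamp, or that have no usable time, are returned unchanged.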

willlee88 commented 11 months ago

@moda20 I found that I can always get post['time'] for the latest post, but not reliably for the other posts. Sometimes I get the time for up to three posts, and sometimes only for the latest one.

moda20 commented 11 months ago

@willlee88 I will check this today and see if I can find the issue. Please also look into it yourself; that would be a great help too.

moda20 commented 11 months ago

@willlee88 I can't find any issue with post['time'], really. Could you share an example of your code, or an example post where you don't get the time?
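To collect such examples, the posts whose time is None could be gathered while iterating. The sketch below works on plain dicts; the helper name posts_missing_time is hypothetical, and it assumes each post dict exposes 'post_id' and 'post_url' keys (missing keys simply come back as None).

```python
def posts_missing_time(posts):
    """Collect identifying fields of posts whose 'time' value is None or absent."""
    missing = []
    for post in posts:
        if post.get("time") is None:
            missing.append(
                {"post_id": post.get("post_id"), "post_url": post.get("post_url")}
            )
    return missing


# Hypothetical sample data standing in for the output of get_posts(...).
sample = [
    {"post_id": "1", "post_url": "https://example.invalid/1", "time": "2023-12-01"},
    {"post_id": "2", "post_url": "https://example.invalid/2", "time": None},
]
for entry in posts_missing_time(sample):
    print(entry)
```

In real use, the iterable returned by get_posts could be passed in place of sample.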

willlee88 commented 11 months ago

@moda20 Thanks for your help, I think I have found the problem. The default language of my FB account is Chinese, and the format of that interface differs from the English one. I changed the language to English and now I get the time correctly. This is my code:

    for post in get_posts('Nintendo.hk', start_url="https://mbasic.facebook.com/Nintendo.hk?v=timeline", cookies=self.fb_cookie_path, pages=5):
        print(post)

moda20 commented 11 months ago

@willlee88 I added an implementation for the full text: it makes an extra call to the post's full page and reads the text from there. The full text is stored in a full_text variable, but only if the text was truncated, so be sure to check that full_text exists first.

I will close this issue with that.
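The existence check described above could look roughly like this. It is a sketch under assumptions: full_text is taken to be a key of the post dict next to 'text', and the helper name best_text is hypothetical.

```python
def best_text(post):
    """Prefer 'full_text' when present and non-empty, else fall back to 'text'."""
    full = post.get("full_text")
    if full:
        return full
    # 'full_text' is only set for truncated posts, so 'text' is the normal case.
    return post.get("text") or ""


# Hypothetical post dicts that only mimic the shape discussed in this thread.
truncated = {"text": "Hello wor...", "full_text": "Hello world, this is the whole post."}
complete = {"text": "Short post."}
print(best_text(truncated))
print(best_text(complete))
```

Using dict.get avoids a KeyError for posts that were not truncated and therefore have no full_text entry.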