kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

Some posts return a NoneType object, even if the post has text #190

Closed: cjdanbjorg closed this issue 3 years ago

cjdanbjorg commented 3 years ago

I have been scraping 180+ pages for months with facebook_scraper, with only the occasional issue here and there - no problem.

Now, however, I notice a significant number of incidents where it returns post_text as a NoneType object, for no apparent reason.

I fail to see what could be causing this. I have attached two examples, which are both lengthy and combine text with a picture, but I guess neither should be an issue.

Two examples that return NoneType: https://www.facebook.com/AlexLiberalAlliance/posts/1645625405645276 https://facebook.com/AlexLiberalAlliance/posts/1627179514156532

In the same retrieval I collected other posts from the same page without issues; one example is: https://facebook.com/AlexLiberalAlliance/posts/1636813539859796

My parameters are as follows:

for post in get_posts(targetusers[key], timeout=10, pages=3, extra_info=True):
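
For context, the surrounding loop looks roughly like this (targetusers is my own dict mapping keys to page names; the sample entry below is made up):

from facebook_scraper import get_posts

targetusers = {"example": "AlexLiberalAlliance"}  # made-up sample entry; my real dict has 180+ pages

for key in targetusers:
    for post in get_posts(targetusers[key], timeout=10, pages=3, extra_info=True):
        # post["text"] is what occasionally comes back as None
        if post["text"] is None:
            print("No text for", post.get("post_url"))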

Any ideas?

lgjluis commented 3 years ago

Hi @cjdanbjorg,

Yes, Facebook uses a "Read more" link when the text is very long. I had the same problem but solved it with cookies. Read https://github.com/kevinzg/facebook-scraper/issues/28#issuecomment-793066983 to see how to implement it.
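
Untested sketch of what that might look like (if I remember right, get_posts accepts a cookies argument; "cookies.txt" is just an assumed path to an exported cookies file, see the linked issue for how to export one):

from facebook_scraper import get_posts

# "cookies.txt" is an assumed path to a cookies file exported from a logged-in browser session
for post in get_posts("AlexLiberalAlliance", pages=3, cookies="cookies.txt"):
    print(post["text"])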

neon-ninja commented 3 years ago

What version are you using? Both of your examples worked fine for me with 0.2.26:

from facebook_scraper import get_posts

urls = ["AlexLiberalAlliance/posts/1645625405645276", "AlexLiberalAlliance/posts/1627179514156532"]
posts = get_posts(post_urls=urls)
print([len(post["text"]) for post in posts])

outputs

[2285, 4496]
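
If you're not sure which version you have installed, something like this should tell you (plain importlib.metadata from the standard library, nothing facebook-scraper specific):

from importlib.metadata import version

print(version("facebook-scraper"))  # "facebook-scraper" is the PyPI distribution name; needs Python 3.8+
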
cjdanbjorg commented 3 years ago

> What version are you using? Both of your examples worked fine for me with 0.2.26:
>
> urls = ["AlexLiberalAlliance/posts/1645625405645276", "AlexLiberalAlliance/posts/1627179514156532"]
> posts = get_posts(post_urls=urls)
> print([len(post["text"]) for post in posts])
>
> outputs
>
> [2285, 4496]

I was on 0.2.24 (the version pip install had given me), so I upgraded to 0.2.26 and the issue went away - exactly as you pointed out - thanks :-)

I've noted the possible issue with cookies and assume that using the cookie option might also be a solution. However, that seems to require a little tweaking on my side as well, so I will stick with 0.2.26 for now.

Thanks again