Closed fashan7 closed 3 years ago
What steps did you take to cause this problem? Were you scraping heavily during that hour? Perhaps at this sort of scale, you should keep records of which posts failed to extract reactions, and come back to backfill them later with get_posts(post_urls=[..]), after whatever temporary block has worn off?
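A minimal sketch of that backfill loop, using the same get_posts call shown elsewhere in this thread (the failed_post_ids bookkeeping and the cookie filename are assumptions, not part of facebook-scraper):

from facebook_scraper import get_posts

# Hypothetical list of post IDs whose reactions came back as None on the first pass
failed_post_ids = ["1610364829101773"]

still_failed = []
for post in get_posts(post_urls=failed_post_ids, cookies="cookies.txt",
                      options={"reactions": True}):
    if post.get("reactions") is None:
        still_failed.append(post.get("post_id"))  # still blocked, retry later
    else:
        print(post["post_id"], post["reactions"])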
Basically, when we pass a cookie it actually logs in to FB and scrapes the post. So when the proxy is slow, the page takes a while to finish rendering completely. While the page is still rendering, it scrapes the result before rendering finishes and returns a bad result. That's what I think, basically. @neon-ninja am I correct?
Couldn't we wait until the page/post loads 100% and then extract all the data for that post?
That's not how it works - requests.get is a blocking operation - it doesn't return until the entire request is complete. It's all or nothing. If it's slow you might hit a timeout error though - requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='m.facebook.com', port=443): Read timed out. (read timeout=0.1)
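For illustration, the same blocking behaviour with plain requests, outside facebook-scraper (the URL and the tiny timeout here are just for demonstration):

import requests

try:
    # Blocks until the whole response has arrived; there is no partial page to scrape
    r = requests.get("https://m.facebook.com/1610364829101773", timeout=0.1)
    print(len(r.text))  # complete HTML
except requests.exceptions.Timeout:
    # Covers both ReadTimeout and ConnectTimeout - the request yields nothing at all
    print("timed out, no partial result")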
If that's the case, why is it not returning the commenter URL?
"reactors": "None", "w3_fb_url": "None", "reactions": "None", "reaction_count": "None"
http://facebook.com/1610364829101773
Note: BTW, I'm passing a private proxy.
@neon-ninja if you need a private proxy, I can share it with you along with the cookies.
Sure, post your proxy details. This works fine for me with my private proxy btw:
from facebook_scraper import *
import logging
enable_logging(logging.DEBUG)
set_proxy("squid.auckland.ac.nz:3128")
for post in get_posts(post_urls=[1610364829101773], cookies="cookies.txt", options={"reactions": True}):
print(post.get("reactions"))
output:
Proxy details: {'ip': '130.216.156.173', 'ip_decimal': 2195233965, 'country': 'New Zealand', 'country_iso': 'NZ', 'country_eu': False, 'latitude': -41, 'longitude': 174, 'time_zone': 'Pacific/Auckland', 'asn': 'AS9431', 'asn_org': 'The University of Auckland', 'hostname': 'squidproxy-f5vip.auckland.ac.nz', 'user_agent': {'product': 'Mozilla', 'version': '5.0', 'comment': '(Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36', 'raw_value': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36'}}
Requesting page from: https://m.facebook.com/1610364829101773
Fetching https://m.facebook.com/pgo.gov.ua/photos/pcb.137544304427389/137543791094107/?type=3&source=48&refid=52&__tn__=EHH-R
Fetching https://m.facebook.com/pgo.gov.ua/photos/pcb.137544304427389/137543827760770/?type=3&source=48&refid=52&__tn__=EHH-R
Fetching https://m.facebook.com/pgo.gov.ua/photos/pcb.137544304427389/137543917760761/?type=3&source=48&refid=52&__tn__=EHH-R
Fetching https://m.facebook.com/pgo.gov.ua/photos/pcb.137544304427389/137544181094068/?type=3&source=48&refid=52&__tn__=EHH-R
Fetching https://m.facebook.com/story.php?story_fbid=1610364829101773&id=365331280271807
[1610364829101773] Extract method extract_video_meta didn't return anything
[1610364829101773] Extract method extract_factcheck didn't return anything
1610364829101773 is a share of 137544304427389
data-ft attribute not found
{'like': 115, 'love': 4, 'haha': 13, 'wow': 7, 'care': 1, 'angry': 1}
@neon-ninja please check your mail.
The proxy is fine, the problem is your cookies:
from facebook_scraper import _scraper
from facebook_scraper import *
for file in ["top.txt", "produc_cookies.json", "newjson.json", "cookies.txt", "cookies.json"]:
    set_cookies(file)
    print(file, _scraper.is_logged_in())
returns
top.txt False
produc_cookies.json True
newjson.json False
cookies.txt True
cookies.json True
I see, @neon-ninja. But I exported Netscape cookies directly from this: https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/
Did you log out of Facebook in that session before or after exporting the cookies?
Is there any way to export working cookies? What I did was: before logging into Facebook, I cleared the history, then logged in and exported the cookies.
The original cookies you sent me (with filename produc_cookies.json) are still valid, why not just use those?
from facebook_scraper import _scraper
from facebook_scraper import *
for file in ["top.txt", "produc_cookies.json", "newjson.json", "cookies.txt", "cookies.json"]:
    set_cookies(file)
    print(file, _scraper.is_logged_in())
This code is really helpful.
This commit (https://github.com/kevinzg/facebook-scraper/commit/9af15d86c26b76357e2f72198a955ac59631a558) will make it so that the scraper throws an exception if you pass invalid cookies
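A small sketch of guarding against that once the commit is in (the exact exception class it raises is an assumption here, so this catches broadly):

from facebook_scraper import set_cookies

try:
    set_cookies("top.txt")  # per the commit above, raises if the cookies don't log you in
except Exception as e:  # the specific exception type depends on the commit
    print("Invalid cookies:", e)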
@neon-ninja how can I keep cookies from expiring quickly?
@neon-ninja can you share the HTML you get when logging in with my cookies? Is it possible to share it via email, please?
They don't expire quickly, the expiry is like 1 year away. produc_cookies.json is still valid, what did you do differently with those compared to say, top.txt?
@neon-ninja top.txt is from an account that has 2FA enabled.
Maybe that's the problem?
If we figure it out, we're good. We can do that by getting the HTML response that is preventing the login. Is it possible to share that file with me, @neon-ninja?
Why do you need me to extract html for you when you can just as easily do it yourself?
OK, can I know where to put a debug print, @neon-ninja?
I'm not sure I understand - what do you want the HTML for? Of the 3 cookie files you've sent me, which are you referring to? Assuming you're referring to top.txt, it's just the standard facebook login page. Basically, you send the cookie to the facebook server, the server replies to tell your browser (or in this case, Python) to trash those cookies, as they're not valid, and sends you the login page HTML
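If you want to grab that HTML yourself, one option is to replay the cookie file with plain requests outside the scraper (the cookie filename, URL and output path below are assumptions):

import requests
from http.cookiejar import MozillaCookieJar

jar = MozillaCookieJar("top.txt")  # Netscape-format cookie file
jar.load(ignore_discard=True, ignore_expires=True)

session = requests.Session()
session.cookies = jar
resp = session.get("https://m.facebook.com/settings")

# With invalid cookies this will just be the standard login page
with open("response.html", "w", encoding="utf-8") as f:
    f.write(resp.text)
print(resp.status_code, len(resp.text))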
I have removed 2FA from this account, which I will send to you via mail. Please check, @neon-ninja.
What do you get when you run those cookies yourself?
>>> for file in ["cookies3.txt"]:
...     set_cookies(file)
...     print(file, _scraper.is_logged_in())
...
cookies3.txt False
@neon-ninja
Then that doesn't seem to have helped. Maybe Facebook has somehow flagged your account, such that any time you try to connect from a new IP, you're forced to log in again.
@neon-ninja Hi, is it possible to send you a username and password for a 2FA account? When logging in, Facebook will ask for the 2FA code; we could pass the code obtained from the authenticator API/app and set it when required.
Also note that you don't technically need facebook-scraper to observe this behaviour, you can just use curl like so:
curl --silent --head --cookie cookies3.txt https://facebook.com/settings|grep cookie
set-cookie: c_user=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=-1621542934; path=/; domain=.facebook.com
set-cookie: spin=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=-1621542934; path=/; domain=.facebook.com; httponly
set-cookie: xs=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=-1621542934; path=/; domain=.facebook.com; httponly
See how facebook says to delete those invalid cookies?
https://github.com/kevinzg/facebook-scraper/commit/18d9d539cb8fc95ac527027f81289071a0423b31 this commit should make it possible to enter your 2FA token on the command line
Hi @neon-ninja, I passed Netscape cookies and ran facebook-scraper from the console. It gave good JSON for this post:
http://facebook.com/1610364829101773
When I ran it again an hour later, it didn't return reactions; even the commenter URL is None.
I feel something is wrong with the scraping.