msdeep14 opened 7 years ago
Actually, I'm not allowed to crawl Quora, which is why this project shouldn't really be used.
I used a mix of Selenium and bs4 for scraping Quora.
I got an idea: we can crawl if we log in to the website using our own credentials. I'm trying to do that, but not successfully; I'm always redirected to the login page whenever I try to access a page.
import requests
from getpass import getpass
from bs4 import BeautifulSoup

# email and username are read from the user elsewhere in the script
s = requests.Session()
data = {'email': email, 'password': getpass()}
s.post('https://www.quora.com/{}/following'.format(username), data=data)
r = s.get('https://www.quora.com/{}/following'.format(username))
soup = BeautifulSoup(r.content, 'html.parser')
print soup
div = soup.find('div', {'class': 'UserConnectionsFollowingList PagedList'})
print div
In my script, I want to get the people I'm following and the answers written by them.
Do you have any ideas why I'm failing?
What is the response code you get from the s.post call?
s.post('https://www.quora.com/{}/following'.format(username), data=data)
<Response [200]>
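Note that a 200 by itself doesn't mean the login worked: the POST above goes to the /following profile URL rather than a login endpoint, and a site can re-render its login form with status 200. A quick diagnostic sketch with the same requests session (the attributes are standard requests API; how Quora actually responds is an assumption):

res = s.post('https://www.quora.com/{}/following'.format(username), data=data)
print res.status_code   # can be 200 even when authentication failed
print res.history       # non-empty list if the request was redirected
r = s.get('https://www.quora.com/{}/following'.format(username))
print r.url             # final URL; may point back at the login page when not authenticated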
Will this work for users who have logged in using Facebook?
Quora doesn't really have an API. I doubt they'll provide data in this manner. I used Selenium to get past this problem.
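For reference, a minimal sketch of that Selenium approach: drive a real browser through the login form, then hand the rendered page to bs4. The field names and selectors below are assumptions, not Quora's actual markup.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://www.quora.com/')
# fill in the login form (element names are assumptions)
driver.find_element_by_name('email').send_keys('you@example.com')
driver.find_element_by_name('password').send_keys('your-password')
driver.find_element_by_name('password').submit()
# the browser now holds the session cookies, so later page loads are authenticated
driver.get('https://www.quora.com/{}/following'.format(username))
soup = BeautifulSoup(driver.page_source, 'html.parser')
print soup.find('div', {'class': 'UserConnectionsFollowingList PagedList'})
driver.quit()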
So far, what I want in my project is for a person to get the recent answers posted by the people they follow, and I'm assuming they will log in using email and password (I will update the code later for Facebook and Google login).
Here is what I have written so far:
import requests
from getpass import getpass
from bs4 import BeautifulSoup

email = raw_input("email: ")
username = raw_input("username: ")
s = requests.Session()
data = {'email': email, 'password': getpass()}
res = s.post('https://www.quora.com/{}/following'.format(username), data=data)
print res
r = s.get('https://www.quora.com/{}/following'.format(username))
soup = BeautifulSoup(r.content, 'html.parser')
print soup
div = soup.find('div', {'class': 'UserConnectionsFollowingList PagedList'})
print div
user_list = []
if div is not None:
    # collect absolute profile URLs for everyone in the following list
    for user in div.find_all('a'):
        user_list.append("https://www.quora.com{}".format(user.get('href')))
print user_list
If I get user_list, then I will fetch the latest answers written by them.
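A hedged sketch of that next step, assuming each profile exposes an /answers page and that question links carry a question_link class; both details would need checking against Quora's real markup:

for profile_url in user_list:
    page = s.get('{}/answers'.format(profile_url))
    answers_soup = BeautifulSoup(page.content, 'html.parser')
    # the 'question_link' class is an assumption about Quora's markup
    for link in answers_soup.find_all('a', {'class': 'question_link'}):
        print link.get('href')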
As per Quora's robots.txt, Quora doesn't allow crawling its website: https://www.quora.com/robots.txt. I tried; sometimes it returns the expected result, other times None. How did you do it? Did you send an email to robotstxt@quora.com?
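For what it's worth, the published rules can be checked programmatically with Python's standard robotparser module; whether Quora grants exceptions via robotstxt@quora.com is a separate question:

import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://www.quora.com/robots.txt')
rp.read()
# prints False for paths the robots.txt disallows for generic crawlers
print rp.can_fetch('*', 'https://www.quora.com/some-username/following')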