How did you crawl quora ?

kalbhor / Unofficial-Quora-API

:cake: Unofficial Quora API (outdated..)

MIT License

20 stars 5 forks source link

How did you crawl quora ? #7

Open msdeep14 opened 7 years ago

msdeep14 commented 7 years ago

As of robots.txt of quora, quora doesn't allow to crawl its website, https://www.quora.com/robots.txt I tried, sometimes it returns expected result, othertimes None.

How did you do it? Did you sent email to robotstxt@quora.com

kalbhor commented 7 years ago

Actually, I'm not allowed to crawl Quora which is why this project shouldn't really be used.

I used a mix of selenium and bs4 for scraping Quora.

msdeep14 commented 7 years ago

I got a idea, we can crawl if we login to website using our own login, I'm trying to do that, but not successfull, always being redirected to login page, whenever I try to access login page

    data = {'email': email,'password':getpass()}
    s.post('https://www.quora.com/{}/following'.format(username), data=data)
    r = s.get('https://www.quora.com/{}/following'.format(username))
    soup = BeautifulSoup(r.content)
    print soup
    div = soup.find('div',{'class':'UserConnectionsFollowingList PagedList'})
    print div

In my script, I want to get the person I'm following and answers written by him

Do you have any ideas why I'm failing?

kalbhor commented 7 years ago

What is the response code you get on line 2? s.post('https://www.quora.com/{}/following'.format(username), data=data)

msdeep14 commented 7 years ago

<Response [200]>

kalbhor commented 7 years ago

Will this work for users that have logged in using facebook?

kalbhor commented 7 years ago

Quora doesn't really have an API. I doubt they'll provide data in this manner. I used selenium to get past this problem.

msdeep14 commented 7 years ago

Till now what I want in my project is that, a person can get recent answers posted by people he is following, and I'm considering that he will login using email and password(will update code later for fb and google login)

Till now I wrote this

    email = str(raw_input("email: "))
    s = requests.Session()
    data = {'email': email,'password':getpass()}
    res = s.post('https://www.quora.com/{}/following'.format(username), data=data)
    print res
    r = s.get('https://www.quora.com/{}/following'.format(username))
    soup = BeautifulSoup(r.content)
    print soup
    div = soup.find('div',{'class':'UserConnectionsFollowingList PagedList'})
    print div
    user_list = []
    if div is not None:
        for user in div.find_all('a'):
            user_list.append("https://www.quora.com{}".format(url.get('href')))
    print user_list

If I get user_list, then I will fetch latest answers written by them.